A Simple Archive Format for Self-Contained Executables
The build/run instructions for the example games for high_impact were subtly wrong:
make sokol
./build/game_sokol
make sokol compiles the Sokol version, converts all assets and puts the results (executable and converted assets) into the build/ directory. So far so good.
Where it falls apart is in the next line: ./build/game_sokol. The executable starts just fine, but it's looking in the current directory (./) for all the assets, can't find them and terminates.
The obvious fix is to amend the build instructions and change to the build/ directory first:
(cd build && ./game_sokol)
A better solution would be to make the executable agnostic of the current path: We could figure out the path of the executable and use that as the base path when loading assets.
The SDL2 version already did exactly this through the handy SDL_GetBasePath() function. Sokol doesn't provide anything like that, so I had to roll my own. Sadly, there's no cross platform (i.e. POSIX) way to find the executable path, so the implementation differs for Linux, macOS and Windows. It's not too complicated, but still a burden to maintain.
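To give an idea of what this platform-dependent boilerplate involves, here is a minimal sketch. It is not the code from high_impact; exe_path() is a hypothetical helper name, and the fallback branch assumes Linux with /proc mounted (other Unix systems need their own mechanism).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(_WIN32)
	#include <windows.h>
#elif defined(__APPLE__)
	#include <mach-o/dyld.h>
	#include <stdint.h>
#else
	#include <unistd.h>
#endif

// Writes the NUL-terminated path of the running executable into buf.
// Returns the length of the path, or 0 on failure or if buf is too small.
// exe_path() is a hypothetical helper, not part of Sokol or high_impact.
size_t exe_path(char *buf, size_t buf_len) {
	assert(buf && buf_len > 0);
#if defined(_WIN32)
	DWORD len = GetModuleFileNameA(NULL, buf, (DWORD)buf_len);
	// A return value equal to the buffer size signals truncation
	if (len == 0 || len >= buf_len) { return 0; }
	return (size_t)len;
#elif defined(__APPLE__)
	uint32_t size = (uint32_t)buf_len;
	// Fails if buf is too small; the resulting path may contain symlinks
	if (_NSGetExecutablePath(buf, &size) != 0) { return 0; }
	return strlen(buf);
#else
	// Assumes Linux: /proc/self/exe is a symlink to the running binary
	ssize_t len = readlink("/proc/self/exe", buf, buf_len - 1);
	// A completely filled buffer is treated as truncation
	if (len <= 0 || (size_t)len >= buf_len - 1) { return 0; }
	buf[len] = '\0'; // readlink() does not NUL-terminate
	return (size_t)len;
#endif
}
```

Cutting the result off after the last path separator would give a base path for loading assets, similar in spirit to what SDL_GetBasePath() returns.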
An even better solution would be to circumvent this problem entirely and embed all the assets directly into the executable. Sherief Farouk makes a good case for why we might want to do this.
The idea sounds quite simple, but it's surprisingly convoluted to implement across platforms.
Windows has “resource scripts”, .rc files that the compiler understands. macOS has the convention of “bundles” which are just directories with an .app extension, missing the goal of this exercise.
We could instruct the linker to embed files into the .data section, or use C2x #embed directly in the source. But embedding files like this comes with an impact on performance: typically the whole executable is loaded into memory at program start. Not ideal if you have a few gigabytes of assets.
How about creating an archive (e.g. ZIP, or TAR) that we just slap on to the end of the executable after compiling/linking?
cat game assets.zip > game_with_assets
argv[0] holds the path of the executable, so we could just fopen(argv[0]), somehow find this ZIP archive at the end of the executable and unpack our assets. This should work!
Update September 3rd 2024: It's not entirely correct that argv[0] holds the path to the executable. This is (for example) not the case if the executable is in the PATH. So we still need some platform dependent boilerplate.
For high_impact all game assets are already in a compressed format (QOI or QOA), so we don't need the extra complexity of ZIP compression (though granted, entropy encoding QOI often has some gains).
TAR seems quite obtuse these days. For starters: the size of a file is stored as a 12 byte null terminated string using ASCII chars 0-7 to represent an octal number (why?). I'm sure the division into fixed size blocks and many of the other idiosyncrasies of TAR made sense for some applications in 1979, but I don't want to deal with it.
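To illustrate the octal size field, here is a small sketch of how such a field could be decoded. tar_parse_size() is a hypothetical helper written for this post, not taken from any TAR library, and it ignores the extensions that some TAR variants use for larger files.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Decodes the 12 byte size field of a TAR header: up to 11 octal digits
// in ASCII, followed by a terminator.
// tar_parse_size() is a hypothetical helper for illustration only.
unsigned long long tar_parse_size(const char field[12]) {
	char tmp[13];
	memcpy(tmp, field, 12);
	tmp[12] = '\0'; // guard against a field that lacks its terminator
	return strtoull(tmp, NULL, 8);
}
```

With 11 octal digits the largest size this field can express is 8^11 - 1 bytes, i.e. one byte short of 8 GiB.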
The PhysicsFS library looks nice, but is way too complicated for what I want to do. I don't need support for Doom WADs and a dozen other archive formats.
So instead of looking for other formats or libraries, I did what I always do…
QOP – The Quite OK Package Format
tl;dr: QOP is a super simple archive format. Single header, MIT licensed, source on github: https://github.com/phoboslab/qop
QOP archives consist of three parts:
- a concatenation of all files
- an index to find a file by its path
- a “header” (footer?) that sits at the end of the archive so it can be easily located when the archive is concatenated to an executable
The index doesn't need to store the path. A fixed size hash of the path is all we need. qop_find(q, "images/title.qoi") hashes the supplied path and looks up that hash in the index.
The whole QOP file format is best described as a C-ish struct:
struct {
	// Path string and data of all files in this archive
	struct {
		uint8_t path[path_len];
		uint8_t bytes[size];
	} file_data[];

	// The index, with a list of files
	struct {
		uint64_t hash;
		uint32_t offset;
		uint32_t size;
		uint16_t path_len;
		uint16_t flags;
	} qop_file[];

	// The number of files in the index
	uint32_t index_len;

	// The size of the whole archive, including the header
	uint32_t archive_size;

	// Magic bytes "qopf"
	uint32_t magic;
} qop;
The path of each file is still stored (to enable unpacking of the archive), but it sits in front of the data of each file. This layout makes it nice to look at in a hex editor and we still have a fixed size for each index element.
Having the header at the end of the file not only makes it easy to find an archive that is concatenated to an executable, but also allows you to append files to the archive without needing to rewrite it completely: just cut the existing header, append your files, paste the header to the end, add the new files to the index and adjust index_len and archive_size.
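Locating an archive at the end of a file could look roughly like the sketch below. This is not the code from qop.h: find_footer(), footer_t and read_u32_le() are hypothetical names, and both the little-endian byte order and the magic appearing as the bytes 'q', 'o', 'p', 'f' in file order are assumptions on my part. The 20 bytes per index entry follow from the struct above (8 + 4 + 4 + 2 + 2).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define FOOTER_SIZE 12      // index_len + archive_size + magic
#define INDEX_ENTRY_SIZE 20 // hash + offset + size + path_len + flags

typedef struct {
	uint32_t index_len;
	uint32_t archive_size;
	long archive_start; // file offset where the archive begins
} footer_t;

// Decodes a little-endian uint32 (the byte order is an assumption)
static uint32_t read_u32_le(const uint8_t *p) {
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		(uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

// Returns 1 and fills out if fh ends with a plausible footer, 0 otherwise
int find_footer(FILE *fh, footer_t *out) {
	assert(fh && out);
	uint8_t raw[FOOTER_SIZE];

	// The footer occupies the last 12 bytes of the file
	if (fseek(fh, 0, SEEK_END) != 0) { return 0; }
	long file_size = ftell(fh);
	if (file_size < FOOTER_SIZE) { return 0; }
	if (fseek(fh, file_size - FOOTER_SIZE, SEEK_SET) != 0) { return 0; }
	if (fread(raw, 1, FOOTER_SIZE, fh) != FOOTER_SIZE) { return 0; }

	// Assumed magic: the bytes "qopf" in file order
	if (raw[8] != 'q' || raw[9] != 'o' || raw[10] != 'p' || raw[11] != 'f') {
		return 0;
	}

	uint32_t index_len = read_u32_le(raw);
	uint32_t archive_size = read_u32_le(raw + 4);

	// The archive must fit into the file and the index into the archive
	uint64_t min_size = (uint64_t)index_len * INDEX_ENTRY_SIZE + FOOTER_SIZE;
	if (archive_size > (uint64_t)file_size || min_size > archive_size) {
		return 0;
	}

	out->index_len = index_len;
	out->archive_size = archive_size;
	// Everything before this offset belongs to the executable
	out->archive_start = file_size - (long)archive_size;
	return 1;
}
```

This is also why archive_size is part of the footer: without it there would be no way to tell where the executable ends and the archive begins.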
Since the hash of each path is stored in the index, the obvious way to look up a file is to build a hash table. Which is exactly what qop.h does. I'm just using linear probing here. With a table size of at least 1.5x of the stored elements this is plenty fast and as simple as it gets.
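For readers who haven't written one before, a hash table with linear probing can be sketched in a few lines. This is an illustration under my own assumptions, not the implementation in qop.h: FNV-1a is a stand-in hash (qop.h may use a different function), and slot_t, table_capacity(), table_insert() and table_find() are hypothetical names. The caller supplies zero-initialized memory for the slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint64_t hash;
	uint32_t offset;
	uint32_t size;
	uint8_t used; // 0 marks an empty slot
} slot_t;

// FNV-1a, 64 bit. A stand-in; qop.h may use a different hash function.
uint64_t hash_path(const char *path) {
	uint64_t h = 0xcbf29ce484222325ull;
	for (; *path; path++) {
		h ^= (uint8_t)*path;
		h *= 0x100000001b3ull;
	}
	return h;
}

// At least 1.5x the number of files, so there is always an empty slot
uint32_t table_capacity(uint32_t num_files) {
	return num_files + num_files / 2 + 1;
}

// Inserts into caller-supplied, zero-initialized slots. The caller must not
// insert more elements than the capacity was computed for.
void table_insert(
	slot_t *slots, uint32_t capacity,
	uint64_t hash, uint32_t offset, uint32_t size
) {
	assert(slots && capacity > 0);
	uint32_t i = (uint32_t)(hash % capacity);
	// Linear probing: on collision, try the next slot and wrap around
	while (slots[i].used) {
		i = (i + 1) % capacity;
	}
	slots[i] = (slot_t){hash, offset, size, 1};
}

// Returns the slot for path, or NULL if no stored hash matches
const slot_t *table_find(const slot_t *slots, uint32_t capacity, const char *path) {
	assert(slots && capacity > 0);
	uint64_t hash = hash_path(path);
	uint32_t i = (uint32_t)(hash % capacity);
	// An empty slot ends the probe sequence: the hash is not in the table
	while (slots[i].used) {
		if (slots[i].hash == hash) { return &slots[i]; }
		i = (i + 1) % capacity;
	}
	return NULL;
}
```

Because only hashes are compared, two different paths with the same 64 bit hash would be indistinguishable in this sketch. That is the price of keeping the paths out of the index.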
Since I didn't want the QOP library to allocate any memory on its own, opening a QOP archive is a three-step process:
- find and read the fixed size header of the archive
- allocate memory for the index
- read the index
In practice, opening and reading a file from a QOP archive looks like this:
// Open the archive
qop_desc qop;
qop_open("archive.qop", &qop);
// Read the index into supplied memory
qop_read_index(&qop, malloc(qop.hashmap_size));
// Find a file
qop_file *file = qop_find(&qop, "qop.h");
assert(file);
// Load the file contents
unsigned char *contents = malloc(file->size);
qop_read(&qop, file, contents);
Have a look at the example.c for a more verbose version with error checking. All library functions are documented in qop.h.
The QOP file format is not finalized yet. One thing I should probably change is the uint32_t limit for file sizes. If you have any other thoughts on the format, please let me know on github.
Single File Game Releases
high_impact has the notion of a “platform” – a bunch of functions that interact with the OS & hardware. Currently two platforms are implemented: SDL2 & Sokol. Ultimately everything in high_impact goes through one of these. Loading files (images, sounds, …) is no exception: they all go through platform_load_asset().
With this in place, implementing QOP is quite straightforward. Previously platform_load_asset() looked like this:
uint8_t *platform_load_asset(const char *name, uint32_t *bytes_read) {
	char *path = strcat(strcpy(temp_path, path_assets), name);
	return file_load(path, bytes_read);
}
Now, to load a file from a QOP archive (that has already been opened), with a fallback to the filesystem:
uint8_t *platform_load_asset(const char *name, uint32_t *bytes_read) {
	// Try to load from the QOP archive first
	if (qop.index_len) {
		qop_file *f = qop_find(&qop, name);
		if (f) {
			uint8_t *data = temp_alloc(f->size);
			*bytes_read = qop_read(&qop, f, data);
			return data;
		}
	}

	// Fall back to loading from the filesystem
	char *path = strcat(strcpy(temp_path, path_assets), name);
	return file_load(path, bytes_read);
}
That, and a simple qop_open(argv[0], &qop) at program start is enough to make high_impact games load all assets from a QOP archive appended to the executable.
The Makefile for Biolab Disaster & Drop contains targets for sdl_release and sokol_release to compile the respective version, create a QOP archive and concatenate this archive to the executable. This executable is now self-contained – it's the only file you'd need to share when you want to distribute your game.
(Well, in the case of the Sokol build anyway. The SDL2 build still requires the SDL2.dll to be present; we would need to statically link it instead.)
Of course you don't need to do this to distribute your game. You could also load all assets from a separate data.qop or just use plain files as before. But the fact that you can build a self-contained executable is neat!
A Word of Caution
As Ashley Gullen remarked on twitter, appending data to the end of an executable used to trip up antiviruses. I don't know if that's still the case.
If you're one of the unfortunate souls that still have to use Windows for one reason or another and you find your antivirus complaining, please tell the antivirus vendor to fix their shit!
I'm only half joking here. Antivirus vendors have such a poor track record that I can't muster any sympathy for the vendor and only a little more for the user. This is something that ought to work and I refuse to not do it because of some overreaching companies.