Dominic Szablewski, @phoboslab
— Monday, September 2nd 2024

A Simple Archive Format for Self-Contained Executables

The build/run instructions for the example games for high_impact were subtly wrong:

make sokol
./build/game_sokol

make sokol compiles the Sokol version, converts all assets and puts the results (executable and converted assets) into the build/ directory. So far so good. Where it falls apart is in the next line: ./build/game_sokol. The executable starts just fine, but it's looking in the current directory (./) for all the assets, can't find them and terminates.

The obvious fix is to amend the build instructions and change to the build/ directory first:

(cd build && ./game_sokol)

A better solution would be to make the executable agnostic of the current path: We could figure out the path of the executable and use that as the base path when loading assets.

The SDL2 version already did exactly this through the handy SDL_GetBasePath() function. Sokol doesn't provide anything like that, so I had to roll my own. Sadly, there's no cross-platform (i.e. POSIX) way to find the executable path, so the implementation differs for Linux, macOS and Windows. It's not too complicated, but still a burden to maintain.
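For illustration, a per-platform lookup could look roughly like the sketch below. exe_path() is a hypothetical helper, not the code high_impact actually uses. It assumes readlink() on /proc/self/exe for Linux, _NSGetExecutablePath() for macOS and GetModuleFileNameA() for Windows.

```c
// Ask for POSIX declarations (readlink) even in strict ISO C mode
#if !defined(_WIN32) && !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
    #define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(_WIN32)
    #include <windows.h>
#elif defined(__APPLE__)
    #include <mach-o/dyld.h>
    #include <stdint.h>
#else
    #include <unistd.h>
#endif

// Hypothetical helper: writes the null-terminated path of the running
// executable into `out` and returns its length. Returns 0 on failure or
// if the buffer may have been too small.
static size_t exe_path(char *out, size_t out_size) {
    assert(out && out_size > 1);
#if defined(_WIN32)
    DWORD len = GetModuleFileNameA(NULL, out, (DWORD)out_size);
    // On truncation the return value equals the buffer size
    return (len > 0 && len < out_size) ? (size_t)len : 0;
#elif defined(__APPLE__)
    uint32_t size = (uint32_t)out_size;
    if (_NSGetExecutablePath(out, &size) != 0) {
        return 0; // buffer too small
    }
    return strlen(out);
#else
    // Linux: /proc/self/exe is a symlink to the running binary.
    // readlink() does not null-terminate and truncates silently, so a
    // completely filled buffer is treated as a failure.
    ssize_t len = readlink("/proc/self/exe", out, out_size - 1);
    if (len <= 0 || (size_t)len >= out_size - 1) {
        return 0;
    }
    out[len] = '\0';
    return (size_t)len;
#endif
}
```

Stripping everything after the last path separator from the result would give the base path for loading assets.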

An even better solution would be to circumvent this problem entirely and embed all the assets directly into the executable. Sherief Farouk makes a good case for why we might want to do this.

The idea sounds quite simple, but it's surprisingly convoluted to implement across platforms.

Windows has “resource scripts”, .rc files that the compiler understands. macOS has the convention of “bundles” which are just directories with an .app extension, missing the goal of this exercise.

We could instruct the linker to embed files into the .data section, or use C23's #embed directly in the source. But embedding files like this comes with an impact on performance: typically the whole executable is loaded into memory at program start. Not ideal if you have a few gigabytes of assets.

How about creating an archive (e.g. ZIP, or TAR) that we just slap on to the end of the executable after compiling/linking?

cat game assets.zip > game_with_assets

argv[0] holds the path of the executable, so we could just fopen(argv[0]), somehow find this ZIP archive at the end of the executable and unpack our assets. This should work!

Update September 3rd 2024: It's not entirely correct that argv[0] holds the path to the executable. This is (for example) not the case if the executable is in the PATH. So we still need some platform dependent boilerplate.

For high_impact all game assets are already in a compressed format (QOI or QOA), so we don't need the extra complexity of ZIP compression (though granted, entropy encoding QOI often has some gains).

TAR seems quite obtuse these days. For starters: the size of a file is stored as a 12 byte null terminated string using ASCII chars 0-7 to represent an octal number (why?). I'm sure the division into fixed size blocks and many of the other idiosyncrasies of TAR made sense for some applications in 1979, but I don't want to deal with it.
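To make the octal oddity concrete, here is a minimal sketch of what reading such a size field involves. tar_parse_octal() is a hypothetical helper, not taken from any TAR library. It accepts leading spaces and a NUL or space terminator, which is an assumption about how lenient a reader wants to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: parses a TAR-style numeric field of up to 12
// bytes, holding ASCII octal digits. Returns the value, or -1 if the
// field holds no digits or contains an unexpected character.
static int64_t tar_parse_octal(const char *field, size_t len) {
    assert(field && len <= 12);
    size_t i = 0;

    // Tolerate padding with leading spaces
    while (i < len && field[i] == ' ') {
        i++;
    }

    int64_t value = 0;
    size_t digits = 0;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++, digits++) {
        value = value * 8 + (field[i] - '0');
    }

    // The digits must be followed by NUL, a space or the end of the field
    if (digits == 0 || (i < len && field[i] != '\0' && field[i] != ' ')) {
        return -1;
    }
    return value;
}
```

A 1000 byte file ends up with the size field "00000001750" plus the terminating NUL, which this function turns back into 1000.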

The PhysicsFS library looks nice, but is way too complicated for what I want to do. I don't need support for Doom WADs and a dozen other archive formats.

So instead of looking for other formats or libraries, I did what I always do…

QOP – The Quite OK Package Format

tl;dr: QOP is a super simple archive format. Single header, MIT licensed, source on github: https://github.com/phoboslab/qop

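QOP archives consist of three parts:

  1. the file data: the path and contents of every file
  2. the index: one fixed size entry per file
  3. the header: index length, archive size and magic bytes, stored at the very end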

The index doesn't need to store the path. A fixed size hash of the path is all we need. qop_find(q, "images/title.qoi") hashes the supplied path and looks up that hash in the index.
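As a rough illustration of the idea, the sketch below hashes a path into a fixed 64 bit value. It uses FNV-1a as a stand-in; this is an assumption made for the example and not necessarily the hash function qop.h uses. path_hash() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for the path hash: 64 bit FNV-1a.
// The hash function used by qop.h may differ.
static uint64_t path_hash(const char *path) {
    assert(path);
    uint64_t h = 0xcbf29ce484222325ULL; // FNV offset basis
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        h ^= *p;
        h *= 0x100000001b3ULL; // FNV prime
    }
    return h;
}
```

However long the path is, the index only ever has to store and compare these 8 bytes. The trade-off is that two different paths could in theory produce the same hash.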

The whole QOP file format is best described as a C-ish struct:

struct {
    // Path string and data of all files in this archive
    struct {
        uint8_t path[path_len];
        uint8_t bytes[size];
    } file_data[];

    // The index, with a list of files
    struct {
        uint64_t hash;
        uint32_t offset;
        uint32_t size;
        uint16_t path_len;
        uint16_t flags;
    } qop_file[];

    // The number of files in the index
    uint32_t index_len;

    // The size of the whole archive, including the header
    uint32_t archive_size; 

    // Magic bytes "qopf"
    uint32_t magic;
} qop;

The path of each file is still stored (to enable unpacking of the archive), but it sits in front of the data of each file. This layout makes it nice to look at in a hex editor and we still have a fixed size for each index element.

Having the header at the end of the file not only makes it easy to find an archive that is concatenated to an executable, but also allows you to append files to the archive without needing to rewrite it completely: just cut the existing header, append your files, paste the header to the end, add the new files to the index and adjust index_len and archive_size.
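Finding an archive at the end of a file could be sketched like this: read the last 12 bytes, check the magic and use archive_size to work out where the archive starts. The names find_appended_archive() and trailer_info are hypothetical, and the little-endian byte order is an assumption of this sketch; qop.h is the reference for the real thing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TRAILER_SIZE 12 // index_len + archive_size + magic

typedef struct {
    uint32_t index_len;
    uint32_t archive_size;
    long archive_start; // file offset where the archive begins
} trailer_info;

// Assumption: 32 bit values are stored little-endian
static uint32_t read_le32(const unsigned char *b) {
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

// Returns 1 and fills `out` if the file ends in a plausible header,
// 0 otherwise.
static int find_appended_archive(FILE *fh, trailer_info *out) {
    assert(fh && out);
    unsigned char t[TRAILER_SIZE];

    if (fseek(fh, 0, SEEK_END) != 0) {
        return 0;
    }
    long file_size = ftell(fh);
    if (file_size < TRAILER_SIZE) {
        return 0;
    }
    if (fseek(fh, file_size - TRAILER_SIZE, SEEK_SET) != 0) {
        return 0;
    }
    if (fread(t, 1, TRAILER_SIZE, fh) != TRAILER_SIZE) {
        return 0;
    }

    // The last four bytes must be the magic "qopf"
    if (memcmp(t + 8, "qopf", 4) != 0) {
        return 0;
    }

    out->index_len = read_le32(t);
    out->archive_size = read_le32(t + 4);

    // The archive has to fit inside the file it is appended to
    if (out->archive_size < TRAILER_SIZE ||
        out->archive_size > (unsigned long)file_size) {
        return 0;
    }
    out->archive_start = file_size - (long)out->archive_size;
    return 1;
}
```

Everything before archive_start is the executable itself, which is why the same lookup works for a standalone archive (archive_start is 0) and for one concatenated to a binary.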

Since the hash of each path is stored in the index, the obvious way to look up a file is to build a hash table. Which is exactly what qop.h does. I'm just using linear probing here. With a table size of at least 1.5x of the stored elements this is plenty fast and as simple as it gets.
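A hash table with linear probing can be sketched in a few lines. This is not the code from qop.h: the entry struct, table_insert() and table_find() are hypothetical, and using a hash of 0 to mark an empty slot is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t hash; // 0 marks an empty slot (assumption of this sketch)
    uint32_t offset;
    uint32_t size;
} entry;

// Inserts `e`, or overwrites an entry with the same hash.
// Returns 0 if the table is full.
static int table_insert(entry *table, size_t capacity, entry e) {
    assert(table && capacity > 0 && e.hash != 0);
    size_t start = (size_t)(e.hash % capacity);
    for (size_t n = 0; n < capacity; n++) {
        size_t i = (start + n) % capacity;
        if (table[i].hash == 0 || table[i].hash == e.hash) {
            table[i] = e;
            return 1;
        }
    }
    return 0;
}

// Returns the entry for `hash`, or NULL if it is not in the table
static const entry *table_find(const entry *table, size_t capacity, uint64_t hash) {
    assert(table && capacity > 0);
    if (hash == 0) {
        return NULL;
    }
    size_t start = (size_t)(hash % capacity);
    for (size_t n = 0; n < capacity; n++) {
        const entry *e = &table[(start + n) % capacity];
        if (e->hash == hash) {
            return e;
        }
        if (e->hash == 0) {
            return NULL; // an empty slot ends the probe sequence
        }
    }
    return NULL;
}
```

On a collision the lookup just walks to the next slot. With the table at least 1.5x the number of entries there are always empty slots around, so these walks stay short.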

Since I didn't want the QOP library to allocate any memory on its own, opening a QOP archive is a three-step process:

  1. find and read the fixed size header of the archive
  2. allocate memory for the index
  3. read the index

In practice, opening and reading a file from a QOP archive looks like this:

// Open the archive
qop_desc qop;
qop_open("archive.qop", &qop);

// Read the index into supplied memory
qop_read_index(&qop, malloc(qop.hashmap_size));

// Find a file
qop_file *file = qop_find(&qop, "qop.h");
assert(file);

// Load the file contents
unsigned char *contents = malloc(file->size);
qop_read(&qop, file, contents);

Have a look at the example.c for a more verbose version with error checking. All library functions are documented in qop.h.

The QOP file format is not finalized yet. One thing I should probably change is the uint32_t limit for file sizes. If you have any other thoughts on the format, please let me know on github.

Single File Game Releases

high_impact has the notion of a “platform” – a bunch of functions that interact with the OS & hardware. Currently two platforms are implemented: SDL2 & Sokol. Ultimately everything in high_impact goes through one of these. Loading files (images, sounds, …) is no exception: they all go through platform_load_asset().

With this in place, implementing QOP is quite straightforward. Previously platform_load_asset() looked like this:

uint8_t *platform_load_asset(const char *name, uint32_t *bytes_read) {
    char *path = strcat(strcpy(temp_path, path_assets), name);
    return file_load(path, bytes_read);
}

Now, to load a file from a QOP archive (that has already been opened), with a fallback to the filesystem:

uint8_t *platform_load_asset(const char *name, uint32_t *bytes_read) {
    // Try to load from the QOP archive first
    if (qop.index_len) {
        qop_file *f = qop_find(&qop, name);
        if (f) {
            uint8_t *data = temp_alloc(f->size);
            *bytes_read = qop_read(&qop, f, data);
            return data;
        }
    }

    char *path = strcat(strcpy(temp_path, path_assets), name);
    return file_load(path, bytes_read);
}

That, and a simple qop_open(argv[0], &qop) at program start is enough to make high_impact games load all assets from a QOP archive appended to the executable.

The Makefile for Biolab Disaster & Drop contains targets for sdl_release and sokol_release to compile the respective version, create a QOP archive and concatenate this archive to the executable. This executable is now self-contained – it's the only file you'd need to share when you want to distribute your game.

(Well, in the case of the Sokol build anyway. The SDL2 build still requires the SDL2.dll to be present; we would need to statically link it instead.)

Of course you don't need to do this to distribute your game. You could also load all assets from a separate data.qop or just use plain files as before. But the fact that you can build a self-contained executable is neat!

A Word of Caution

As Ashley Gullen remarked on twitter, appending data to the end of an executable used to trip up antiviruses. I don't know if that's still the case.

If you're one of the unfortunate souls that still have to use Windows for one reason or another and you find your antivirus complaining, please tell the antivirus vendor to fix their shit!

I'm only half joking here. Antivirus vendors have such a poor track record that I can't muster any sympathy for the vendor and only a little more for the user. This is something that ought to work and I refuse to not do it because of some overreaching companies.

© 2024 Dominic Szablewski