PHOBOSLAB

Voidcall – Making Of

Like last year with Underrun, I participated in this year's js13kGames – a JavaScript game development competition with a file size limit of 13kb, including code, assets and everything else. My entry was Voidcall, a Real-time Strategy game.

Voidcall Play Voidcall – A WebGL Real-time Strategy game in 13kb of JavaScript

Recently, I played the original Command & Conquer (later dubbed "Tiberian Dawn") again. I spent a whole summer of my childhood exploring the game in a friend's basement. Compared to later real-time strategy games it's quite simple, yet very compelling. I set out to capture a bit of the game's essence for js13k.

Squeezing everything I wanted for this game into the tiny size of 13kb was a tough challenge. Much more so than last year. The technology is quite ambitious: a 3d rendered height map, fully polygonal and textured models with smooth keyframe interpolated vertex animations, comprehensive mouse control with accurate picking and solid pathfinding. It's a lot.

This article presents the various pieces of the puzzle with in-depth explanations. Everything is open source, so you can follow along on github.

Terrain Generation

Initially I thought about implementing the game in a simpler Canvas2D rendered view, but realized that I would need a lot of different graphics for the map (and much more for animated characters). Using a 3d rendered height map was the easier way to get a good looking terrain on the screen.

Perlin Noise Collision Map The collision map: 2-octave perlin noise, converted to 1bit

An initial 256x256 Canvas2D image is generated using a perlin noise function. This function produces random values with a smooth gradient for nearby pixels. The key is to layer several perlin noise values with different frequencies on top of each other. In this case each pixel is the sum of two perlin noise values. You can create the broad features of the terrain with just one value at a low frequency and add finer detail with a second one at a higher frequency.

for (let y = 0, i = 0; y < MAP_SIZE; y++) {
    for (let x = 0; x < MAP_SIZE; x++, i++) {
        let height = 
            noise(x/50, y/50) + // low frequency
            noise(x/35, y/35); // high frequency
    }
}

While the result is visually already quite pleasing, it's not of much use for the game I wanted to build. I needed a discrete distinction between walkable and un-walkable terrain for the pathfinding to work properly. So this image is converted to just 1 bit with a carefully chosen threshold.
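
In code, the threshold pass could look like this. A minimal sketch: the threshold value and the collision_map name are illustrative, not taken from the game.

let collision_map = new Uint8Array(MAP_SIZE * MAP_SIZE);
for (let y = 0, i = 0; y < MAP_SIZE; y++) {
    for (let x = 0; x < MAP_SIZE; x++, i++) {
        let height = 
            noise(x/50, y/50) +
            noise(x/35, y/35);

        // 1 = un-walkable hill, 0 = walkable valley; the threshold
        // value here is just an example
        collision_map[i] = height > 0.7 ? 1 : 0;
    }
}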

This 1 bit image becomes the initial collision map for the game logic. For the height map that is used for the graphics, this collision map is processed further. The first step is a bit of blur, so we get some smooth edges around the hills again. After that, a bunch of high frequency perlin noise is layered on top.

Perlin Noise Height Map The height map: blurred and detailed with high frequency perlin noise

Unfortunately, the Canvas2D API does not provide a cross-browser blur function. Applying a box blur on each pixel using a convolution filter is not only computationally expensive, but also requires a bit more code than I wanted to spend on this. My solution was just to draw the height map on top of itself a bunch of times:

// Cheesy blur
ctx.globalAlpha = 0.2;
for (let x = -2; x < 2; x++) {
    for (let y = -2; y < 2; y++) {
        // Draw the canvas onto itself with the x,y offset
        ctx.drawImage(canvas, x, y); 
    }
}

For the details a number of perlin noise values are computed and mashed together until it looked right. The higher frequency noise is applied more strongly to the hills and less so to the valleys.

for (let y = 0, i = 0; y < MAP_SIZE; y++) {
    for (let x = 0; x < MAP_SIZE; x++, i++) {
        let
            // Red channel from the blurred canvas' pixel data (4 bytes per pixel)
            coarse = px[(y*MAP_SIZE+x)*4],
            mid = noise(x/12, y/12) * 8, 
            fine = 
                noise(x/30, y/30) * 4 +
                noise(x/10, y/10) * 2 +
                noise(x/5, y/5) + 
                noise(x/3, y/3),
            height = 
                coarse * 6 + // Basic map height from collision
                fine * 300 + // Details on everything
                mid * 5 + // Mids on everything
                (coarse * (mid+5)) * 2.5 + // Extra mids on hills
                (coarse * (fine+1)) * 1.5 + // Extra details on hills
                ((255-coarse) * mid*-1)*0.5; // Inverted mids in valleys
    }
}

Now that we have the height map, we can create the 3d geometry. For each pixel of this height map a quad (2 triangles) is created and pushed to the render buffer. Each quad is randomly textured with one of two different grass-like tiles – just enough to make it not look repetitive.
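
A sketch of that construction, assuming a push_quad() helper that takes the four corner positions (the helper name and vertex order are mine, not the game's; texture coordinates and normals are omitted here):

function height_at(x, z) {
    return height_map[z * MAP_SIZE + x];
}

for (let z = 0; z < MAP_SIZE - 1; z++) {
    for (let x = 0; x < MAP_SIZE - 1; x++) {
        push_quad(
            x,     height_at(x,     z    ), z,
            x + 1, height_at(x + 1, z    ), z,
            x,     height_at(x,     z + 1), z + 1,
            x + 1, height_at(x + 1, z + 1), z + 1
        );
    }
}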

The normal vector (the vector perpendicular to the surface) is computed once for each quad. This is later used for light calculations in the shader. The same normal vector is initially stored for each of the 4 vertices of the quad. This results in a flat shaded look.

The final buffer for the height map contains 131k triangles consisting of 393k vertices, each described by 8 Float32 values. In total about 12 MB of data - which may sound like a lot, but your GPU laughs at these numbers. I'll talk more about how this is actually rendered later.

Perlin Noise Terrain Flat shaded terrain, without smooth normals. Interesting, but not what I was going for.

Computing smooth normals turned out to be a particularly hairy problem to solve in this very size-constrained code. The solution I ended up with is to walk through the otherwise final buffer, gather all normals from adjacent vertices, average them and put the averaged normals straight back into the buffer. This in-place editing of the normal vectors produces quite terse code, but is otherwise horrible in every aspect.
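
Roughly sketched, the idea looks like this. The 8-float vertex layout (position, texture coords, normal) is from the article, but the exact float order and the helper are my assumption:

function smooth_normals(buffer, num_verts) {
    // Group all buffer vertices by their position, summing their normals
    let groups = new Map();
    for (let i = 0; i < num_verts; i++) {
        let key = buffer[i*8] + ',' + buffer[i*8+1] + ',' + buffer[i*8+2];
        let g = groups.get(key);
        if (!g) {
            groups.set(key, g = {nx: 0, ny: 0, nz: 0, verts: []});
        }
        g.nx += buffer[i*8+5];
        g.ny += buffer[i*8+6];
        g.nz += buffer[i*8+7];
        g.verts.push(i);
    }

    // Write the averaged (normalized) normal straight back in place
    for (let g of groups.values()) {
        let len = Math.hypot(g.nx, g.ny, g.nz) || 1;
        for (let i of g.verts) {
            buffer[i*8+5] = g.nx / len;
            buffer[i*8+6] = g.ny / len;
            buffer[i*8+7] = g.nz / len;
        }
    }
}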

The result of all this is stored in a gl.STATIC_DRAW buffer.

For the final detailing a few thousand models of rocks and trees are randomly sprinkled over the map. The 1 bit collision map is used, so that these models are only placed in the valleys.

It was important for the mouse picking to work correctly to have just the height map, without trees, in its own buffer. So all detail models combined are stored in a second gl.STATIC_DRAW buffer, separate from the height map geometry.

Perlin Noise Terrain The final Perlin Noise generated terrain, with smoothed normals and detail models

Though the terrain generation is based on randomness, it was important for the gameplay that you always end up with the same terrain each time you load the game. So using JavaScript's built-in Math.random() would not do. Instead, the game makes use of a very simple seedable random number generator. Initializing it with the same seed value before generating the terrain always produces exactly the same result.
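
For illustration, here's a tiny seedable generator in the same spirit – this is the well-known mulberry32, not necessarily the exact generator Voidcall uses:

let rng_state = 0;

function rand_seed(seed) {
    rng_state = seed;
}

function random_float() {
    rng_state = (rng_state + 0x6D2B79F5) | 0;
    let t = rng_state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // 0..1
}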

A fun problem I encountered with my perlin noise function was that it still produced different results in different browsers. To initialize the perlin noise function, an array of sequential values is shuffled using JavaScript's built-in Array.sort:

p.sort(function() {
    return 0.5 - random_float();
});

While random_float() always correctly produced the same sequence of numbers in all browsers, browsers still implement the sort algorithm differently. So in Chrome you will end up with a different shuffled array than in Firefox. Of course MDN warns you about this:

The time and space complexity of the sort cannot be guaranteed as it depends on the implementation.

Shuffling an array with a nonsensical comparator function isn't exactly the right way to do it anyway. Lesson learned. (The right way is to walk through the array and swap each element with a random other element – a Fisher-Yates shuffle.)
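
In code, that correct shuffle looks like this, using the seedable random_float() from above:

for (let i = p.length - 1; i > 0; i--) {
    let j = (random_float() * (i + 1)) | 0; // random index 0..i
    let tmp = p[i];
    p[i] = p[j];
    p[j] = tmp;
}

Because this never relies on the engine's sort implementation, the result is identical in every browser.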

For the final game I just plugged in different seed values until I found a terrain that was interesting and suitable for the gameplay. Not all terrains generated with this method produce usable results. Sometimes one half of the map is completely cut off from the other half. Sometimes "islands" of unreachable terrain are generated.

Perlin Noise Terrain Terrains generated with different seed values

Even though the terrain I initially settled on was quite nice, I later discovered that an area that I thought was reachable in fact wasn't. I didn't want to search for a new terrain and carefully place all game objects again. So I just fudged it by drawing a rectangle on a certain spot on the map to make a traversable path. Such are the nasty secrets of game development.

The final code for the terrain generation is all in map_generate().

3D Models, Textures & Vertex Animations

A lot of work has gone into making the animated models and compressing them down to a reasonable size.

I'm still a doofus when it comes to 3d modeling, but the amazingly ingenious Wings3D once again saved the day. It works great with low-poly models. Turns out low-poly modeling is a lot like creating low-res pixel art: you just push things around until it looks right.

Low Poly Character Model The main character model – 36 vertices, 62 faces, 6 animation frames, compressed to 572 bytes

The main character model is as low-poly as you can possibly get with a convincing humanoid. All limbs are just triangular and end in a single point. Still, as you previously saw with the terrain, 3d data can get absolutely huge. Even for this model, if we'd store each of the 62 faces with 3x3 Float32 values, we'd end up with 2kb of data - for a single animation frame.

Now, the character model contains 6 animation frames. I looked into skeletal animations, but this would have required a lot of code to get working. So instead I opted for the simplest possible solution: vertex animations. I.e. each frame of an animation contains the whole model again.

So the character model contains a full set of vertex positions for each of the 6 animation frames, but still compresses to just 572 bytes.

Animation Frames The six different animation frames: 4x run, 1x shoot, 1x idle

There are two techniques I used to get there. The first one is just common sense (and used by almost any model data format): each of the 36 vertices is shared by multiple faces. So we just store the x,y,z position for each of the 36 vertices once and use an index into the vertex positions for each of the 3 corners of a face.

Furthermore, if we don't change the topology of the model between animation frames, we only need to store new vertex positions for each frame. The set of indices for the faces remains the same.

If we restrict ourselves to a maximum of just 255 vertices, we can store each index as a single byte. Similarly, if we convert the Float32 x,y,z vertex positions to a single byte each, we can save 9 bytes per vertex. "Compressing" arbitrary precision numbers down to a set of discrete values (in this case 0–255) is called quantization.
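
A quantization step like this is a one-liner. A sketch – in practice you also need to remember the min/max bounds per model, so the values can be scaled back up at load time:

function quantize(v, min, max) {
    return Math.round((v - min) / (max - min) * 255); // 0..255
}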

So with the x,y,z position as one byte each, we need 3 bytes to store each vertex. Similarly, with each face using 3 x 1 byte indices (one for each corner) into the vertex data, we need 3 bytes to store a single face.

36 vertices * 6 animation frames * 3 bytes = 648 bytes
 +
62 faces * 3 bytes = 186 bytes
 =
834 bytes

Not too bad, but still too much.

I noticed that with models this simple, we don't actually need the 1 byte "precision" of 256 discrete values for the x,y,z positions. We can lower the resolution even more.

The next logical step is to pack all 3 values into just 2 bytes, or 16 bits. This leaves us with 5 bits for each x,y,z value. (Technically 5.333 bits per value. We could use the extra 1 bit to give one of the x,y,z axes more resolution, but here I didn't. The extra 1 bit is just unused.)

5 bits gives us values from 0–31. So each of the vertex positions is essentially snapped into a grid of 32x32x32 values.
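
Packing and unpacking those three 5-bit values is simple bit-twiddling (the bit order here is my assumption, not necessarily the actual format's):

// Pack: three 0–31 values into one 16 bit integer, top bit unused
let packed = (x & 31) | ((y & 31) << 5) | ((z & 31) << 10);

// Unpack again at load time
let ux = packed & 31,
    uy = (packed >> 5) & 31,
    uz = (packed >> 10) & 31;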

Of course I wanted to compress the face data similarly, but with values in the range 0–31 we wouldn't be able to address all 36 vertices. Luckily I noticed that Wings3D exports the face data with the vertex index for the first corner of each face in ascending order:

f 1// 22// 2//
f 2// 3// 1//
f 2// 24// 18//
f 3// 4// 1//
f 3// 8// 6//
f 4// 23// 17//
f 5// 3// 6//
…
f 32// 35// 33//
f 33// 26// 32//
f 33// 34// 26//
f 33// 35// 34//
f 34// 32// 27//
f 36// 30// 29//

The exported model contains one row per face, 3 indices into the vertex data. Notice how the first index is in ascending order and increases by 2 at most.

The logical conclusion is to store a 2 bit address increment for the first vertex index of each face and use 2 x 7 bit numbers for the two remaining indices. This allows us to store models with up to 127 vertices. Plenty!
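
Decoding a face record then boils down to this (again, the exact bit layout is my assumption):

// 16 bits per face: 2 bit address increment, 2 x 7 bit indices
a_index += (face >> 14) & 3; // running index for the first corner
let b_index = (face >> 7) & 127,
    c_index = face & 127;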

Of course we need to store a bit of metadata with each model: the number of vertices, faces and animation frames. Described as a C-struct, the final model format looks like this:

// Retarded Model Format (.rmf):
struct {
    uint8_t num_frames;
    uint8_t num_verts; // per frame
    uint16_t num_indices;
    struct {
        uint8_t reserved : 1;
        uint8_t x : 5;
        uint8_t y : 5;
        uint8_t z : 5;
    } vertices[num_frames * num_verts];
    struct {
        uint8_t a_address_inc : 2;
        uint8_t b_index : 7;
        uint8_t c_index : 7;
    } indices[num_indices];
} rmf_data;

Of course Wings3D doesn't have an exporter for this esoteric format. Instead I exported my models in the Wavefront OBJ format. This format contains easily parsable plain text data. A simple PHP script is used to convert the OBJ format into this binary format.

The game contains a total of 10 different models. All of those are concatenated into a single file for easier loading and better ZIP compression, totaling 1328 bytes. The model format is so terse that it barely ZIP-compresses (1203 bytes zipped).

All Voidcall Models From left to right: tree, boulder (re-used for gibs and energy well), blade of grass (yes, really), waypoint, unit selection bracket, ship, harvester, turret, enemy, unit.

The models then get textured, each with a single tile of the texture image.

Textured Model All the textures used in the game, 1643 bytes

Because my model format doesn't contain any texture coordinates I opted for a very simple hack to lay textures over the models: the tile is just projected from the front onto the model. Essentially the x,y coordinates of each vertex are the texture coordinates.

Textured Model Textured Model, with the texture projected directly from the front

Notice how the feet and hands of the untextured model are very pointy, ending in a single vertex. Using transparent pixels in the textures, the hands and feet are cut off for all unit types, except for the military unit, where a black texture indicates the gun.

To blend between the different animation frames the game just linearly interpolates between the vertex positions of the last and current frame. The frame interpolation, as well as unpacking of the face indices into vertex positions and Y-rotation of the models is all handled in JavaScript.
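
The blending itself is just a linear interpolation per vertex; something like this, where t is the 0–1 progress between the two frames:

let x = last_x + (cur_x - last_x) * t,
    y = last_y + (cur_y - last_y) * t,
    z = last_z + (cur_z - last_z) * t;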

The final code to load and render a model clocks in at just 150 lines. It's wholly unoptimized (e.g. loaded models are not cached) but it just doesn't matter for these kinds of poly-counts.

Mouse Picking

My game requires accurate mouse controls. I need to know exactly where the mouse cursor is, not in screen space, but in world space. Typically you'd project a ray from the mouse cursor, through the screen and onto the 3d geometry. This requires quite a bit of math that I had so far avoided throughout the project. Plus, you'd need a hit-test function on the height map.

Another technique – one that has fallen out of favor for modern games – is to render a specially encoded view of the game and read back the pixels. For instance, you could render each object that you can hit in a different color, then read back the pixel color under the mouse cursor and determine which object you have hit.

For my game I wasn't really interested in any particular object, but where exactly the mouse cursor is on the height map. If I know that, I can easily determine the objects nearby in JavaScript.

Mouse picking World position (x, y, z) encoded in RGB colors

The game renders just the height map, nothing else, with a simple shader that encodes the x, y, z coordinates in RGB colors. This leaves us with a resolution of just 0–255 in each dimension, but as it turned out the low resolution is not really noticeable in game. The game space is divided into a tile grid anyway, so the accuracy to hit a single tile is plenty.

The shader looks like this:

void main(void) {
    gl_FragColor = vec4(
        vertex_pos.x / 512.0 + 0.5,
        vertex_pos.y / MAP_HEIGHT,
        vertex_pos.z / 512.0 + 1.2,
        1.0
    );
}

The x, z offset and the divisions are carefully selected for the normal camera distance. The y-axis is just divided by the maximum height of the map. If the game allowed you to zoom out, we'd need to account for this in the shader. However, the game's zoom level always stays the same, so these values can be hardcoded.

Of course these RGB colors are offset by the current camera position, so we have to account for that when determining the mouse position:

// Read a single pixel at the current mouse position
let px = new Uint8Array(4);
gl.readPixels(
    mouse_x, 
    screen_height - mouse_y, // damn opengl and its bottom left 0 position
    1, 1, gl.RGBA, gl.UNSIGNED_BYTE, px
);

// Calculate world coords
mouse_world_x = (px[0] / 255 - 0.5) * 512 - camera_x; // red = x
mouse_world_y = (px[1] / 255) * MAP_HEIGHT;           // green = y
mouse_world_z = (px[2] / 255 - 1.2) * 512 - camera_z; // blue = z

Allow me to complain about OpenGL's coordinate system. OpenGL has its 0,0 position in the lower left corner of the screen, because apparently it's more scientific this way. It seems trivial to just invert everything, but the number of times and the hours I spent dealing with this just leaves me bitter. Whether it was for Ejecta, simple things like my WebGLImageFilter or for this game – you have to flip the screen and your coordinates ALL THE FKING TIME. I hate it.

This way of mouse picking – implemented with another render pass – is discouraged these days because you typically stall your program while waiting for gl.readPixels() to render the view and return the color buffer, instead of letting the GPU do its thing and compute the next frame. It just absolutely kills performance. But as with so many other things for this game, it just doesn't matter. Computers are fast.

Rendering

The renderer borrows most of its code from my last year's entry Underrun. Underrun pushed everything into a single buffer and rendered the whole game in just one draw call. For Voidcall I needed to divide that into several draw calls, but the overall structure remains the same.

The terrain geometry is stored in a separate static buffer and first rendered in faux-colors for the mouse picking described above.

Afterwards the terrain is rendered again with a special shader that desaturates the texture color based on the y position. This nicely "blends" the grass texture into gray-ish stone for the hills. On top of that follows another static buffer with all decorative geometry, trees and boulders.

A dynamic buffer is used to render all the models, then flushed and re-used for another pass for the shadows.

The object shadows are just quads with a blurred shadow blob as texture. But because they are transparent, we need to render these on top of all the other objects, with the depth buffer disabled. Otherwise the semi-transparent pixels of the shadow texture would hide all objects below them.

Voidcall render passes Four draw calls produce the final image

The game allows for a maximum of 16 dynamic lights. They work basically the same as in Underrun, but this time the light calculations are done in the pixel shader, instead of in the vertex shaders. This gives us a smoother look, compared to the low-fi visuals of last year's game.

The ingame "chat" of your units is just an HTML element overlaid over the canvas. I didn't want to deal with text-rendering inside of WebGL. After all we are running in a browser here, so why not use it?

The code for the renderer can be found in the aptly named renderer.js.

Pathfinding

Voidcall uses a pretty standard A-Star algorithm for pathfinding. To optimize the performance a bit, my A-Star implementation allocates all necessary data up-front. It never has to allocate any data at run-time.

Instead of storing the set of visited nodes and their costs in objects pushed into plain JavaScript arrays, all state information is stored in a number of separate TypedArrays. To make this work, my implementation works directly with the "addresses" of nodes instead of their x, z positions. The address is just z * MAP_SIZE + x – i.e. the index into these typed arrays. Consequently, this also reduced the code size quite a bit.

let 
    nodes_parent = new Uint16Array(MAP_SIZE * MAP_SIZE),
    nodes_state = new Uint8Array(MAP_SIZE * MAP_SIZE),
    nodes_g = new Float32Array(MAP_SIZE * MAP_SIZE),
    nodes_f = new Float32Array(MAP_SIZE * MAP_SIZE),
    open_nodes = new Uint16Array(MAP_SIZE * 4), // Should be enough!

    NEIGHBORS = [
        -1-MAP_SIZE, -MAP_SIZE, 1-MAP_SIZE,
        -1,                     1,
        -1+MAP_SIZE,  MAP_SIZE, 1+MAP_SIZE
    ],

    STATE_UNKNOWN = 0,
    STATE_OPEN = 1,
    STATE_CLOSED = 2;

while (num_open_nodes--) {
    let current_addr = open_nodes[num_open_nodes];

    for (let i = 0; i < NEIGHBORS.length; i++) {
        let neighbor_addr = current_addr + NEIGHBORS[i];

        // Compute cost: f, g.

        // Store node data
        nodes_parent[neighbor_addr] = current_addr;
        nodes_state[neighbor_addr] = STATE_OPEN;
        nodes_g[neighbor_addr] = g;
        nodes_f[neighbor_addr] = f;
        num_open_nodes++;

        // Insert into open_nodes
    }
}

Of course, to compute the cost for each node we need to get the x, z position from the address again, which is simply:

let x = address % MAP_SIZE,
    z = (address / MAP_SIZE)|0;

Or, as I knew my MAP_SIZE is 256, I can just use some bit-twiddling instead of costly divisions:

let x = address & 0xff, // low byte
    z = address >> 8;   // high byte

AStar Pathfinding All visited nodes (white) and the found path (blue)

In a final step the found path is condensed to the minimum number of waypoints: for each waypoint, all subsequent waypoints that can be reached in a straight line are removed, until we find one that can not. This eliminates extraneous waypoints that would otherwise result in stiff, checkerboard-like movement.
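
A sketch of this condensing pass – is_visible(), a straight-line walkability check against the collision map, is a hypothetical helper here, not the game's actual function:

function condense_path(path) {
    let condensed = [path[0]];
    for (let i = 0; i < path.length - 1;) {
        let j = i + 1;
        // Skip ahead as long as a later waypoint is directly reachable
        while (j + 1 < path.length && is_visible(path[i], path[j + 1])) {
            j++;
        }
        condensed.push(path[j]);
        i = j;
    }
    return condensed;
}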

AStar Pathfinding The condensed set of nodes necessary to reach the target

With all this, plotting a path across the whole map takes about 1.5ms. Still costly, but good enough. The complete implementation fits in just 120 lines of code.

Sound & Music

Just like last year, I used the brilliant Sonant-X library for sound and music. My dear friend Andreas Lösch of no-fate.net once again produced an amazing soundtrack in Sonant-X Live. Likewise, the few sound effects also use this library.

I did however overhaul the library, almost rewriting it. Instead of using generator functions for the sine, square, sawtooth and triangle oscillators, everything is precomputed into a single lookup table. This tremendously reduced the time needed to render the soundtrack. It also allowed me to remove all of the asynchronous setTimeout() calls that were originally implemented to keep the page responsive while loading.
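
The lookup table idea, as an illustrative sketch (not the library's actual code): compute one period of each waveform up front, then every oscillator becomes a simple array read.

const LUT_SIZE = 4096;
const osc_sin = new Float32Array(LUT_SIZE),
    osc_square = new Float32Array(LUT_SIZE),
    osc_saw = new Float32Array(LUT_SIZE),
    osc_tri = new Float32Array(LUT_SIZE);

for (let i = 0; i < LUT_SIZE; i++) {
    let t = i / LUT_SIZE;
    osc_sin[i] = Math.sin(t * 2 * Math.PI);
    osc_square[i] = t < 0.5 ? 1 : -1;
    osc_saw[i] = t * 2 - 1;
    osc_tri[i] = t < 0.5 ? t * 4 - 1 : 3 - t * 4;
}

// An oscillator at phase p (0..1) is then just, e.g.:
// osc_sin[(p * LUT_SIZE) | 0]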

The whole library is now about 250 lines of code and generates the 120 seconds of music in just 900ms – down from 5900ms for the original library.

Gameplay

Admittedly the gameplay is a bit of a weak spot. With all the tech implemented I had very little time to tune the actual game mechanics. The game is probably a bit too hard, even if you know exactly what to do. Worse, most players are initially totally lost. The game really could have used a bit of a tutorial, or maybe just a slower start to give players more time to get familiar.

Cursor controls stress test with a lot of units

I'm pretty proud of how the mouse controls turned out: selecting single units, dragging selection areas, ordering them around, building turrets and harvesters all work pretty much like in any other full-blown RTS. The code to get this working however is a big mess of spaghetti. And it's not feature-complete either.

You can't order one of your grunts to attack a specific enemy. Instead, they will only attack the closest one. Similarly, your medic will only heal the unit closest to him – even if that unit has full health already and another unit that really needs healing is in range. I simply ran out of time and space to implement these things.

The logic to spawn enemies and their AI is pretty bare bones, too: Enemies spawn in decreasing time intervals, depending on the total amount of energy you're creating. A random position outside the map is chosen for the spawn point. The enemy then walks down the hills to the valley and searches for the nearest player unit or building. The initial random target for the enemy can be overridden if the enemy is attacked. It will then move towards the attacker. That's it.

The game has a bit of animation when you successfully get your power levels up and return with all (remaining) units to the ship. I doubt many people have seen it.

Minification

Many of the same techniques as in Underrun were used to compress the game. This time however I had to work harder to keep the game's size under the 13kb limit. The uncompressed code weighs in at about 96kb, so minifying was absolutely crucial.

Some random observations:

The final game consists of just three files: the textures PNG, the models in binary format and an index.html with all the code. The PNG and models are just about 2kb; the rest, about 11kb, is all code.

Voidcall code The complete minified code just about fits on my screen in the default font size

The full uncompressed source for Voidcall is on github: github.com/phoboslab/voidcall

Fun fact: this article contains 28000 characters – more than twice as much as was allowed for the game. It does ZIP-compress to just 10kb, though.

MPEG1 Single file C library

If you want to play videos in a native app or game you basically have three options:

  1. Rely on the API that the platform provides (e.g. AVPlayer on macOS & iOS)
  2. Include a giant, multi-megabyte library like ffmpeg or all of libwebm, libvpx, libogg, and libvorbis
  3. License Bink Video from RAD Game Tools – which is what most game studios seem to do these days

A few weeks ago I had this conversation on Twitter, where I offered a possible fourth solution:

Tweets

(Link to this thread on Twitter)

MPEG1 is a very old format dating back to 1993, but the quality and compression ratio still holds up surprisingly well. Especially for games I would imagine it is still good enough for displaying some company logos or in game "video calls" from your AI team mates.

Because MPEG1 is such an old format, decoding it costs very little CPU time compared to more modern video formats. This is important for games running within a tight time budget.

Since I already had some experience with implementing an MPEG1 decoder in JavaScript and later porting parts of it to WASM (via C), a standalone plain C library would be a no-brainer. So I proposed to build one, modeled after the excellent stb single-file libraries – dependency-free libraries that come in a single C header file, easily embeddable in your projects.

Now that Sean Barrett, the original author of the stb libs (and coincidentally employed at RAD Game Tools, the makers of Bink Video) and John Carmack gave their thumbs up, I had little excuse not to do it.

It took me way longer than I expected, but I managed to pack a fully working MPEG1 Video Decoder, MP2 Audio Decoder and MPEG Program Stream Demuxer, all wrapped up in an easy to use API into roughly 3000 lines of code in a single C header.

PL_MPEG player demo

A video player example utilizing SDL and OpenGL is included in the repository. OpenGL is primarily used for the color space conversion from YUV to RGB. This is done in a fairly simple shader which cuts the decode time in half.

Of course you can also do the color space conversion in C and get a raw RGB buffer, as demonstrated in the even simpler extract frames example.

Implementation Details

Debugging a video decoder is fun. You get to see exactly where you screw up.

Corrupted video from an off-by-one error in the demuxer

Debugging an audio decoder will make your ears explode. I'll spare you an example.

I usually don't write much C, but I enjoy working within the limits of the language quite a lot. There's one particular design pattern in C that I always come back to:

typedef struct obj_t obj_t;

obj_t *obj_create();
void obj_method_a(obj_t *self);
void obj_method_b(obj_t *self);
void obj_destroy(obj_t *self);

The obj_t struct is opaque. Its members are only defined in the implementation of the library, but invisible to the library's users. Every type gets its own _create() and _destroy() function. Using this pattern throughout the library makes reasoning about it very straightforward. It's as simple as it gets, leaves no questions about memory management and neatly hides the implementation details.

In PL_MPEG some of these object _create() functions take byte arrays or other objects as a parameter. I wanted to make it very clear who has ownership of these parameters after the call. There are good reasons for either option: a) you keep ownership of what you pass in and free the memory yourself, or b) you hand over ownership and let the library clean it up when it's no longer needed. So these functions in PL_MPEG spell it out with an extra destroy_when_done flag:

plm_t *plm_create_with_buffer(plm_buffer_t *buffer, int destroy_when_done);

PL_MPEG exposes some lower level interfaces to its buffer, video decoder, audio decoder and demuxer functionality, but also provides an easy to use wrapper that combines all those. With this wrapper, loading and decoding video and audio is as simple as this:

#define PL_MPEG_IMPLEMENTATION
#include "plmpeg.h"

// This function gets called for each decoded video frame
void my_video_callback(plm_t *plm, plm_frame_t *frame, void *user) {
    // Do something with frame->y.data, frame->cr.data, frame->cb.data
}

// This function gets called for each decoded audio frame
void my_audio_callback(plm_t *plm, plm_samples_t *samples, void *user) {
    // Do something with samples->interleaved
}

// Load a .mpg (MPEG Program Stream) file
plm_t *plm = plm_create_with_filename("some-file.mpg");

// Install the video & audio decode callbacks
plm_set_video_decode_callback(plm, my_video_callback, my_data);
plm_set_audio_decode_callback(plm, my_audio_callback, my_data);

// Decode. Typically called in your event loop, once per frame
do {
    plm_decode(plm, time_since_last_call);
} while (!plm_has_ended(plm));

// All done, free memory
plm_destroy(plm);

The full documentation for the library can be found in pl_mpeg.h.

Producer & Consumer in C

There's one implementation detail in PL_MPEG that I didn't find an elegant solution for: the synchronization of producers and consumers.

In PL_MPEG, the demuxer reads a buffer or file and spits out packets of video and audio data. For the MP2 audio decoding we know exactly how many bytes we need to decode one frame (1152 samples) of PCM data. So the audio decoder asks the buffer once before decoding if enough data is available and just bails otherwise:

plm_samples_t *plm_audio_decode(plm_audio_t *self) {
    if (!plm_buffer_has(self->buffer, self->frame_data_size)) {
        return NULL;
    }
    // Continue decoding…
}

plm_buffer_has() will attempt to load more data if needed. Audio frames are small, so demuxing a whole frame at once is not a problem.

The MPEG1 video format makes this more complicated. There's no information in the frame header that tells us how big the video frame is. There's probably a reason why the byte size of a video frame is never stated in the muxed packet header or video frame header, but it's beyond my comprehension.

Ideally we would just demux all video packets needed for a single frame and then run the decoder. After all, memory is cheap and we can easily scale our buffers accordingly. Of course it makes sense that early MPEG decoders with tighter memory constraints didn't do this and instead ran the demuxer and decoder in parallel. However, I wanted to keep PL_MPEG simple and single threaded. So all we can do is to continually ask the buffer if a byte is available (or can be loaded) before reading it. This scatters load calls for the buffer all over the decode code and forces lots of switches between two tangentially related contexts.

This also introduces another problem: if we're in the middle of decoding a video frame and the buffer doesn't have any more bytes yet (e.g. because it's streaming from the net) we would need to pause the decoder, save its exact state and later, when enough bytes are available, resume it again. Of course this isn't particularly difficult to achieve using threads, but if we want to stay single threaded it gets very hairy.

This article by Simon Tatham explains the problem nicely and does provide a solution for synchronizing two simple loops. Sadly, our video decoder could bail anywhere in the call stack, a few functions deep. So what we really need are coroutines, such as those natively provided in Golang. Some coroutine implementations for C exist, but they are quite large and/or require platform-specific assembly, making them unsuitable for a small header-only lib.

So currently, if you want to feed a plm_buffer() yourself through a plm_buffer_load_callback and you can't guarantee that the data can be loaded in a timely fashion you have two options:

  1. Run PL_MPEG in its own thread and just busy-wait it in your load callback until enough data is available
  2. Search through all available bytes until you find the PICTURE_START_CODE of the next frame – making sure that the previous frame can be completely decoded before calling plm_video_decode()

Of course with the second solution you introduce one full frame of lag for streaming video. If latency is important that leaves only one option.

Interestingly, if I interpret the source correctly, ffmpeg chose the second option (waiting for the next PICTURE_START_CODE) even for the MPEG-TS container format, which is meant for streaming. So demuxing MPEG-TS with ffmpeg always introduces a frame of latency.

This gross oversight in the overengineered (especially for its time) MPEG-PS and MPEG-TS container formats just leaves me dumbfounded. If anybody knows why the MPEG standard doesn't just provide a byte size in the header of each frame or even just a FRAME_END code, or if you have a solution for this problem, let me know!

Rewriting Pagenode

Today I released Pagenode – a project that I started 14 years ago. Pagenode began its life as a full-fledged Content Management System and now, after countless rewrites, became a simple library. Pagenode's journey mimics my own as a developer. Its current iteration expresses my desire for simplicity.

In 2004, after dabbling a bit with PHP and finally grasping MySQL, I set out to build my own CMS. I had previously looked at a lot of different CMSes on the market and found all of them to be lacking – some in code quality and overall structure (Wordpress comes to mind), others just felt overblown even back then and sported lots of weird "features" (e.g. Typo 3).

I wanted a simple, unified way to manage content. A single tree of nodes of different types, each derived from a base node type. Extensibility was paramount.

It took a few attempts to get it working. The way I structured my code back then looks embarrassingly amateurish to me now, but I learned a lot. I was especially proud of how my CMS stored the content tree in the database by using a Nested Set, complete with support for cut/copy/pasting of subtrees.

Manipulating the Node Tree

Content in my CMS was written in a minimalist markup language. Somewhat similar to the then non-existent Markdown, albeit with a smaller scope and worse ideas.

Authoring Content

I added a bunch more features that I thought were necessary at the time: a templating language, a plugin system, localization support, etc. I also added a full-fledged file manager, which was a lot of work on its own, but ultimately turned out useless – I uploaded most things via FTP and there was seldom any need for copying/moving/deleting files.

File Manager

My CMS was almost complete and, as far as I was concerned, better than anything else on the market. I named the thing Pagenode and set out to release it "soon".

That release never came, but Pagenode still served my own needs nicely for more than a decade. The site for my HTML5 game engine (impactjs.com), my earlier gaming related site (chaosquake.de), this blog (phoboslab.org) and some others all ran on Pagenode.

Over the years, it became apparent to me that some types of content just don't fit in a tree. My workaround was to create a new node type that hosts and manages the content by itself, instead of directly in the tree. It worked well enough, but eventually I had to re-think the whole approach.

Earlier in 2017 I started to think about re-writing Pagenode. I had learned a lot in the past decade and my disdain for complexity has only grown. There were a lot of things in Pagenode I could now do more elegantly, with less code in a clearer way.

So I re-wrote Pagenode.

I still wanted to use PHP and MySQL, because virtually every host supports them. PHP is still my "get shit done" language and it's not nearly as bad as people remember it.

My focus would be on collections of different types of content. No more tree-like structure, but instead tagged nodes and a simple API to retrieve them. This allowed for a lot more flexibility, but still made the whole system a lot simpler.

With this narrow focus on collections of tagged nodes I soon noticed that I didn't need a database. I could save nodes as JSON files, efficiently build and cache an index of them all, and filter directly in PHP.

So I re-wrote Pagenode again.

I built a system where you can define your own types by just sub-classing the base node type:

class Article extends Node {
    const FIELDS = [
        'related' => Field_URL::class,
        'title' => Field_Text::class,
        'subline' => Field_Text::class,
        'titleImage' => Field_Image::class,
        'body' => Field_Markdown::class
    ];
}

This is all that was needed. The Admin interface would adapt automatically.

File Manager

I spent a lot of time on the text editor, making drag+drop upload work, implementing a file browser and more. I brought all the content from my blog over into the new Pagenode and it worked great.

My CMS was almost complete and, as far as I was concerned, more elegant than anything else on the market. I named the thing Pagenode and set out to release it "soon".

(This version of Pagenode can now be found on github as pagenode-legacy)

Then I noticed that I actually don't want to write my blog posts in a textarea in a browser, but rather use my favorite editor. I also wanted versioning. And while I still could put everything in a git repository, work on my blog locally and then push to my server, nothing of the administration interface that I built would help me do that.

So I re-wrote Pagenode again.

This time Pagenode is just a library to load, select and filter plaintext content from disk. Nothing more. That's the version of Pagenode that I needed. It's extensible, it doesn't get in your way, it's fast, it's simple and it comes in a single PHP file.

Read more about Pagenode:

pagenode.org – official website and documentation

github.com/phoboslab/pagenode – source code

Underrun – Making Of

I participated in this year's js13kGames, a JavaScript game development competition with a file size limit of 13kb, including code, assets and everything else. My entry was Underrun, a twin stick shooter using WebGL.

Underrun Play Underrun – A WebGL shooter in 13kb of JavaScript

For this competition I set out to produce something with a dense atmosphere – which is inherently difficult to do with so little space. In my experience a dense atmosphere is most easily achieved with two things: moody lighting and rich sound & music. The assets for sound & music usually take up a lot of space, and lighting for 2D games requires more and bigger graphic assets. With only 13kb of space I had to look for different solutions for these things.

Graphic Assets

Since I was very limited in the amount of graphic assets I could fit in this game, the decision to implement a 3D perspective came naturally: atmospheric lighting is easier to do in 3D than in 2D and requires fewer assets to look neat. In contrast, producing interesting light effects in 2D typically requires you to implement a third dimension anyway. This can be done through normal maps, as some 2D Pixel Art games do, or by explicitly separating your environment into different layers, as Teleglitch does for example.

My approach for the graphics was to render a simple 2D tilemap with a bunch of textured quads. All walls are rendered as cubes and entities (the player, enemies, projectiles, particles and health pickups) are just simple sprites. All textures for the game fit in a single tile sheet, 2.12kb in size.

Underrun Assets All graphic assets for the game, carefully tuned to 16 colors

I opted to render the player character as a sprite as well, even though it needs to be freely rotated. It was far easier to produce 8 different sprites – one for each direction – than to implement a loader and renderer for complex 3D models. This allowed me to omit any 3D vector or matrix math operations. The game never needs to rotate any geometry. Everything is rendered with axis aligned quads.
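
Picking the right sprite for the player's orientation is then a matter of snapping the angle to one of the 8 directions. A sketch, with the tile numbering assumed:

let sprite_index = (Math.round(angle / (Math.PI / 4)) + 8) % 8;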

To make the player character rotation look convincing I built a 3D model and simply took screenshots for each of the different perspectives. I'm a total doofus when it comes to 3D modeling software, but that doesn't stop me from using the ingenious Wings3D. The result doesn't look like much, but scaled down to 16px, it doesn't matter.

The player character model built in Wings3D

The final game features 3 different levels, each 64×64 tiles in size. When I started to work on the game I considered using Run Length Encoding to compress the level maps. However, even as simple as RLE is, a decompressor for it would have taken up some valuable space. So my solution was to just let the browser handle the decompression by using PNG images.

Underrun's levels are stored as PNG images

With the PNG compression, each level image is just about 300 bytes in size. A naive approach would have needed 64×64 bytes = 4kb per level (assuming storage as raw binary, 1 byte per tile).
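
Reading a level back is then just a matter of drawing the PNG to a canvas and inspecting the pixels. A minimal sketch – the function name and the mapping of color values to tile types are assumed:

function load_level_data(img) {
    let canvas = document.createElement('canvas');
    canvas.width = canvas.height = 64;
    let ctx = canvas.getContext('2d');
    ctx.drawImage(img, 0, 0);

    let px = ctx.getImageData(0, 0, 64, 64).data;
    let tiles = new Uint8Array(64 * 64);
    for (let i = 0; i < tiles.length; i++) {
        tiles[i] = px[i * 4]; // red channel = tile type
    }
    return tiles;
}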

When loading the level a random number generator (RNG) is used to determine exactly which tile graphic to use for each floor and wall. The RNG also decides where to place enemies and powerups, as these are not encoded in the level images. To make this consistent between playthroughs, so that each player plays exactly the same game, I implemented a tiny seedable RNG (source code) and re-seeded it with the same constant before loading each level.

I also hardcoded some of the wall tiles to be rendered with twice the normal height to make it look more interesting. You can see the complete level loading routines in the load_level() function.

Rendering

The renderer follows an extremely simple design: a single buffer is used to store all vertices, texture coordinates and normals. A second buffer is used to store all light sources (position, color, attenuation). The game can write to these buffers using the push_quad() and push_light() functions. The renderer clears these buffers at the beginning of each frame and draws the (filled) buffers at the end of the frame.

There's one small optimization to this. During level load, since all the level geometry is static, the game sets a reset mark for the geometry buffer after all the level geometry has been pushed. Before each frame the renderer will then reset the write position for the geometry buffer to this reset mark instead of to 0. This clears all the entities, but we don't have to re-push the level data.

Since the game's graphics are quite simple, there's no need for any occlusion culling or any other optimizations to reduce the amount of geometry being drawn. Underrun draws the whole level – all floor and ceiling tiles and all sprites – every single frame in a single draw call.

The gist of the renderer looks like this:

var 
    num_verts = 0,
    level_num_verts = 0,
    max_verts = 1024 * 64,
    buffer_data = new Float32Array(max_verts * 8); // 8 floats per vert


// Push a new quad into the buffer at the current num_verts position.
function push_quad(x1, y1, z1, ...) {
    buffer_data.set([x1, y1, z1, ...], num_verts * 8);
    num_verts += 6;
}

// Load a level, push some quads.
function load_level(data) {
    num_verts = 0;

    for (var y = 0; y < 64; y++) {
        for (var x = 0; x < 64; x++) {
            // lots of push_quad()...
        }
    }

    level_num_verts = num_verts; // set reset pos to current write pos
}

// Resets the buffer write position to just after the level geometry.
function renderer_prepare_frame() {
    num_verts = level_num_verts;
}

// Hand over the buffer data, up to num_verts, to webgl.
function renderer_end_frame() {
    gl.bufferData(gl.ARRAY_BUFFER, buffer_data, gl.DYNAMIC_DRAW);
    gl.drawArrays(gl.TRIANGLES, 0, num_verts);
};

// Run a frame
function tick() {
    renderer_prepare_frame();

    for (var i = 0; i < entities.length; i++) {
        entities[i].update();
        entities[i].draw();
    }

    renderer_end_frame();

    requestAnimationFrame(tick);
}

For the lighting I opted for some very simple vertex lights. Since the game only renders relatively small quads, it doesn't matter much that the light computation is handled in the vertex shader instead of per-pixel in the fragment shader. This also allowed for many more light sources. The game currently allows for 32 lights, each of which is considered for every single vertex. Current GPUs are so stupidly fast that no optimizations are needed for such simple cases.

The light calculation in Underrun's Vertex Shader looks somewhat like this:

// 7 values for each light:  position (x, y, z), color (r, g, b), attenuation
uniform float lights[7 * max_lights];
varying vec3 vertex_light;

void main(void){
    vertex_light = vec3(0.3, 0.3, 0.6); // ambient color

    // apply each light source to this vertex
    for(int i = 0; i < max_lights; i++ ) {
        vec3 light_position = vec3(lights[i*7], lights[i*7+1], lights[i*7+2]);
        vec3 light_color = vec3(lights[i*7+3], lights[i*7+4], lights[i*7+5]);
        vec3 light_dir = light_position - vertex_position;
        float dist = length(light_dir);
        float attenuation = 1.0 / (lights[i*7+6] * dist);

        vertex_light += 
            light_color
            * max(dot(normal, normalize(light_dir)), 0.0) // diffuse
            * attenuation;
    }

    // ...
}

For some extra scary atmosphere the fragment shader calculates a simple black fog based on the distance to the camera. An exception is made for the enemies' eyes: all fully red texture pixels are always rendered unchanged – no light, no fog. This makes them essentially glow in the dark.

The fragment shader also reduces the final image to 256 colors. I experimented with some color tinting but ultimately settled for a uniform RGB palette. Rendering with 256 colors at a screen size of just 320×180 pixels not only gives the game an old-school vibe, but also hides a lot of the graphical shortcomings.

Have a look at the source code for the renderer.js and game.js - there's really not that much to it.

Music & Sounds

The sound and music for all my previous games made up the bulk of the data size. E.g. Xibalba has about 800kb of graphic assets (tile sheets, sprites, title screen etc.) but more than 4mb of music and 2.5mb of sound effects (actually 8mb and 5mb respectively, since all assets need to be provided as .ogg and .mp3). This was of course infeasible when you only have 13kb total to work with.

So for Underrun all music and sound effects are generated in JavaScript when the game starts. The game uses the brilliant Sonant-X library to render instruments from a bunch of parameters into sound waves.

My dear friend Andreas Lösch of no-fate.net produced the instruments and sequences for the music in the tracker Sonant-X Live. The sound effects were produced in Sonant-X Live as well; each effect being a single instrument.

Sonant-X Underrun's Music, produced in the Sonant-X Live tracker

To save some more space I began to remove all the parts from Sonant-X that were not needed for my game, leaving only the WebAudio generator. In doing so I noticed that the library was unnecessarily slow, taking half a second to generate an empty buffer object. The culprit was the use of a single interleaved Uint8 buffer for the left and right audio channels, storing unsigned 16bit values with an implicit zero amplitude at 32767.

The library was spending a lot of time loading from and storing 16bit values into this 8bit buffer and later converting it to a signed float suitable for WebAudio. I believe this is an artifact from the original use case of this library: producing a WAV data URI.

I refactored the library to use two Float32Arrays (one for each channel) that can be directly used by the WebAudio API in the browser. This not only simplified the code and reduced the file size by 30% but also made the music generation twice as fast.
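
With that layout the generated channels can go straight into an AudioBuffer; roughly like this (variable names assumed):

let audio_ctx = new AudioContext();
let buffer = audio_ctx.createBuffer(2, num_samples, 44100);
buffer.getChannelData(0).set(left);  // left: Float32Array
buffer.getChannelData(1).set(right); // right: Float32Array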

As an example, have a look at the original applyDelay() function and my modified one - granted, I also removed the pausing/re-scheduling functions here that originally prevented blocking the event loop with a long running computation, but it wasn't really needed anymore.

All in all, minified and gzipped, the music and sound for Underrun along with the Sonant-X library now occupy only about 2kb. That's a pretty big deal, considering that sound and music were the biggest assets in all my previous games. So even if your game is not as size-restricted, it may make sense to generate audio directly in your game, instead of adding big files to the download.

Minification

When I started this project I took great care to write as little code as possible and to make sure all my code could be minified effectively by UglifyJS. The source has quite an unusual style compared to my other JS projects. It's using a lot of global variables and functions, instead of abstracting it more neatly into classes. I also wrote the code in the more C-like snake_case style to force my brain into this different mode.

In hindsight, this was totally unnecessary. Minifying and zipping would have gotten rid of most of the overhead of using more abstractions.

One trick that made a small difference is the "inlining" of all WebGL constants at build-time. E.g. gl.ONE_MINUS_SRC_ALPHA is replaced with just 771 - the constant's value. I also shortened all the WebGL function calls by producing aliases that just contained the first two letters and all subsequent upper-case letters of the function's original name:

for (var name in gl) {
    if (gl[name].length !== undefined) { // is function?
        gl[name.match(/(^..|[A-Z]|\d.|v$)/g).join('')] = gl[name];
    }
}

This allowed me to use gl.getUniformLocation() as just gl.geUL(). Note though that this approach is quite hack-ish and produces some name collisions. But for my game it worked out nicely.

I also took a lot of shortcuts with the game logic and especially collision detection. Collision detection between entities just uses two nested loops, checking each entity against all other entities. This quadratic complexity quickly explodes with a lot of entities in the level (e.g. 100 entities = 10,000 comparisons) – it's just amazing what you can get away with these days.

Other than that I didn't do anything particularly clever or exciting to further minify the game.

The full source for Underrun is on github: github.com/phoboslab/underrun

Be sure to have a look at some of the other js13kGames entries this year. Some of my favorites are The Chroma Incident, 1024 Moves, ISP, Wander and Envisionator.

Impact Is Now Free & Open Source

My HTML5 Game Engine Impact launched almost 8 years ago. The last update was published in 2014. While Impact still works nicely in modern browsers, it lacks support for better graphic and sound APIs that are now available. I felt increasingly bad for selling a product that is hardly maintained or improved.

So as of today Impact will be available completely for free, published under the permissive MIT License.

Impact's source is available on github.com/phoboslab/impact

Thanks to everyone who bought a license in the last few years!
