PHOBOSLAB

MPEG1 Single file C library

If you want to play videos in a native app or game you basically have three options:

  1. Rely on the API that the platform provides (e.g. AVPlayer on macOS & iOS)
  2. Include a giant, multi-megabyte library like ffmpeg or all of libwebm, libvpx, libogg, and libvorbis
  3. License Bink Video from RAD Game Tools – which is what most game studios seem to do these days

A few weeks ago I had this conversation on Twitter, where I offered a possible fourth solution:

(Link to this thread on Twitter)

MPEG1 is a very old format dating back to 1993, but the quality and compression ratio still hold up surprisingly well. Especially for games, I would imagine it is still good enough for displaying some company logos or in-game "video calls" from your AI teammates.

Because MPEG1 is such an old format, decoding it costs very little CPU time compared to more modern video formats. This is important for games running within a tight time budget.

Since I already had some experience with implementing an MPEG1 decoder in JavaScript and later porting parts of it to WASM (via C), a standalone plain C library was a no-brainer. So I proposed to build one, modeled after the excellent stb single-file libraries – dependency-free libraries that come in a single C header file, easily embeddable in your projects.

Now that Sean Barrett, the original author of the stb libs (and coincidentally employed at RAD Game Tools, the makers of Bink Video), and John Carmack had given their thumbs up, I had little excuse not to do it.

It took me way longer than I expected, but I managed to pack a fully working MPEG1 Video Decoder, MP2 Audio Decoder and MPEG Program Stream Demuxer, all wrapped up in an easy-to-use API, into roughly 3000 lines of code in a single C header.

PL_MPEG player demo

A video player example utilizing SDL and OpenGL is included in the repository. OpenGL is primarily used for the color space conversion from YUV to RGB. This is done in a fairly simple shader which cuts the decode time in half.

Of course you can also do the color space conversion in C and get a raw RGB buffer, as demonstrated in the even simpler extract frames example.

Implementation Details

Debugging a video decoder is fun. You get to see exactly where you screw up.

Corrupted video from an off-by-one error in the demuxer

Debugging an audio decoder will make your ears explode. I'll spare you an example.

I usually don't write much C, but I enjoy working within the limits of the language quite a lot. There's one particular design pattern in C that I always come back to:

typedef struct obj_t obj_t;

obj_t *obj_create();
void obj_method_a(obj_t *self);
void obj_method_b(obj_t *self);
void obj_destroy(obj_t *self);

The obj_t struct is opaque. Its members are only defined in the implementation of the library and invisible to the library's users. Every type gets its own _create() and _destroy() function. Using this pattern throughout the library makes reasoning about it very straightforward. It's as simple as it gets, leaves no questions about memory management and neatly hides the implementation details.

In PL_MPEG some of these object _create() functions take byte arrays or other objects as parameters. I wanted to make it very clear who has ownership of these parameters after the call. There are good reasons for either option: a) you keep ownership of what you pass in and free the memory yourself, or b) you hand over ownership and let the library clean it up when it's no longer needed. So these functions in PL_MPEG spell it out with an extra free_when_done or destroy_when_done flag:

plm_t *plm_create_with_buffer(plm_buffer_t *buffer, int destroy_when_done);

PL_MPEG exposes some lower level interfaces to its buffer, video decoder, audio decoder and demuxer functionality, but also provides an easy to use wrapper that combines all those. With this wrapper, loading and decoding video and audio is as simple as this:

#define PL_MPEG_IMPLEMENTATION
#include "plmpeg.h"

// This function gets called for each decoded video frame
void my_video_callback(plm_t *plm, plm_frame_t *frame, void *user) {
    // Do something with frame->y.data, frame->cr.data, frame->cb.data
}

// This function gets called for each decoded audio frame
void my_audio_callback(plm_t *plm, plm_samples_t *samples, void *user) {
    // Do something with samples->interleaved
}

// Load a .mpg (MPEG Program Stream) file
plm_t *plm = plm_create_with_filename("some-file.mpg");

// Install the video & audio decode callbacks
plm_set_video_decode_callback(plm, my_video_callback, my_data);
plm_set_audio_decode_callback(plm, my_audio_callback, my_data);

// Decode. Typically called in your event loop, once per frame
do {
    plm_decode(plm, time_since_last_call);
} while (!plm_has_ended(plm));

// All done, free memory
plm_destroy(plm);

The full documentation for the library can be found in pl_mpeg.h.

Producer & Consumer in C

There's one implementation detail in PL_MPEG that I didn't find an elegant solution for: the synchronization of producers and consumers.

In PL_MPEG, the demuxer reads a buffer or file and spits out packets of video and audio data. For the MP2 audio decoding we know exactly how many bytes we need to decode one frame (1152 samples) of PCM data. So the audio decoder asks the buffer once before decoding if enough data is available and just bails otherwise:

plm_samples_t *plm_audio_decode(plm_audio_t *self) {
    if (!plm_buffer_has(self->buffer, self->frame_data_size)) {
        return NULL;
    }
    // Continue decoding…
}

plm_buffer_has() will attempt to load more data if needed. Audio frames are small, so demuxing a whole frame at once is not a problem.

The MPEG1 video format makes this more complicated. There's no information in the frame header that tells us how big the video frame is. There's probably a reason why the byte size of a video frame is never stated in the muxed packet header or video frame header, but it's beyond my comprehension.

Ideally we would just demux all video packets needed for a single frame and then run the decoder. After all, memory is cheap and we can easily scale our buffers accordingly. Of course it makes sense that early MPEG decoders with tighter memory constraints didn't do this and instead ran the demuxer and decoder in parallel. However, I wanted to keep PL_MPEG simple and single threaded. So all we can do is to continually ask the buffer if a byte is available (or can be loaded) before reading it. This scatters load calls for the buffer all over the decode code and forces lots of switches between two tangentially related contexts.

This also introduces another problem: if we're in the middle of decoding a video frame and the buffer doesn't have any more bytes yet (e.g. because it's streaming from the net) we would need to pause the decoder, save its exact state and later, when enough bytes are available, resume it again. Of course this isn't particularly difficult to achieve using threads, but if we want to stay single threaded it gets very hairy.

This article by Simon Tatham explains the problem nicely and does provide a solution for synchronizing two simple loops. Sadly, our video decoder could bail anywhere in the call stack, a few functions deep. So what we really need are coroutines, such as those natively provided in Golang. Some coroutine implementations for C exist, but they are quite large and/or require platform-specific assembly, making them unsuitable for a small header-only lib.

So currently, if you want to feed a plm_buffer() yourself through a plm_buffer_load_callback and you can't guarantee that the data can be loaded in a timely fashion, you have two options:

  1. Run PL_MPEG in its own thread and just busy-wait it in your load callback until enough data is available
  2. Search through all available bytes until you find the PICTURE_START_CODE of the next frame – making sure that the previous frame can be completely decoded before calling plm_video_decode()

Of course with the second solution you introduce one full frame of lag for streaming video. If latency is important, that leaves only the first option.

Interestingly, if I interpret the source correctly, ffmpeg chose the second option (waiting for the next PICTURE_START_CODE) even for the MPEG-TS container format, which is meant for streaming. So demuxing MPEG-TS with ffmpeg always introduces a frame of latency.

This gross oversight in the overengineered (especially for its time) MPEG-PS and MPEG-TS container formats just leaves me dumbfounded. If anybody knows why the MPEG standard doesn't just provide a byte size in the header of each frame or even just a FRAME_END code, or if you have a solution for this problem, let me know!

Rewriting Pagenode

Today I released Pagenode – a project that I started 14 years ago. Pagenode began its life as a full-fledged Content Management System and has now, after countless rewrites, become a simple library. Pagenode's journey mimics my own as a developer. Its current iteration expresses my desire for simplicity.

In 2004, after dabbling a bit with PHP and finally grasping MySQL, I set out to build my own CMS. I had previously looked at a lot of different CMSes on the market and found all of them to be lacking – some in code quality and overall structure (WordPress comes to mind), others just felt overblown even back then and sported lots of weird "features" (e.g. TYPO3).

I wanted a simple, unified way to manage content. A single tree of nodes of different types, each derived from a base node type. Extensibility was paramount.

It took a few attempts to get it working. The way I structured my code back then looks embarrassingly amateurish to me now, but I learned a lot. I was especially proud of how my CMS stored the content tree in the database by using a Nested Set, complete with support for cut/copy/pasting of subtrees.

Manipulating the Node Tree

Content in my CMS was written in a minimalist markup language. Somewhat similar to the then non-existent Markdown, albeit with a smaller scope and worse ideas.

Authoring Content

I added a bunch more features that I thought were necessary at the time: a templating language, a plugin system, localization support, etc. I also added a full-fledged file manager, which was a lot of work on its own, but ultimately turned out useless – I uploaded most things via FTP and there was seldom any need for copying/moving/deleting files.

File Manager

My CMS was almost complete and, as far as I was concerned, better than anything else on the market. I named the thing Pagenode and set out to release it "soon".

That release never came, but Pagenode still served my own needs nicely for more than a decade. The site for my HTML5 game engine (impactjs.com), my earlier gaming related site (chaosquake.de), this blog (phoboslab.org) and some others all ran on Pagenode.

Over the years it became apparent to me that some types of content just don't fit in a tree. My workaround was to create a new node type that hosted and managed that content by itself, instead of directly in the tree. It worked well enough, but eventually I had to re-think the whole approach.

Earlier, in 2017, I started to think about re-writing Pagenode. I had learned a lot in the past decade and my disdain for complexity had only grown. There were a lot of things in Pagenode I could now do more elegantly, with less code and in a clearer way.

So I re-wrote Pagenode.

I still wanted to use PHP and MySQL, because virtually every host supports them. PHP is still my "get shit done" language and it's not nearly as bad as people remember.

My focus would be on collections of different types of content. No more tree-like structure, but instead tagged nodes and a simple API to retrieve them. This allowed for a lot more flexibility, but still made the whole system a lot simpler.

With this narrow focus on collections of tagged nodes I soon noticed that I didn't need a database at all. I could save nodes as JSON files, efficiently build and cache an index of all of them and filter directly in PHP.

So I re-wrote Pagenode again.

I built a system where you could define your own types by just sub-classing the base node type:

class Article extends Node {
    const FIELDS = [
        'related' => Field_URL::class,
        'title' => Field_Text::class,
        'subline' => Field_Text::class,
        'titleImage' => Field_Image::class,
        'body' => Field_Markdown::class
    ];
}

This is all that was needed. The Admin interface would adapt automatically.

File Manager

I spent a lot of time on the text editor, making drag+drop upload work, implementing a file browser and more. I brought all the content from my blog over into the new Pagenode and it worked great.

My CMS was almost complete and, as far as I was concerned, more elegant than anything else on the market. I named the thing Pagenode and set out to release it "soon".

(This version of Pagenode can now be found on github as pagenode-legacy)

Then I noticed that I actually didn't want to write my blog posts in a textarea in a browser, but rather use my favorite editor. I also wanted versioning. And while I could still put everything in a git repository, work on my blog locally and then push to my server, nothing of the administration interface that I had built would help me do that.

So I re-wrote Pagenode again.

This time Pagenode is just a library to load, select and filter plaintext content from disk. Nothing more. That's the version of Pagenode that I needed. It's extensible, it doesn't get in your way, it's fast, it's simple and it comes in a single PHP file.

Read more about Pagenode:

pagenode.org – official website and documentation

github.com/phoboslab/pagenode – source code

Underrun – Making Of

I participated in this year's js13kGames, a JavaScript game development competition with a file size limit of 13kb, including code, assets and everything else. My entry was Underrun, a twin stick shooter using WebGL.

Play Underrun – A WebGL shooter in 13kb of JavaScript

For this competition I set out to produce something with a dense atmosphere – which is inherently difficult to do with so little space. In my experience a dense atmosphere is most easily achieved with two things: moody lighting and rich sound & music. The assets for sound & music usually take up a lot of space, and lighting for 2D games requires more and bigger graphic assets. With only 13kb of space I had to look for different solutions for these things.

Graphic Assets

Since I was very limited in the amount of graphic assets I could fit in this game, the decision to implement a 3D perspective came naturally: atmospheric lighting is easier to do in 3D than in 2D and requires fewer assets to look neat. In contrast, producing interesting light effects in 2D typically requires you to implement a third dimension anyway. This can be done through normal maps, as some 2D pixel art games do, or by explicitly separating your environment into different layers, as Teleglitch does for example.

My approach for the graphics was to render a simple 2D tilemap with a bunch of textured quads. All walls are rendered as cubes and entities (the player, enemies, projectiles, particles and health pickups) are just simple sprites. All textures for the game fit in a single tile sheet, 2.12kb in size.

All graphic assets for the game, carefully tuned to 16 colors

I opted to render the player character as a sprite as well, even though it needs to be freely rotated. It was far easier to produce 8 different sprites – one for each direction – than to implement a loader and renderer for complex 3D models. This allowed me to omit any 3D vector or matrix math operations. The game never needs to rotate any geometry. Everything is rendered with axis aligned quads.
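Picking the right sprite for a given facing angle then only takes a bit of rounding. A hedged sketch (hypothetical helper, not Underrun's exact code):

// Map a facing angle (in radians) to one of the 8 directional sprite indices.
function sprite_index_for_angle(angle) {
    var two_pi = Math.PI * 2;
    var normalized = ((angle % two_pi) + two_pi) % two_pi; // wrap into 0..2π
    return Math.round(normalized / (two_pi / 8)) % 8; // quantize into 8 slices
}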

To make the player character rotation look convincing, I built a 3D model and simply took screenshots of it from each of the different perspectives. I'm a total doofus when it comes to 3D modeling software, but that doesn't stop me from using the ingenious Wings3D. The result doesn't look like much, but scaled down to 16px it doesn't matter.

The player character model built in Wings3D

The final game features 3 different levels, each 64×64 tiles in size. When I started to work on the game I considered using Run Length Encoding to compress the level maps. However, as simple as RLE is, a decompressor for it would still have taken up some valuable space. So my solution was to just let the browser handle the decompression by using PNG images.

Underrun's levels are stored as PNG images

With the PNG compression, each level image is just about 300 bytes in size. A naive approach would have needed 64×64 bytes = 4kb per level (assuming storage as raw binary, 1 byte per tile).
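Reading the tiles back out of such a PNG needs nothing more than a canvas. A hedged sketch of the idea (hypothetical names, not Underrun's actual loader):

// Draw the level image onto an offscreen canvas and read back the raw pixels.
// The red channel of each pixel is treated as the tile index here; the actual
// encoding in Underrun may differ.
function tiles_from_level_image(image) {
    var canvas = document.createElement('canvas');
    canvas.width = 64;
    canvas.height = 64;

    var ctx = canvas.getContext('2d');
    ctx.drawImage(image, 0, 0);

    var pixels = ctx.getImageData(0, 0, 64, 64).data; // RGBA, 4 bytes per pixel
    var tiles = new Uint8Array(64 * 64);
    for (var i = 0; i < tiles.length; i++) {
        tiles[i] = pixels[i * 4]; // red channel
    }
    return tiles;
}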

When loading the level a random number generator (RNG) is used to determine exactly which tile graphic to use for each floor and wall. The RNG also decides where to place enemies and powerups, as these are not encoded in the level images. To make this consistent between playthroughs, so that each player plays exactly the same game, I implemented a tiny seedable RNG (source code) and re-seeded it with the same constant before loading each level.
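Such an RNG can be tiny indeed. The following is just a sketch of the general idea – a plain linear congruential generator, not necessarily the exact one Underrun uses:

var rand_state = 0;

// Re-seeding with the same constant before each level load reproduces the
// exact same sequence on every playthrough.
function rand_seed(seed) {
    rand_state = seed;
}

// Returns a float in the range [0, 1).
function rand() {
    rand_state = (rand_state * 1664525 + 1013904223) % 4294967296;
    return rand_state / 4294967296;
}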

I also hardcoded some of the wall tiles to be rendered with twice the normal height to make it look more interesting. You can see the complete level loading routines in the load_level() function.

Rendering

The renderer follows an extremely simple design: a single buffer is used to store all vertices, texture coordinates and normals. A second buffer is used to store all light sources (position, color, attenuation). The game can write to these buffers using the push_quad() and push_light() functions. The renderer clears these buffers at the beginning of each frame and draws the (filled) buffers at the end of the frame.

There's one small optimization to this. During level load, since all the level geometry is static, the game sets a reset mark for the geometry buffer after all the level geometry has been pushed. Before each frame the renderer will then reset the write position for the geometry buffer to this reset mark instead of to 0. This clears all the entities, but we don't have to re-push the level data.

Since the game's graphics are quite simple, there's no need for occlusion culling or any other optimizations to reduce the amount of geometry being drawn. Underrun draws the whole level – all floor and ceiling tiles and all sprites – for every single frame in a single draw call.

The gist of the renderer looks like this:

var 
    num_verts = 0,
    level_num_verts = 0,
    max_verts = 1024 * 64,
    buffer_data = new Float32Array(max_verts * 8); // 8 floats per vert


// Push a new quad into the buffer at the current num_verts position.
function push_quad(x1, y1, z1, ...) {
    buffer_data.set([x1, y1, z1, ...], num_verts * 8);
    num_verts += 6;
}

// Load a level, push some quads.
function load_level(data) {
    num_verts = 0;

    for (var y = 0; y < 64; y++) {
        for (var x = 0; x < 64; x++) {
            // lots of push_quad()...
        }
    }

    level_num_verts = num_verts; // set reset pos to current write pos
}

// Resets the buffer write position to just after the level geometry.
function renderer_prepare_frame() {
    num_verts = level_num_verts;
}

// Hand over the buffer data, up to num_verts, to webgl.
function renderer_end_frame() {
    gl.bufferData(gl.ARRAY_BUFFER, buffer_data, gl.DYNAMIC_DRAW);
    gl.drawArrays(gl.TRIANGLES, 0, num_verts);
};

// Run a frame
function tick() {
    renderer_prepare_frame();

    for (var i = 0; i < entities.length; i++) {
        entities[i].update();
        entities[i].draw();
    }

    renderer_end_frame();

    requestAnimationFrame(tick);
}

For the lighting I opted for some very simple vertex lights. Since the game only renders relatively small quads, it doesn't matter much that the light computation is handled in the vertex shader instead of per-pixel in the fragment shader. This also allowed for many more light sources. The game currently allows for 32 lights, each of which is considered for every single vertex. Current GPUs are so stupidly fast that no optimizations are needed for such simple cases.

The light calculation in Underrun's Vertex Shader looks somewhat like this:

// 7 values for each light:  position (x, y, z), color (r, g, b), attenuation
uniform float lights[7 * max_lights];
varying vec3 vertex_light;

void main(void){
    vertex_light = vec3(0.3, 0.3, 0.6); // ambient color

    // apply each light source to this vertex
    for(int i = 0; i < max_lights; i++ ) {
        vec3 light_position = vec3(lights[i*7], lights[i*7+1], lights[i*7+2]);
        vec3 light_color = vec3(lights[i*7+3], lights[i*7+4], lights[i*7+5]);
        vec3 light_direction = light_position - vertex_position;
        float dist = length(light_direction);
        float attenuation = 1.0 / (lights[i*7+6] * dist);

        vertex_light += 
            light_color
            * max(dot(normal, normalize(light_direction)), 0.0) // diffuse
            * attenuation;
    }

    // ...
}

For some extra scary atmosphere the fragment shader calculates a simple black fog based on the distance to the camera. An exception is made for the enemies' eyes: all fully red texture pixels are always rendered unchanged – no light, no fog. This makes them essentially glow in the dark.

The fragment shader also reduces the final image to 256 colors. I experimented with some color tinting but ultimately settled on a uniform RGB palette. Rendering in 256 colors at a screen size of just 320×180 pixels not only gives the game an old-school vibe, but also hides a lot of the graphical shortcomings.

Have a look at the source code for the renderer.js and game.js - there's really not that much to it.

Music & Sounds

The sound and music for all my previous games made up the bulk of the data size. E.g. Xibalba has about 800kb of graphic assets (tile sheets, sprites, title screen etc.) but more than 4mb of music and 2.5mb of sound effects (actually 8mb and 5mb respectively, since all assets need to be provided as .ogg and .mp3). This was of course infeasible when you only have 13kb total to work with.

So for Underrun all music and sound effects are generated in JavaScript when the game starts. The game uses the brilliant Sonant-X library to render instruments from a bunch of parameters into sound waves.

My dear friend Andreas Lösch of no-fate.net produced the instruments and sequences for the music in the tracker Sonant-X Live. The sound effects were produced in Sonant-X Live as well; each effect being a single instrument.

Underrun's music, produced in the Sonant-X Live tracker

To save some more space I began to remove all the parts from Sonant-X that were not needed for my game, leaving only the WebAudio generator. In doing so I noticed that the library was unnecessarily slow, taking half a second just to generate an empty buffer object. The culprit was the use of a single, interleaved Uint8 buffer for the left and right audio channels, storing unsigned 16bit values with the implicit zero amplitude at 32767.

The library was spending a lot of time loading from and storing 16bit values into this 8bit buffer and later converting it to a signed float suitable for WebAudio. I believe this is an artifact from the original use case of this library: producing a WAV data URI.

I refactored the library to use two Float32Arrays (one for each channel) that can be directly used by the WebAudio API in the browser. This not only simplified the code and reduced the file size by 30% but also made the music generation twice as fast.
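The gist of that change, as a hedged sketch (not Sonant-X's actual code): the generator writes straight into two Float32Arrays, which are then copied into an AudioBuffer.

var audioContext = new AudioContext();
var sampleRate = audioContext.sampleRate;
var numSamples = sampleRate * 4; // e.g. 4 seconds of audio

// One buffer per channel, samples in the -1..1 range – exactly what WebAudio wants.
var left = new Float32Array(numSamples);
var right = new Float32Array(numSamples);

// ...the instrument generator writes its samples into left[] and right[]...

var audioBuffer = audioContext.createBuffer(2, numSamples, sampleRate);
audioBuffer.getChannelData(0).set(left);
audioBuffer.getChannelData(1).set(right);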

As an example, have a look at the original applyDelay() function and my modified one – granted, I also removed the pausing/re-scheduling logic here that originally prevented a long-running computation from blocking the event loop, but it wasn't really needed anymore.

All in all, minified and gzipped, the music and sound for Underrun along with the Sonant-X library now occupy only about 2kb. That's a pretty big deal, considering that sound and music were the biggest assets in all my previous games. So even if your game is not as size-restricted, it may make sense to generate audio directly in your game, instead of adding big files to the download.

Minification

When I started this project I took great care to write as little code as possible and to make sure all my code could be minified effectively by UglifyJS. The source has quite an unusual style compared to my other JS projects. It's using a lot of global variables and functions, instead of abstracting it more neatly into classes. I also wrote the code in the more C-like snake_case style to force my brain into this different mode.

In hindsight, this was totally unnecessary. Minifying and zipping would have gotten rid of most of the overhead of using more abstractions.

One trick that made a small difference is the "inlining" of all WebGL constants at build-time. E.g. gl.ONE_MINUS_SRC_ALPHA is replaced with just 771 - the constant's value. I also shortened all the WebGL function calls by producing aliases that just contained the first two letters and all subsequent upper-case letters of the function's original name:

for (var name in gl) {
    if (gl[name].length !== undefined) { // is function?
        gl[name.match(/(^..|[A-Z]|\d.|v$)/g).join('')] = gl[name];
    }
}

This allowed me to use gl.getUniformLocation() as just gl.geUL(). Note though that this approach is quite hack-ish and produces some name collisions. But for my game it worked out nicely.
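The constant inlining itself can be done with a plain text replacement over the source before minifying. A hedged sketch of the idea (not Underrun's actual build step); it needs a live WebGL context to look up the values:

// Replace every gl.SOME_CONSTANT with its numeric value.
function inline_webgl_constants(source, gl) {
    return source.replace(/gl\.([A-Z][A-Z0-9_]*)\b/g, function(match, name) {
        return (typeof gl[name] === 'number') ? gl[name] : match;
    });
}

// inline_webgl_constants('gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);', gl)
// -> 'gl.blendFunc(770, 771);'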

I also took a lot of shortcuts with the game logic and especially collision detection. Collision detection between entities just uses two nested loops, checking each entity against all other entities. This quadratic approach quickly explodes with a lot of entities in the level (e.g. 100 entities = 10,000 comparisons) – it's just amazing what you can get away with these days.
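In code, that shortcut amounts to little more than this (a sketch with hypothetical entity fields):

// Naive O(n²) collision check: every entity against every other entity.
function check_collisions(entities) {
    for (var i = 0; i < entities.length; i++) {
        for (var j = 0; j < entities.length; j++) {
            if (i === j) {
                continue;
            }
            var a = entities[i], b = entities[j];
            var dx = a.x - b.x,
                dy = a.y - b.y,
                r = a.radius + b.radius;
            if (dx * dx + dy * dy < r * r) {
                a.did_collide(b); // hypothetical callback
            }
        }
    }
}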

Other than that I didn't do anything particularly clever or exciting to further minify the game.

The full source for Underrun is on github: github.com/phoboslab/underrun

Be sure to have a look at some of the other js13kGames entries this year. Some of my favorites are The Chroma Incident, 1024 Moves, ISP, Wander and Envisionator.

Impact Is Now Free & Open Source

My HTML5 Game Engine Impact launched almost 8 years ago. The last update was published in 2014. While Impact still works nicely in modern browsers, it lacks support for the better graphics and sound APIs that are now available. I felt increasingly bad for selling a product that is hardly maintained or improved.

So as of today Impact will be available completely for free, published under the permissive MIT License.

Impact's source is available on github.com/phoboslab/impact

Thanks to everyone who bought a license in the last few years!

Decode It Like It's 1999

A few years ago I started to work on an MPEG1 Video decoder written completely in JavaScript. Now I have finally found the time to clean up the library, improve its performance, make it more error resilient and modular, and add an MP2 Audio decoder and MPEG-TS demuxer. This makes the library not just an MPEG decoder, but a full video player.

In this blog post I want to talk a bit about the challenges and various interesting bits I discovered during the development of this library. You'll find a demo, the source, documentation and reasons to use JSMpeg over on the official website:

jsmpeg.com - Decode it like it's 1999

Refactoring

Recently I needed to implement audio streaming into JSMpeg for a client, and only then realized what a pitiful state the library was in. It has grown quite a bit since its first release. A WebGL renderer, WebSocket client, progressive loading, benchmarking facilities and much more have been tacked on over the last few years. All kept in a single, monolithic class with conditionals bursting at the seams.

I decided to clean up this mess first by separating its logical components. For the sound implementation I sketched out what would be needed: a Demuxer, an MP2 Decoder and an Audio Output, plus some auxiliary classes.

Each of the components (apart from the Sources) has a .write(buffer) method to feed it with data. These components can then "connect" to a destination that receives the processed result. The complete flow through the library looks like this:

                 / -> MPEG1 Video Decoder -> Renderer
Source -> Demuxer  
                 \ -> MP2 Audio Decoder -> Audio Output

JSMpeg currently has 3 different implementations for the Source (AJAX, AJAX progressive and WebSocket) and there are 2 different Renderers (Canvas2D and WebGL). The rest of the library is agnostic to these – i.e. the Video Decoder doesn't care about the Renderer's internals. With this approach it's easy to add new components: further Sources, Demuxers, Decoders or Outputs.

I'm not completely happy with how these connections work in the library. Each component can only have one destination (apart from the Demuxer, which has one destination per stream). It's a tradeoff. In the end, I felt that anything else would be over-engineering and would complicate the library for no good reason.
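Stripped to its essence, the connection pattern looks roughly like this (hypothetical names, not JSMpeg's exact API):

// Every component accepts data through write() and forwards its result to a
// single destination set via connect().
var VideoDecoder = function() {
    this.destination = null;
};

VideoDecoder.prototype.connect = function(destination) {
    this.destination = destination;
};

VideoDecoder.prototype.write = function(buffer) {
    var frame = this.decodeFrame(buffer);
    if (frame && this.destination) {
        this.destination.render(frame);
    }
};

VideoDecoder.prototype.decodeFrame = function(buffer) {
    // The actual MPEG1 decoding is omitted here; return null while no
    // complete frame is available yet.
    return null;
};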

WebGL Rendering

One of the most computationally intensive tasks for an MPEG1 decoder is the color conversion from MPEG's internal YUV format (Y'Cr'Cb to be precise) into RGBA so that the browser can display it. Somewhat simplified, the conversion looks like this:

for (var i = 0; i < pixels.length; i+=4 ) {
    var y, cb, cr = /* fetch this from the YUV buffers */;

    pixels[i + 0 /* R */] = y + (cb + ((cb * 103) >> 8)) - 179;
    pixels[i + 1 /* G */] = y - ((cr * 88) >> 8) - 44 + ((cb * 183) >> 8) - 91;
    pixels[i + 2 /* B */] = y + (cr + ((cr * 198) >> 8)) - 227;
    pixels[i + 3 /* A */] = 255;
}

For a single 1280x720 video frame that loop has to run 921600 times to convert all pixels from YUV to RGBA. Each pixel needs 3 writes to the destination RGB array (we can pre-populate the alpha component since it's always 255). That's 2.7 million writes per frame, each needing 5-8 adds, subtracts, multiplies and bit shifts. For a 60fps video, we end up with more than 1 billion operations per second. Plus the overhead for JavaScript. The fact that JavaScript can do this, that a computer can do this, still boggles my mind.

With WebGL, this color conversion (and the subsequent display on screen) can be sped up tremendously. A few operations for each pixel is the bread and butter of GPUs. GPUs can process many pixels in parallel, because each pixel is independent of the others. The WebGL shader that's run on the GPU doesn't even need these pesky bit shifts – GPUs like floating point numbers:

void main() {
    float y = texture2D(textureY, texCoord).r;
    float cb = texture2D(textureCb, texCoord).r - 0.5;
    float cr = texture2D(textureCr, texCoord).r - 0.5;

    gl_FragColor = vec4(
        y + 1.4 * cr,
        y + -0.343 * cb - 0.711 * cr,
        y + 1.765 * cb,
        1.0
    );
}

With WebGL, the time needed for the color conversion dropped from 50% of the total JS time to just about 1% for the YUV texture upload.

There was one minor issue I stumbled over with the WebGL renderer. JSMpeg's video decoder does not produce three Uint8Arrays (one for each color plane), but Uint8ClampedArrays. It does this because the MPEG1 standard mandates that decoded color values must be clamped, not wrapped around. Letting the browser do the clamping through the ClampedArray works out faster than doing it in JavaScript.
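For illustration, the difference between the two array types:

var clamped = new Uint8ClampedArray(1);
var wrapped = new Uint8Array(1);

clamped[0] = 300; // stored as 255 – the value is clamped to 0..255
wrapped[0] = 300; // stored as 44  – the value wraps around (300 & 0xff)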

A bug that still stands in some browsers (Chrome and Safari) prevents WebGL from using the Uint8ClampedArray directly. Instead, for these browsers we have to create a Uint8Array view for each plane for each frame. This operation is pretty fast since nothing needs to be copied, but I'd still like to do without it.

JSMpeg detects this bug and only uses the workaround if needed. We simply try to upload a clamped array and catch the error. This detection sadly triggers an un-silenceable warning in the console, but it's better than nothing.

WebGLRenderer.prototype.allowsClampedTextureData = function() {
    var gl = this.gl;
    var texture = gl.createTexture();

    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texImage2D(
        gl.TEXTURE_2D, 0, gl.LUMINANCE, 1, 1, 0,
        gl.LUMINANCE, gl.UNSIGNED_BYTE, new Uint8ClampedArray([0])
    );
    return (gl.getError() === 0);
};

WebAudio for Live Streaming

For the longest time I assumed that in order to feed WebAudio with raw PCM sample data without much latency or pops and cracks, you'd have to use a ScriptProcessorNode. You'd copy your decoded sample data just in time whenever you get the callback from the script processor. It works. I tried it. It needs quite a bit of code to function properly and of course it's computationally intensive and inelegant.

Luckily, my initial assumption was wrong.

The WebAudio Context maintains its own timer that's separate from JavaScript's Date.now() or performance.now(). Further, you can instruct your WebAudio sources to start() at a precise time in the future based on the context's time. With this, you can string very short PCM buffers together without any artefacts.

You only have to calculate the start time for the next buffer by continuously adding the duration of all previous ones. It's important to always use the WebAudio Context's own time for this.

var currentStartTime = 0;

function playBuffer(buffer) {
    var source = context.createBufferSource();
    /* load buffer, set destination etc. */

    var now = context.currentTime;
    if (currentStartTime < now) {
        currentStartTime = now;
    }

    source.start(currentStartTime);
    currentStartTime += buffer.duration;
}

There's a caveat though: I needed to get the precise remaining duration of the enqueued audio. I implemented it simply as the difference between the current time and the next start time:

// Don't do that!
var enqueuedTime = (currentStartTime - context.currentTime);

It took me a while to figure it out, but this doesn't work. You see, the context's currentTime is only updated every so often. It's not a precise real time value.

var t1 = context.currentTime;
doSomethingForAWhile();
var t2 = context.currentTime;

t1 === t2; // true

So, if you need the precise audio play position (or anything based on it), you have to resort to JavaScript's performance.now().
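One way around this – a sketch of the idea, reusing context and currentStartTime from the example above, not necessarily what JSMpeg does verbatim – is to note the wallclock time whenever currentTime changes and interpolate between updates with performance.now():

var lastContextTime = 0;
var lastWallclockTime = 0;

function preciseAudioTime() {
    if (context.currentTime !== lastContextTime) {
        // The WebAudio clock just advanced; remember when that happened.
        lastContextTime = context.currentTime;
        lastWallclockTime = performance.now();
    }
    // Interpolate from the last update using the high resolution JS timer.
    return lastContextTime + (performance.now() - lastWallclockTime) / 1000;
}

function enqueuedTime() {
    return Math.max(currentStartTime - preciseAudioTime(), 0);
}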

Audio Unlocking on iOS

You gotta love the shit that Apple throws into web devs' faces from time to time. One of those things is the need to unlock audio on a page before you can play anything. Basically, audio playback can only be started in response to a user action. You click on a button, audio plays.

This makes sense. I won't argue against it. You don't want to have audio blaring at you unannounced when you visit a page.

What makes it shitty is that Apple neither provided a way to cleanly unlock audio nor a way to ask the WebAudio Context if it's already unlocked. What you do instead is play an audio source and continually check if it's progressing. You can't check immediately after playing, though. No, no. You have to wait a bit!

WebAudioOut.prototype.unlock = function(callback) {
    // This needs to be called in an onclick or ontouchstart handler!
    this.unlockCallback = callback;

    // Create empty buffer and play it
    var buffer = this.context.createBuffer(1, 1, 22050);
    var source = this.context.createBufferSource();
    source.buffer = buffer;
    source.connect(this.destination);
    source.start(0);

    setTimeout(this.checkIfUnlocked.bind(this, source, 0), 0);
};

WebAudioOut.prototype.checkIfUnlocked = function(source, attempt) {
    if (
        source.playbackState === source.PLAYING_STATE || 
        source.playbackState === source.FINISHED_STATE
    ) {
        this.unlocked = true;
        this.unlockCallback();
    }
    else if (attempt < 10) {
        // Jeez, what a shit show. Thanks iOS!
        setTimeout(this.checkIfUnlocked.bind(this, source, attempt+1), 100);
    }
};

Progressive Loading via AJAX

Say you have a 50mb video file that you load via AJAX. The video starts loading, no problem. You can even check the current progress (downloaded vs. total bytes) and display a nice loading animation. What you cannot do is access the already downloaded data while the rest of the file is still loading.

There have been some proposals for adding chunked ArrayBuffers to XMLHttpRequest, but nothing has been implemented across browsers. The newer fetch API (that I still don't understand the purpose of) proposed some similar features, but again: no cross-browser support. However, we can still do chunked downloading in JavaScript using Range requests.

The HTTP standard defines a Range header that allows you to grab only part of a resource. If you just need the first 1024 bytes of a big file, you set the header Range: bytes=0-1023 in your request. Before we can start though, we have to figure out how large the file is. We can do this with a HEAD request instead of a GET. This returns only the HTTP headers for the resource, but none of the body bytes. Range requests are supported by almost all HTTP servers. The one exception I know of is PHP's built-in development server.
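A minimal chunked loader along these lines could look like this (hypothetical helpers, not JSMpeg's actual Source implementation):

// Ask for the total size with a HEAD request, then fetch one byte range at a time.
function getFileSize(url, callback) {
    var request = new XMLHttpRequest();
    request.open('HEAD', url);
    request.onload = function() {
        callback(parseInt(request.getResponseHeader('Content-Length'), 10));
    };
    request.send();
}

function loadChunk(url, start, end, callback) {
    var request = new XMLHttpRequest();
    request.open('GET', url);
    request.setRequestHeader('Range', 'bytes=' + start + '-' + end);
    request.responseType = 'arraybuffer';
    request.onload = function() {
        callback(new Uint8Array(request.response));
    };
    request.send();
}

getFileSize('video.ts', function(size) {
    // Load the first megabyte; request further chunks when they're needed.
    loadChunk('video.ts', 0, 1024 * 1024 - 1, function(chunk) {
        // feed the chunk to the demuxer...
    });
});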

JSMpeg's default chunk size for downloading via AJAX is 1mb. JSMpeg also appends a custom GET parameter to the URL (e.g. video.ts?0-1024) for each request, so that each chunk essentially gets its own URL and plays nice with bad caching proxies.

With this in place, you can start playing the file as soon as the first chunk has arrived. Also, further chunks will only be downloaded when they're needed. If someone only watches the first few seconds of a video, only those first few seconds will get downloaded. JSMpeg does this by measuring the time it took to load a chunk, adding a lot of safety margin and comparing this to the remaining duration of the already loaded chunks.

In JSMpeg, the Demuxer splits streams as fast as it can. It also decodes the presentation time stamp (PTS) for each packet. The video and audio decoders however only advance their play position in real-time increments. The difference between the last demuxed PTS and the decoder's current PTS is the remaining play time for the downloaded chunks. The Player periodically calls the Source's resume() method with this headroom time:

// It's a silly estimate, but it works
var worstCaseLoadingTime = lastChunkLoadingTime * 8 + 2;
if (worstCaseLoadingTime > secondsHeadroom) {
    loadNextChunk();
}

Audio & Video Sync

JSMpeg tries to play audio as smoothly as possible. It doesn't introduce any gaps or compressions when queuing up samples. Video playback orients itself on the audio playback position. It's done this way, because even the tiniest gaps or discontinuities are far more perceptible in audio than in video. It's far less jarring if a video frame is a few milliseconds late or dropped.

For the most part, JSMpeg relies on the presentation time stamps (PTS) of the MPEG-TS container for playback, instead of calculating the playback time itself. This means the PTS in the MPEG-TS file have to be consistent and accurate. From what I gathered from the internet, this is not always the case. But modern encoders seem to have figured this out.

One complication was that the PTS doesn't always start at 0. For instance, if you have a WebCam connected and running for a while, the PTS may start at the time the WebCam was turned on, not at the time recording started. Therefore, JSMpeg searches for the first PTS it can find and uses that as the global start time for all streams.

The MPEG1 and MP2 decoders also keep track of all the PTS they received, along with the buffer position of each PTS. With this, we can seek through the audio and video streams to a specific time.
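Conceptually, seeking then boils down to a lookup in that list. A sketch with a hypothetical structure:

// ptsIndex is assumed to be a list of {pts, byteOffset} entries, collected in
// ascending order during demuxing.
function findBufferPosition(ptsIndex, targetPts) {
    var byteOffset = 0;
    for (var i = 0; i < ptsIndex.length; i++) {
        if (ptsIndex[i].pts > targetPts) {
            break;
        }
        byteOffset = ptsIndex[i].byteOffset;
    }
    return byteOffset;
}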

Currently, JSMpeg will happily seek to an inter-frame and decode it on top of the previously decoded frame. The correct way to handle this would be to rewind to the last intra-frame before the one we seek to and decode all frames in between. This is something I still need to fix.

Build Tools & The JavaScript Ecosystem

I avoid build tools wherever I can. Chances are, your shiny toolset that automates everything for you will stop working in a year or two. Setting up your build environment is never as easy as "just call webpack" or "use grunt" or whatever task runner is the hot shit today. It always ends up like this:

(...) Where do I get webpack from? Oh, I need npm.
Where do I get npm from? Oh, I need nodejs.
Where do I get nodejs from? Oh, I need homebrew.
What's that? gyp build error? Oh, sure, I need to install XCode.
Oh, webpack needs the babel plugin?
What? The left-pad dependency could not be resolved?
...

And suddenly you've spent two hours of your life and downloaded several GB of tools. All to build a 20kb library, for a language that doesn't even need compiling. How do I build this library 2 years from now? 5 years?

I had a thorough look at webpack and hated it. It's way too complex for my taste. I like to understand what's going on. That's part of the reason I wrote this library instead of diving into WebRTC.

So, the build step for JSMpeg is a shell script with a single call to uglifyjs that can be altered to use cat (or copy on Windows) in 2 seconds. Or you simply load the source files separately in your HTML while you're working on it. Done.

Quality, Bitrates And The Future

The quality of MPEG1 at reasonable bitrates is, much to my surprise, not bad at all. Have a look at the demo video on jsmpeg.com – granted, it's a favorable case for compression: slow movement and not too many cuts. Still, this video weighs in at 50mb for its 4 minutes, and provides a quality comparable to most YouTube videos that are "only" 30% smaller.

In my tests, I could always get video that I'd consider "high quality" at max 2Mbit/s. Depending on your use-case (want a coffee cam?), you can go to 100Kbit/s or even lower. There's no bottom limit for the bitrate/framerate.

You could get a cheap cell phone contract with a 1GB/month data limit, put a 3G dongle and a webcam on a Raspberry Pi, attach it to a 12V automotive battery, throw it in your crop field and get a live weather cam that doesn't need any infrastructure or maintenance for a few years and is viewable in your smartphone's browser without installing anything.

The simplicity of MPEG1, compared to modern codecs, makes it very attractive in my opinion. It's well understood and there's a ton of tools that can work with it. All patents relating to MPEG1/MP2 have expired now. It's a free format.

Do you remember the GIF revival after its patents expired?
