Dominic Szablewski, @phoboslab
— Saturday, June 15th 2019

MPEG1 Single file C library

If you want to play videos in a native app or game you basically have three options:

  1. Rely on the API that the platform provides (e.g. AVPlayer on macOS & iOS)
  2. Include a giant, multi-megabyte library like ffmpeg or all of libwebm, libvpx, libogg, and libvorbis
  3. License Bink Video from RAD Game Tools – which is what most game studios seem to do these days

A few weeks ago I had this conversation on Twitter, where I offered a possible fourth solution:

(Embedded tweets – link to this thread on Twitter)

MPEG1 is a very old format dating back to 1993, but the quality and compression ratio still hold up surprisingly well. For games especially, I imagine it's still good enough for displaying company logos or in-game "video calls" from your AI teammates.

Because MPEG1 is such an old format, decoding it costs very little CPU time compared to more modern video formats. This is important for games running within a tight time budget.

Since I already had some experience implementing an MPEG1 decoder in JavaScript and later porting parts of it to WASM (via C), a standalone plain C library seemed like a no-brainer. So I proposed to build one, modeled after the excellent stb single-file libraries – dependency-free libraries that come in a single C header file, easily embeddable in your projects.

Now that Sean Barrett, the original author of the stb libs (and coincidentally employed at RAD Game Tools, the makers of Bink Video), and John Carmack had given their thumbs up, I had little excuse not to do it.

It took me way longer than I expected, but I managed to pack a fully working MPEG1 Video Decoder, MP2 Audio Decoder and MPEG Program Stream Demuxer, all wrapped up in an easy-to-use API, into roughly 3000 lines of code in a single C header.

PL_MPEG player demo

A video player example utilizing SDL and OpenGL is included in the repository. OpenGL is primarily used for the color space conversion from YUV to RGB. This is done in a fairly simple shader which cuts the decode time in half.

Of course you can also do the color space conversion in C and get a raw RGB buffer, as demonstrated in the even simpler extract frames example.
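For reference, the conversion itself is only a few multiply-adds per pixel. Here's a minimal sketch of a single-pixel conversion using the common BT.601 limited-range integer coefficients (illustrative only; the library's exact variant may differ):

// Convert one Y'CbCr pixel to 8-bit RGB with the widely used BT.601
// limited-range integer coefficients. A sketch, not PL_MPEG's exact code.
static unsigned char clamp_u8(int v) {
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

static void yuv_to_rgb(int y, int cb, int cr,
                       unsigned char *r, unsigned char *g, unsigned char *b) {
    int c = y - 16, d = cb - 128, e = cr - 128;
    *r = clamp_u8((298 * c + 409 * e + 128) >> 8);
    *g = clamp_u8((298 * c - 100 * d - 208 * e + 128) >> 8);
    *b = clamp_u8((298 * c + 516 * d + 128) >> 8);
}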

Implementation Details

Debugging a video decoder is fun. You get to see exactly where you screw up.

Corrupted video from an off-by-one error in the demuxer

Debugging an audio decoder will make your ears explode. I'll spare you an example.

I usually don't write much C, but I enjoy working within the limits of the language quite a lot. There's one particular design pattern in C that I always come back to:

typedef struct obj_t obj_t;

obj_t *obj_create(void);
void obj_method_a(obj_t *self);
void obj_method_b(obj_t *self);
void obj_destroy(obj_t *self);

The obj_t struct is opaque. Its members are only defined in the implementation of the library, but invisible to the library's users. Every type gets its own _create() and _destroy() function. Using this pattern throughout the library makes reasoning about it very straightforward. It's as simple as it gets, leaves no questions about memory management and neatly hides the implementation details.
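The implementation side of the pattern is equally small: the struct members are defined there, and _create()/_destroy() wrap the allocation. A minimal sketch for the hypothetical obj_t above:

#include <stdlib.h>

typedef struct obj_t obj_t; // as declared in the public header

// The members are only visible here, in the implementation
struct obj_t {
    int some_state;
};

obj_t *obj_create(void) {
    return (obj_t *)calloc(1, sizeof(obj_t));
}

void obj_destroy(obj_t *self) {
    free(self);
}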

In PL_MPEG some of these object _create() functions take byte arrays or other objects as parameters. I wanted to make it very clear who has ownership of these parameters after the call. There are good reasons for either option: a) you keep ownership of what you pass in and free the memory yourself, or b) you hand over ownership and let the library clean it up when it's no longer needed. So these functions in PL_MPEG spell it out with an extra destroy_when_done flag:

plm_t *plm_create_with_buffer(plm_buffer_t *buffer, int destroy_when_done);
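Internally, such a flag only needs to be remembered until the owning object is destroyed. A hedged sketch of the idea (simplified names, assuming pl_mpeg.h is included; not the library's actual code):

#include <stdlib.h>

// Sketch: the wrapper remembers whether it owns its buffer and, if so,
// destroys it along with itself. Simplified; not PL_MPEG's actual code.
typedef struct {
    plm_buffer_t *buffer;
    int destroy_buffer_when_done;
} plm_sketch_t;

void plm_sketch_destroy(plm_sketch_t *self) {
    if (self->destroy_buffer_when_done) {
        plm_buffer_destroy(self->buffer);
    }
    free(self);
}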

PL_MPEG exposes some lower-level interfaces to its buffer, video decoder, audio decoder and demuxer functionality, but also provides an easy-to-use wrapper that combines all of those. With this wrapper, loading and decoding video and audio is as simple as this:

#define PL_MPEG_IMPLEMENTATION
#include "plmpeg.h"

// This function gets called for each decoded video frame
void my_video_callback(plm_t *plm, plm_frame_t *frame, void *user) {
    // Do something with frame->y.data, frame->cr.data, frame->cb.data
}

// This function gets called for each decoded audio frame
void my_audio_callback(plm_t *plm, plm_samples_t *samples, void *user) {
    // Do something with samples->interleaved
}

// Load a .mpg (MPEG Program Stream) file
plm_t *plm = plm_create_with_filename("some-file.mpg");

// Install the video & audio decode callbacks; my_data is an arbitrary
// pointer to your own state, passed through to each callback
plm_set_video_decode_callback(plm, my_video_callback, my_data);
plm_set_audio_decode_callback(plm, my_audio_callback, my_data);

// Decode. Typically called in your event loop, once per frame
do {
    plm_decode(plm, time_since_last_call);
} while (!plm_has_ended(plm));

// All done, free memory
plm_destroy(plm);

The full documentation for the library can be found in pl_mpeg.h itself.

Producer & Consumer in C

There's one implementation detail in PL_MPEG that I didn't find an elegant solution for: the synchronization of producers and consumers.

In PL_MPEG, the demuxer reads a buffer or file and spits out packets of video and audio data. For MP2 audio decoding we know exactly how many bytes we need to decode one frame (1152 samples) of PCM data. So the audio decoder asks the buffer once before decoding whether enough data is available and just bails otherwise:

plm_samples_t *plm_audio_decode(plm_audio_t *self) {
    if (!plm_buffer_has(self->buffer, self->frame_data_size)) {
        return NULL;
    }
    // Continue decoding…
}

plm_buffer_has() will attempt to load more data if needed. Audio frames are small, so demuxing a whole frame at once is not a problem.
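A hedged sketch of what such a check might look like internally: the buffer tracks its read position in bits and gives the load callback one chance to append data before giving up (simplified; not the library's exact layout):

#include <stddef.h>

// Sketch of a buffer that tracks its read position in bits and can ask
// a user callback for more data. Simplified; not PL_MPEG's exact layout.
typedef struct buffer_sketch_t buffer_sketch_t;
struct buffer_sketch_t {
    size_t bit_index;    // current read position, in bits
    size_t length_bits;  // total data loaded so far, in bits
    void (*load_callback)(buffer_sketch_t *self, void *user);
    void *load_user_data;
};

// Return 1 if `count` bits are available, giving the load callback one
// chance to append data first.
static int buffer_has(buffer_sketch_t *self, size_t count) {
    if (self->length_bits - self->bit_index >= count) {
        return 1;
    }
    if (self->load_callback) {
        self->load_callback(self, self->load_user_data); // may grow length_bits
    }
    return self->length_bits - self->bit_index >= count;
}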

The MPEG1 video format makes this more complicated. There's no information in the frame header that tells us how big the video frame is. There's probably a reason why the byte size of a video frame is never stated in the muxed packet header or video frame header, but it's beyond my comprehension.

Ideally we would just demux all video packets needed for a single frame and then run the decoder. After all, memory is cheap and we can easily scale our buffers accordingly. Of course it makes sense that early MPEG decoders with tighter memory constraints didn't do this and instead ran the demuxer and decoder in parallel. However, I wanted to keep PL_MPEG simple and single-threaded. So all we can do is continually ask the buffer if a byte is available (or can be loaded) before reading it. This scatters load calls all over the decode code and forces lots of switches between two tangentially related contexts.

This also introduces another problem: if we're in the middle of decoding a video frame and the buffer doesn't have any more bytes yet (e.g. because it's streaming from the net), we would need to pause the decoder, save its exact state and, when enough bytes are available, resume it again. Of course this isn't particularly difficult to achieve using threads, but if we want to stay single-threaded it gets very hairy.

This article by Simon Tatham explains the problem nicely and provides a solution for synchronizing two simple loops. Sadly, our video decoder could bail anywhere in the call stack, a few functions deep. So what we really need are coroutines, such as those natively provided in Go. Some coroutine implementations for C exist, but they are quite large and/or require platform-specific assembly, making them unsuitable for a small header-only lib.
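For context, the core of Tatham's trick is a switch statement hidden behind macros; the saved state is just a case label inside a single function, which is exactly why it can't resume a decoder that bailed several calls deep. A condensed sketch:

// Simon Tatham's coroutine trick, condensed: __LINE__ becomes the saved
// state, so the function can return mid-loop and resume where it left off.
#define crBegin     static int cr_state = 0; switch (cr_state) { case 0:
#define crReturn(x) do { cr_state = __LINE__; return (x); \
                         case __LINE__:; } while (0)
#define crFinish    }

// Yields 0..9 across successive calls, then -1
int next_value(void) {
    static int i;
    crBegin;
    for (i = 0; i < 10; i++) {
        crReturn(i); // resumes here on the next call
    }
    crFinish;
    return -1;
}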

So currently, if you want to feed a plm_buffer_t yourself through a plm_buffer_load_callback and you can't guarantee that the data can be loaded in a timely fashion, you have two options:

  1. Run PL_MPEG in its own thread and busy-wait in your load callback until enough data is available
  2. Search through all available bytes until you find the PICTURE_START_CODE of the next frame – making sure that the previous frame can be completely decoded before calling plm_video_decode()

Of course, with the second solution you introduce one full frame of lag for streaming video. If latency is important, that leaves only the first option.
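For reference, MPEG-1 start codes are byte-aligned sequences of 00 00 01 followed by a code byte, and 0x00 marks a picture start. A hedged sketch of the scan from the second option (simplified; not PL_MPEG's demuxer):

#include <stddef.h>

// Return the byte offset of the next picture start code (00 00 01 00)
// in data, or -1 if none is found. Simplified sketch for the "wait for
// the next PICTURE_START_CODE" strategy.
ptrdiff_t find_picture_start(const unsigned char *data, size_t len) {
    for (size_t i = 0; i + 4 <= len; i++) {
        if (data[i] == 0x00 && data[i + 1] == 0x00 &&
            data[i + 2] == 0x01 && data[i + 3] == 0x00) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}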

Interestingly, if I interpret the source correctly, ffmpeg chose the second option (waiting for the next PICTURE_START_CODE) even for the MPEG-TS container format, which is meant for streaming. So demuxing MPEG-TS with ffmpeg always introduces a frame of latency.

This gross oversight in the overengineered (especially for its time) MPEG-PS and MPEG-TS container formats just leaves me dumbfounded. If anybody knows why the MPEG standard doesn't just provide a byte size in the header of each frame or even just a FRAME_END code, or if you have a solution for this problem, let me know!
