PHOBOSLAB

Blog Home

Decode It Like It's 1999

A few years ago I started to work on an MPEG1 Video decoder, completely written in JavaScript. Now, I finally found the time to clean up the library, improve its performance, make it more error resilient and modular and add an MP2 Audio decoder and MPEG-TS demuxer. This makes this library not just an MPEG decoder, but a full video player.

In this blog post I want to talk a bit about the challenges and various interesting bits I discovered during the development of this library. You'll find a demo, the source and documentation and reasons why to use JSMpeg over on the official website:

jsmpeg.com - Decode it like it's 1999

Refactoring

Recently, I needed to implement audio streaming into JSMpeg for a client and only then realized in what a pitty state the library is. It has grown quite a bit since its first release. A WebGL renderer, WebSocket client, progressive loading, benchmarking facilities and much more have been tacked on in the last few years. All kept in a single, monolithic class with conditionals bursting at the seams.

I decided to clean up this mess first by separating its logical components. I also sketched out what would be needed for the sound implementation: a Demuxer, MP2 decoder and Audio Output:

Plus some auxiliary classes:

Each of the components (apart from the Sources) has a .write(buffer) method to feed it with data. These components can then "connect" to a destination that receives the processed result. The complete flow through the library looks like this:

                 / -> MPEG1 Video Decoder -> Renderer
Source -> Demuxer  
                 \ -> MP2 Audio Decoder -> Audio Output

JSMpeg currently has 3 different implementations for the Source (AJAX, AJAX progressive and WebSocket) and there's 2 different Renderers (Canvas2D and WebGL). The rest of the library is agnostic to these – i.e. the Video Decoder doesn't care about the Renderers internals. With this approach it's easy to add new components: further Sources, Demuxers, Decoders or Outputs.

I'm not completely happy with how these connections work in the library. Each component can only have one destination (apart from the Demuxer, that has one destination per stream). It's a tradeoff. In the end, I felt that anything else would be over engineering and complicating the library for no good reason.

WebGL Rendering

One of the most computationally intensive tasks for an MPEG1 decoder is the color conversion from MPEG's internal YUV format (Y'Cr'Cb to be precise) into RGBA so that the browser can display it. Somewhat simplified, the conversion looks like this:

for (var i = 0; i < pixels.length; i+=4 ) {
    var y, cb, cr = /* fetch this from the YUV buffers */;

    pixels[i + 0 /* R */] = y + (cb + ((cb * 103) >> 8)) - 179;
    pixels[i + 1 /* G */] = y - ((cr * 88) >> 8) - 44 + ((cb * 183) >> 8) - 91;
    pixels[i + 2 /* B */] = y + (cr + ((cr * 198) >> 8)) - 227;
    pixels[i + 4 /* A */] = 255;
}

For a single 1280x720 video frame that loop has to run 921600 times to convert all pixels from YUV to RGBA. Each pixel needs 3 writes to the destination RGB array (we can pre-populate the alpha component since it's always 255). That's 2.7 million writes per frame, each needing 5-8 adds, subtracts, multiplies and bit shifts. For a 60fps video, we end up with more than 1 billion operations per second. Plus the overhead for JavaScript. The fact that JavaScript can do this, that a computer can do this, still boggles my mind.

With WebGL, this color conversion (and subsequent displaying on the screen) can be sped up tremendously. A few operations for each pixel is the bread and butter of GPUs. GPUs can process many pixels in parallel, because they're independent of any other pixel. The WebGL shader that's run on the GPU doesn't even need these pesky bit shifts – GPUs likes floating point numbers:

void main() {
    float y = texture2D(textureY, texCoord).r;
    float cb = texture2D(textureCb, texCoord).r - 0.5;
    float cr = texture2D(textureCr, texCoord).r - 0.5;

    gl_FragColor = vec4(
        y + 1.4 * cb,
        y + -0.343 * cr - 0.711 * cb,
        y + 1.765 * cr,
        1.0
    );
}

With WebGL, the time needed for the color conversion dropped from 50% of the total JS time to just about 1% for the YUV texture upload.

There was one minor issue I stumbled over with the WebGL renderer. JSMpeg's video decoder does not produce three Uint8Arrays for each color plane, but Uint8ClampedArrays. It's doing this, because the MPEG1 standard mandates that decoded color values must be clamped, not wrap around. Letting the browser do the clamping through the ClampedArray works out faster than doing it in JavaScript.

A bug that still stands in some Browsers (Chrome and Safari) prevents WebGL from using the Uint8ClampedArray directly. Instead, for these browsers we have to create a Uint8Array view for each array for each frame. This operation is pretty fast since nothing needs to be copied, but I'd still like to do without it.

JSMpeg detects this bug and only uses the workaround if needed. We simply try to upload a clamped array and catch the error. This detection sadly triggers an un-silencable warning in the console, but it's better than nothing.

WebGLRenderer.prototype.allowsClampedTextureData = function() {
    var gl = this.gl;
    var texture = gl.createTexture();

    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texImage2D(
        gl.TEXTURE_2D, 0, gl.LUMINANCE, 1, 1, 0,
        gl.LUMINANCE, gl.UNSIGNED_BYTE, new Uint8ClampedArray([0])
    );
    return (gl.getError() === 0);
};

WebAudio for Live Streaming

For the longest time I assumed that in order to feed WebAudio with raw PCM sample data without much latency or pops and cracks, you'd have to use a ScriptProcessorNode. You'd copy your decoded sample data just in time whenever you get the callback from the script processor. It works. I tried it. It needs quite a bit of code to function properly and of course it's computationally intensive and inelegant.

Luckily, my initial assumption was wrong.

The WebAudio Context maintains its own timer that's separate from JavaScript's Date.now() or performance.now(). Further, you can instruct your WebAudio sources to start() at a precise time in the future based on the context's time. With this, you can string very short PCM buffers together without any artefacts.

You only have to calculate the start time for the next buffer by continuously adding the duration of all previous ones. It's important to always use the WebAudio Context's own time for this.

var currentStartTime = 0;

function playBuffer(buffer) {
    var source = context.createBufferSource();
    /* load buffer, set destination etc. */

    var now = context.currentTime;
    if (currentStartTime < now) {
        currentStartTime = now;
    }

    source.start(currentStartTime);
    currentStartTime += buffer.duration;
}

There's a caveat though: I needed to get the precise remaining duration of the enqueued audio. I implemented it simply as the difference between the current time and the next start time:

// Don't do that!
var enqueuedTime = (currentStartTime - context.currentTime);

It took me a while to figure it out, but this doesn't work. You see, the context's currentTime is only updated every so often. It's not a precise real time value.

var t1 = context.currentTime;
doSomethingForAWhile();
var t2 = context.currentTime;

t1 === t2; // true

So, if you need the precise audio play position (or anything based on it), you have to revert to JavaScript's performance.now().

Audio Unlocking on iOS

You gotta love the shit that Apple throws into Web devs faces from time to time. One of those things is the need to unlock audio on a page before you can play anything. Basically, audio playback can only be started as a response to a user action. You click on a button, audio plays.

This makes sense. I won't argue against it. You don't want to have audio blaring at you unannounced when you visit a page.

What makes it shitty, is that Apple neither provided a way to cleanly unlock Audio nor a way to ask the WebAudio Context if it's unlocked already. What you do instead, is to play an Audio source and continually check if it's progressing. You can't chek immediately after playing, though. No, no. You have to wait a bit!

WebAudioOut.prototype.unlock = function(callback) {
    // This needs to be called in an onclick or ontouchstart handler!
    this.unlockCallback = callback;
    
    // Create empty buffer and play it
    var buffer = this.context.createBuffer(1, 1, 22050);
    var source = this.context.createBufferSource();
    source.buffer = buffer;
    source.connect(this.destination);
    source.start(0);

    setTimeout(this.checkIfUnlocked.bind(this, source, 0), 0);
};

WebAudioOut.prototype.checkIfUnlocked = function(source, attempt) {
    if (
        source.playbackState === source.PLAYING_STATE || 
        source.playbackState === source.FINISHED_STATE
    ) {
        this.unlocked = true;
        this.unlockCallback();
    }
    else if (attempt < 10) {
        // Jeez, what a shit show. Thanks iOS!
        setTimeout(this.checkIfUnlocked.bind(this, source, attempt+1), 100);
    }
};

Progressive Loading via AJAX

Say you have a 50mb video file that you load via AJAX. The video starts loading no problem. You can even check the current process (downloaded vs. total bytes) and display a nice loading animation. What you can not do, is to access the already downloaded data while the rest of the file is still loading.

There have been some proposals for adding chunked ArrayBuffers into XMLHttpRequest, but nothing has been implemented across browsers. The newer fetch API (that I still don't understand the purpose of) proposed some similar features, but again: no cross browser support. However, we can still do the chunked downloading in JavaScript using Range-Requests.

The HTTP standard implements a Range header that allows you to only grab part of a resource. If you just need the first 1024 bytes of a big file, you set the header Range: bytes=0-1024 in your request. Before we can start though, we have to figure out how large the file. We can do this with a HEAD request, instead of a GET. This returns only the HTTP headers for the resource, but none of the body bytes. Range-Requests are supported by almost all HTTP servers. The one exception I know of, is PHP's built-in development server.

JSMpeg's default chunk size for downloading via AJAX is 1mb. JSMpeg also appends a custom GET parameter to the URL (e.g. video.ts?0-1024) for each request, so that each chunk essentially gets its own URL and plays nice with bad caching proxies.

With this in place, you can start playing the file as soon as the first chunk has arrived. Also, further chunks will only be downloaded when they're needed. If someone only watches the first few seconds of a video, only those first few seconds will get downloaded. JSMpeg does this by measuring the time it took to load a chunk, adding a lot of safety margin and comparing this to the remaining duration of the already loaded chunks.

In JSMpeg, the Demuxer splits streams as fast as it can. It also decodes the presentation time stamp (PTS) for each packet. The video and audio decoders however only advance their play position in real-time increments. The difference between the last demuxed PTS and the decoder's current PTS is the remaining play time for the downloaded chunks. The Player periodically call's the Source's resume() method with this headroom time:

// It's a silly estimate, but it works
var worstCaseLoadingTime = lastChunkLoadingTime * 8 + 2;
if (worstCaseLoadingTime > secondsHeadroom) {
    loadNextChunk();
}

Audio & Video Sync

JSMpeg tries to play audio as smoothly as possible. It doesn't introduce any gaps or compressions when queuing up samples. Video playback orients itself on the audio playback position. It's done this way, because even the tiniest gaps or discontinuities are far more perceptible in audio than in video. It's far less jarring if a video frame is a few milliseconds late or dropped.

For the most part, JSMpeg relies on the presentation time stamp (PTS) of the MPEG-TS container for playback, instead of calculating the playback time itself. This means, the PTS in the MPEG-TS file have to be consistent and accurate. From what I gathered from the internet, this is not always the case. But modern encoders seemed to have figured this out.

One complication was that the PTS doesn't always start at 0. For instance, if you have a WebCam connected and running for a while, the PTS may be the start time when the WebCam was turned on, not when recording started. Therefore, JSMPeg searches for the first PTS it can find and uses that as the global start time for all streams.

The MPEG1 and MP2 decoders also keep track of all PTS they received alongside with the buffer position of each PTS. With this, we can seek through the audio and video streams to a specific time.

Currently, JSMpeg will happily seek to an inter-frame and decode it on top of the previously decoded frame. The correct way to handle this, would be to rewind to the last intra-frame before the one we seek to and decode all frames in between. This is something I still need to fix.

Build Tools & The JavaScript Ecosystem

I avoid build tools wherever I can. Chances are, your shiny toolset that automates everything for you, will stop working in a year or two. Setting up your build environment is never as easy as "just call webpack", or use grunt or whatever task runner is the hot shit today. It always ends up like

(...) Where do I get webpack from? Oh, I need npm.
Where do I get npm from? Oh, I need nodejs.
Where do I get nodejs from? Oh, I need, homebrew.
What's that? gyp build error? Oh, sure, I need to install XCode.
Oh, webpack needs the babel plugin?
What? The left-pad dependency could not be resolved?
...

And suddenly you spent two hours of your life and downloaded several GB of tools. All to build a 20kb library, for a language that doesn't even need compiling. How do I build this library 2 years from now? 5 years?

I had a thorough look at webpack and hated it. It's way too complex for my taste. I like to understand what's going on. That's part of the reason I wrote this library instead of diving into WebRTC.

So, the build step for JSMpeg is a shell script with a single call to uglifyjs that can be altered to use cat (or copy on Windows) in 2 seconds. Or you simply load the source files separately in your HTML while you're working on it. Done.

Quality, Bitrates And The Future

The quality of MPEG1 at reasonable bitrates is, much to my surprise, not bad at all. Have a look at the demo video on jsmpeg.com - granted, it's a favorable case for compression. Slow movement and not too many cuts. Still, this video weighs in at 50mb for it's 4 minutes, and provides a quality comparable to most Youtube videos that are "only" 30% smaller.

In my tests, I could always get video that I'd consider "high quality" at max 2Mbit/s. Depending on your use-case (want a coffe cam?), you can go to 100Kbit/s or even lower. There's no bottom limit for the bitrate/framerate.

You could get a cheap cell phone contract with a 1GB/month data limit, put a 3G dongle and a webcam on a Raspberry Pi, attach it to a 12 V automotive battery, throw it on your crops field and get a live weather cam that doesn't need any infrastructure or maintenance for a few years and is viewable in your smartphone's browser without installing anything.

The simplicity of MPEG1, compared to modern codecs, makes it very attractive in my opinion. It's well understood and there's a ton of tools that can work with it. All patents relating to MPEG1/MP2 have expired now. It's a free format.

Do you remember the GIF revival after its patents expired?

Thursday, February 2nd 2017
— Dominic Szablewski, @phoboslab

12 Comments:

#1 – Yevgeniy – Friday, February 3rd 2017, 11:27

Thank you for post! I'm lucky today :)! Always looking forward for your articles about interesting stuff and projects. Only pity is that you are writing rarely :(.

Dominic, how I understand you picked MPEG1 only because it's free format and of it simplicity? What could theoretically stop you from making such lib for MPEG4 and some modern formats?

#2 – nikeee – Sunday, February 12th 2017, 12:01

What are your thoughts on Typescript?

#3 – javen – Friday, February 17th 2017, 07:28

Yevgeniy, I've tried Broadway.js (github.com/mbebenita/Broadway) on iPhone 6 (Safari + iOS 10.2.1). It's very bad performance with only 0.5fps.

#4 – Yo mama – Wednesday, March 8th 2017, 17:14

Yes

#5 – Axe – Saturday, March 18th 2017, 23:39

Axell2 is not best is xialba

#6 – PHOBOS – Monday, April 24th 2017, 02:55

try html

#7 – Paul Farrelle – Sunday, April 30th 2017, 18:41

I am using instant webcam and love it. I'm guessing that you are using ffmpeg on the iPhone to encode the camera and then decoding with jsmpeg in the browser. If so, would it be possible to add the ffmpeg option to flip the image horizontally to provide a mirror view. This would be awesome for my application where I use the app on an iPhone stuck to the back mirror of my airstream trailer to see down the road behind me for safe driving and reversing. Alternatively a switch at the decoder to render left to right instead of right to left. Interestingly I was on the committee for JPEG and MPEG a very long time ago .... Great work!

#8 – sakthi – Monday, October 30th 2017, 14:45

AudioOut.prototype.unlock = function(callback) {
// This needs to be called in an onclick or ontouchstart handler!
this.unlockCallback = callback;

// Create empty buffer and play it
var buffer = this.context.createBuffer(1, 1, 22050);
var source = this.context.createBufferSource();
source.buffer = buffer;
source.connect(this.destination);
source.start(0);

setTimeout(this.checkIfUnlocked.bind(this, source, 0), 0);
};

WebAudioOut.prototype.checkIfUnlocked = function(source, attempt) {
if (
source.playbackState === source.PLAYING_STATE ||
source.playbackState === source.FINISHED_STATE
) {
this.unlocked = true;
this.unlockCallback();
}
else if (attempt < 10) {
// Jeez, what a shit show. Thanks iOS!
setTimeout(this.checkIfUnlocked.bind(this, source, attempt+1), 100);
}
};

#9 – Jaime – Tuesday, November 7th 2017, 17:43

Juego de maincraft

#10 – Amit Kumar Verma – Thursday, November 9th 2017, 14:26

Hi Dominic,

Thanks for this article and the plugin but it has stopped supporting .mpg files and only supporting .ts video files .

The previous version is working for .mpg files but no audio.

can you fix it to support both type of video ?

Thanks in advance.

#11 – help – Tuesday, November 14th 2017, 05:19

Uncaught RangeError: Source is too large, why ?

#12 – Ching Yatsen – Monday, December 11th 2017, 06:04

Thanks for this article, and is it possible to support h264 codec? is h264 really decrease bitrate with the same video quality ?

Post a Comment:

Comment: (Required)

(use <code> tags for preformatted text; URLs are recognized automatically)

Name: (Required)

Please type phoboslab into the following input field or enable Javascript. This is an anti-spam measure. Sorry for the inconvenience.