Dominic Szablewski, @phoboslab
— Wednesday, November 24th 2021

Lossless Image Compression in O(n) Time

Introducing QOI — the Quite OK Image format. It losslessly compresses RGB and RGBA images to a similar size of PNG, while offering a 20x-50x speedup in compression and 3x-4x speedup in decompression. All single-threaded, no SIMD. It's also stupidly simple.

tl;dr: 300 lines of C, single header, source on github, benchmark results here.

QOI compression

I want to preface this article by saying that I have no idea what I'm doing. I'm not a compression guy. I barely understand how Huffman Coding and DCT works. Luckily, QOI uses neither.

I was just tinkering with some ideas that I thought would maybe compress images. The result surprised me quite a bit.

Why? A Short Rant.

File formats. They suck. I absolutely loathe the usual suspects. PNG, JPEG or even worse MPEG, MOV, MP4. They burst with complexity at the seams. Every tiny aspect screams “design by consortium”.

A while ago I dabbled into MPEG a bit. The basic ideas for video compression in MPEG are ingenious, even more so for 1993, but the resulting file format is an abomination.

I can almost picture the meeting of the Moving Picture Experts Group where some random suit demanded there to be a way to indicate a video stream is copyrighted. And thus, the copyright bit flag made its way into the standard and successfully stopped movie piracy before it even began.

MPEG, an industry standard conceived 3 decades past, all patents long expired, all professional interest abandoned. Yet, the holy specification — there named ISO/IEC 11172-2 — is a well guarded secret, revealed only to those that fork over a cool $200 to endow the sacred work of the ISO.

Alternative open video codecs exist, but are again immensely complex. They compete with the state of the art, require huge libraries, are compute hungry and difficult to work with. Alternatives for PNG all compete on the compression ratio too, with ever increasing complexity.

There absolutely is a market for video, audio and image codecs that trade compression ratio for speed and simplicity, but no one is serving it. (Well, these guys maybe, but it's all proprietary.)

Yes, stb_image saved us all from the pains of dealing with libpng and is therefore used in countless games and apps. A while ago I aimed to do the same for video with pl_mpeg, with some success.

But with all that we learned, why did no one go back and implement a simple compression scheme to compete with PNG, but without the cruft? Why did no one implement a simple video compression scheme similar to MPEG, but in a sane file format instead?

I was tinkering to do the latter: to take parts of MPEG-1 and make it easier to parse, easier to accelerate on a GPU. A good enough video codec.

Instead I stumbled into a solution for the former: a lossless image format that competes with PNG for some use cases. A slightly worse compression ratio, but magnitudes less complexity.

Technical Details

QOI encodes and decodes images in a single pass. It touches every pixel just once. Every pixel is encoded in one of four different ways.

The resulting values are packed into chunks starting with a 2..4 bit tag (indicating one of those four methods) followed by a number of data bits. All of these chunks (tag and data bits) are byte aligned, so there's no bit twiddling needed between those chunks.

The four different methods are:

1. A run of the previous pixel

If the current pixel is exactly the same as the previous pixel, the run length is increased by 1. When a pixel is encountered that is different from the previous one, this run length is saved to the encoded data and the current pixel is packed by one of the other 3 methods.

The run length chunk comes in two different flavors, depending on the number of bits needed.

QOI_RUN_8 {
    u8 tag  :  3;   // b010
    u8 run  :  5;   // 5-bit run-length repeating the previous pixel: 1..32
}

QOI_RUN_16 {
    u8 tag  :  3;   // b011
    u16 run : 13;   // 13-bit run-length repeating the previous pixel: 33..8224
}

2. An index into a previously seen pixel

The encoder keeps a running array of the 64 pixels it previously encountered. When the encoder finds the current pixel still present in this array, the index into this array is saved to the stream.

To keep things O(n) when encoding, there's only one lookup into this array. The lookup position is determined by a “hash” of the rgba value (really just r^g^b^a). A linear search or some more complex bookkeeping would result in a marginally better compression ratio, but would also slow things down a bit.

QOI_INDEX {
    u8 tag  :  2;   // b00
    u8 idx  :  6;   // 6-bit index into the color index array: 0..63
}

3. The difference to the previous pixel

When the current pixel color is not too far from the previous one, the difference to the previous pixel is saved to the stream.

This comes in 3 different flavors, depending on how big the difference is. Note that this focuses on the RGB value; alpha changes are more costly.

QOI_DIFF_8 {
    u8 tag  :  2;   // b10
    u8 dr   :  2;   // 2-bit   red channel difference: -1..2
    u8 dg   :  2;   // 2-bit green channel difference: -1..2
    u8 db   :  2;   // 2-bit  blue channel difference: -1..2
}

QOI_DIFF_16 {
    u8 tag  :  3;   // b110
    u8 dr   :  5;   // 5-bit   red channel difference: -15..16
    u8 dg   :  4;   // 4-bit green channel difference:  -7.. 8
    u8 db   :  4;   // 4-bit  blue channel difference:  -7.. 8
}

QOI_DIFF_24 {
    u8 tag  :  4;   // b1110
    u8 dr   :  5;   // 5-bit   red channel difference: -15..16
    u8 dg   :  5;   // 5-bit green channel difference: -15..16
    u8 db   :  5;   // 5-bit  blue channel difference: -15..16
    u8 da   :  5;   // 5-bit alpha channel difference: -15..16
}

4. Full rgba values

If all 3 previous methods fail, the r, g, b, a values (but only those that are different from the previous pixel) are saved to the stream as full bytes.

QOI_COLOR {
    u8 tag  :  4;   // b1111
    u8 has_r:  1;   //   red byte follows
    u8 has_g:  1;   // green byte follows
    u8 has_b:  1;   //  blue byte follows
    u8 has_a:  1;   // alpha byte follows
    u8 r;           //   red value if has_r == 1: 0..255
    u8 g;           // green value if has_g == 1: 0..255
    u8 b;           //  blue value if has_b == 1: 0..255
    u8 a;           // alpha value if has_a == 1: 0..255
}

That's it.

If you have a minute, please read through the qoi.h source.

Onward

Seriously, I'm dumbfounded. BMP and TIFF have run-length-encoding and then GIF comes around with LZW. But there's nothing in between. Why? I found the space between RLE and LZW to be large enough to spend many days on. And there's a lot more to explore.

Working on QOI was a lot of fun. I had a "test runner" with some sample images lying around. Seeing how every change I made affected the compression ratio was quite exciting.

With some more work, QOI could serve as the basis for a lossless video codec, suitable for screencasts and the like.

SIMD acceleration for QOI would also be cool but (from my very limited knowledge about some SIMD instructions on ARM), the format doesn't seem to be well suited for it. Maybe someone with a bit more experience can shed some light?

I'm also quite hyped to explore the even larger space of a simple, lossy image compression format. Many texture compression schemes have very exciting ideas, yet there's nothing that competes with JPEG but with less complexity.

© 2021 Dominic Szablewski – Imprint – powered by Pagenode (3ms) – made with <3