QOA Benchmark Results and File Format Specification
The specification for the Quite OK Audio Format, announced in a previous blog post, is now finalized. QOA is a lossy audio compression format. Typical audio signals (44100hz, stereo) are encoded into 278 kbits/s, or more precisely 3.2 bits per sample – exactly 1/5 of the bits needed for an uncompressed WAV.
The QOA-Specification fits on a single-page PDF. More information and test samples can be found on qoaformat.org. The source is on github.
Performance Improvements
Over the past few weeks I implemented some changes to the reference en-/decoder to improve performance, specifically when decoding.
The largest performance gain, an 8% improvement, came from a somewhat
counter-intuitive looking change: a specialization of the qoa_clamp()
function.
// before
static inline int qoa_clamp(int v, int min, int max) {
if (v < min) { return min; }
if (v > max) { return max; }
return v;
}
// after
static inline int qoa_clamp_s16(int v) {
if ((unsigned int)(v + 32768) > 65535) {
if (v < -32768) { return -32768; }
if (v > 32767) { return 32767; }
}
return v;
}
The clamp function in the decoder is extremely hot. It is called for each
output sample to clamp it into the 16 bit range. The reason that qoa_clamp_s16()
is faster than qoa_clamp()
is owed to the fact that these output samples are
very rarely clamped at all.
The extra if
statement here checks for both branches (v < -32768
and v > 32767
)
simultaneously and is very rarely taken. This helps the CPU's branch predictor to
correctly predict and skip the branch in the vast majority of cases.
I also tried to re-implement the whole inner decoder loop with x86/x64 SIMD
intrinsics, but failed to make it faster than what gcc -O3
would produce – a
testament to the advances of modern compilers! Luckily, QOA is already pretty
fast.
Benchmark Results
For these benchmarks I used the 2 hour 43 minute stereo audio track of the
Bladerunner 2049 movie. All codecs were tested single threaded
(-threads 1
for ffmpeg) and output was sent to /dev/null
. The Vorbis, Opus,
MP3, M4A and ADPCM files were encoded with ffmpeg's default settings.
My CPU is an Intel i7-6700k; input files were stored on an NVME SSD. The
measurements were taken with time
in the terminal, using the best out of 5 runs.
Please take these results here with a huge grain of salt. This is only meant to
give a general idea of the performance between various codecs.
Results for bladerunner.wav, stereo 44100 hz, 9807 sec, 1650 mb:
Format | Lib/Tool | encode (s) | decode (s) | size (mb) |
---|---|---|---|---|
wv -b3 | wavpack | 42.175 | 29.800 | 304 |
wv -b2 | wavpack | 37.821 | 25.690 | 234 |
opus | ffmpeg | 315.232 | 20.410 | 116 |
wv | wavpack | 26.494 | 15.584 | 558 |
vorbis | stb_vorbis | 13.593 | ||
mp3 | ffmpeg | 146.223 | 9.631 | 149 |
vorbis | ffmpeg | 145.125 | 8.732 | 200 |
flac | ffmpeg | 18.148 | 8.715 | 552 |
mp3 | dr_mp3 | 7.503 | ||
aac | ffmpeg | 86.884 | 6.636 | 151 |
flac | dr_flac | 6.429 | ||
adpcm_ms | ffmpeg | 10.424 | 3.690 | 417 |
qoa | qoaconv | 25.752 | 2.996 | 333 |
Curiously, even MS ADPCM is slower to decode than QOA. I suspect that ffmpeg itself has some considerable overhead, with the data traveling through many layers of abstractions.
So as it stands, QOA is the fastest format to decode in my (admittedly limited) tests. In any case, I believe that QOA offers a worthwhile tradeoff between file size, quality and decoding speed.
Implementations
Apart from the reference implementation, QOA has already been ported to a number of different languages, including JavaScript, C++, R, Rust, D and this Ć project by @pfusik which transpiles to C, C++, C#, Java, JavaScript, Python, Swift and TypeScript.
As part of @pfusik's project he also provides a QOA Plugin for Foobar2000. With Foobar2000 being one of my favorite pieces of software, this is very cool to see!
QOA is also supported in the game engine raylib. It's probably just a matter of time until we see the first game that uses QOA for audio playback. I'm excited!
The Specification
The final specification details the exact file format as well as the steps needed to decode. Again, the whole specification is just a single page PDF: