Tuesday, 26 January 2010

x264's new bells and whistles

When it comes to H.264 encoding, most people know of x264.

Dark Shikari and the x264 team have consistently produced solid, fast and feature-complete code - in many cases better than commercial offerings. This alone is nothing short of incredible considering the budget, geographical distribution and sheer complexity of the encoder itself.

Even though it's already one of the best encoders out there, used by the likes of Facebook, YouTube and commercial cable companies, the team shows no sign of slowing down. Last week saw a couple of patches committed to the main x264 git repository which have really got my head spinning with (low latency) possibilities.


--intra-refresh

Sometimes key-frames don't cut it. Think of a low-bandwidth internet connection. A high-definition key-frame might be 100+ kB in size. Over a 10 Mbit/s connection that could take 80-100 ms to transmit. Whoops, that's 80-100 ms of latency in your stream! Intra-refresh "slides" a vertical bar of intra blocks across the screen, refreshing everything in its wake. Because these intra blocks are spread out across many frames, the size spikes caused by key-frames disappear, which allows for smaller, more evenly sized frames (and thus lower latency).
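To put rough numbers on the key-frame spike, here is a minimal back-of-the-envelope sketch. The 100 kB key-frame and 10 Mbit/s link come from the paragraph above; the size of an ordinary frame and the length of a refresh sweep are made-up illustrative values, and the "spread the intra cost evenly" model is a simplification, not how x264 actually budgets bits.

```python
def transmit_ms(size_bytes: int, link_bits_per_s: int) -> float:
    """Time needed to push one frame onto the wire, in milliseconds."""
    return size_bytes * 8 * 1000 / link_bits_per_s


LINK = 10_000_000      # 10 Mbit/s link, as in the text
KEYFRAME = 100_000     # 100 kB key-frame, as in the text
INTER_FRAME = 10_000   # ASSUMPTION: size of an ordinary (non-key) frame
REFRESH_FRAMES = 30    # ASSUMPTION: frames taken by one intra-refresh sweep

# Toy model: the extra cost of intra coding is shared equally by every
# frame in the sweep instead of landing on a single key-frame.
refresh_frame = INTER_FRAME + (KEYFRAME - INTER_FRAME) // REFRESH_FRAMES

print("worst frame with key-frames:   ", transmit_ms(KEYFRAME, LINK), "ms")
print("worst frame with intra-refresh:", transmit_ms(refresh_frame, LINK), "ms")
```

Under these assumptions the largest frame drops from roughly 80 ms of transmit time to roughly 10 ms, which is the whole point of the feature.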

--tune zerolatency

I'm not 100% sure exactly which options this turns on, but the --tune flag lets users select preset settings without having to worry too much about the particulars. I believe this turns off B-frames and reduces the VBV buffer size to 1 frame. This means you give up a bit of potential compression, but you gain a huge advantage in that you now have only a single frame of latency.
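Whether or not the tune sets the VBV buffer for you, a one-frame buffer is easy to work out by hand: it is simply the maximum bitrate divided by the frame rate. The sketch below assumes you pass the result to x264's --vbv-maxrate (kbit/s) and --vbv-bufsize (kbit) options yourself; the helper names and the 1500 kbit/s, 30 fps figures are illustrative only.

```python
def one_frame_vbv_kbit(maxrate_kbit_per_s: float, fps: float) -> float:
    """VBV buffer size (kbit) that holds exactly one frame's worth of bits."""
    return maxrate_kbit_per_s / fps


def vbv_args(maxrate_kbit_per_s: int, fps: float) -> list[str]:
    """Command-line fragment for a one-frame VBV buffer (hypothetical helper)."""
    bufsize = round(one_frame_vbv_kbit(maxrate_kbit_per_s, fps))
    return ["--vbv-maxrate", str(maxrate_kbit_per_s),
            "--vbv-bufsize", str(bufsize)]


# Example values, chosen only for illustration: 1500 kbit/s at 30 fps.
print(one_frame_vbv_kbit(1500, 30))
print(" ".join(vbv_args(1500, 30)))
```

A buffer this small forces every frame to fit in the same bit budget, which is where the lost compression efficiency comes from.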


--sliced-threads

Normal threaded encoding dishes individual frames out to individual threads. Sliced threads is a concept that I believe Dark Shikari coined, whereby a frame is split horizontally into slices which are then passed off to individual threads. This achieves the same end, but apparently introduces some complexities in bitrate management, making it a case of diminishing returns as the height of each slice gets smaller (below ~100 pixels). Old threads = additional frame latency. Sliced threads = no additional latency.
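As a rough guide to how far sliced threads can be pushed, the sketch below divides the frame height by the thread count and compares it with the ~100 pixel figure mentioned above. That threshold is a rule of thumb from this post rather than a documented x264 limit, and the function names are invented for the example.

```python
def slice_height(frame_height: int, threads: int) -> float:
    """Approximate height in pixels of each slice with one slice per thread."""
    return frame_height / threads


def max_useful_threads(frame_height: int, min_slice_height: int = 100) -> int:
    """Largest thread count that keeps every slice at or above the threshold.

    ASSUMPTION: min_slice_height=100 is the rule of thumb from the text.
    """
    return max(1, frame_height // min_slice_height)


for height in (480, 720, 1080):
    print(height, "px tall ->", max_useful_threads(height), "threads,",
          slice_height(height, max_useful_threads(height)), "px per slice")
```

By this estimate a 720-line frame stops benefiting at around 7 threads and a 1080-line frame at around 10.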


--slice-max-size

This lets you define a limit on the size of the slices produced by the encoder. Choose a small enough value and each slice fits in a single UDP packet. This is super convenient. If your frame is sent as one slice split over 10 UDP packets and you lose one of them, you waste the other 9. With this option I believe it's possible to limit slices to the payload size of a typical UDP packet, and thus ensure that the loss of a single packet won't waste any more than that one packet.
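Picking the limit is mostly a matter of subtracting header overhead from the path MTU. The sketch below assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header without options and an 8-byte UDP header; the optional application header (for example 12 bytes for RTP) is also an assumption. Your network may differ, so treat the result as a starting point rather than a guaranteed safe value.

```python
def max_slice_bytes(mtu: int = 1500, ip_header: int = 20,
                    udp_header: int = 8, app_header: int = 0) -> int:
    """Largest slice that still fits in one unfragmented UDP datagram.

    ASSUMPTIONS: 1500-byte MTU, IPv4 without options, no tunnelling overhead.
    """
    return mtu - ip_header - udp_header - app_header


def wasted_packets(packets_per_frame: int, one_slice_per_packet: bool) -> int:
    """Packets rendered useless when exactly one packet of a frame is lost."""
    # With one slice per packet only the lost packet is wasted; otherwise the
    # whole multi-packet slice has to be thrown away.
    return 1 if one_slice_per_packet else packets_per_frame


print("raw UDP payload:", max_slice_bytes())
print("with a 12-byte RTP header:", max_slice_bytes(app_header=12))
print("waste without the limit:", wasted_packets(10, False))
print("waste with the limit:   ", wasted_packets(10, True))
```

The resulting byte count is what you would hand to --slice-max-size, possibly rounded down a little for safety.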

I'm sure there are other things I've not yet discovered and I'd be happy to update this list if anyone knows of any. These four alone though provide so much improvement over previous versions of x264 that real-time video conferencing using this technology is now in the realm of possibility without requiring the use of overpriced "enterprise" video conferencing products.
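For anyone wanting to try the four options together, here is a minimal sketch that assembles an x264 command line. The file names, bitrate, frame rate and slice size are placeholders I picked for illustration, and I haven't benchmarked this particular combination, so treat it as a starting point rather than a recommended configuration.

```python
import shlex


def low_latency_x264_args(input_path: str, output_path: str,
                          bitrate_kbit: int = 1500, fps: int = 30,
                          slice_bytes: int = 1400) -> list[str]:
    """Build an x264 argument list using the four options discussed above.

    All default values are illustrative ASSUMPTIONS, not tuned settings.
    """
    one_frame_kbit = round(bitrate_kbit / fps)  # VBV buffer of about one frame
    return [
        "x264",
        "--tune", "zerolatency",
        "--intra-refresh",
        "--sliced-threads",
        "--slice-max-size", str(slice_bytes),
        "--bitrate", str(bitrate_kbit),
        "--vbv-maxrate", str(bitrate_kbit),
        "--vbv-bufsize", str(one_frame_kbit),
        "-o", output_path,
        input_path,
    ]


# Hypothetical file names; print the command instead of running it.
print(shlex.join(low_latency_x264_args("input.y4m", "output.264")))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without x264 installed.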

If you crave more details, you can get a full explanation straight from the horse's mouth.