Wednesday, 16 September 2009

ffmpeg and mingw

I love ffmpeg and all it stands for. It has developed into an amazing library with support for an incredible array of file formats, codecs and protocols. It's widely known, widely used and despite its complex options, a wide variety of front-end apps make working with it a lot easier for the layman.

For developers however, ffmpeg has a few stumbling blocks that tend to get in the way. I'm no expert by any means but I thought I'd share a few things I ran into while integrating ffmpeg into a browser plugin I've been working on.



A 1 minute primer

The ffmpeg code is broken up into 5 libraries - avcodec, avformat, avutil, avdevice and swscale. The main libraries for dealing with encoded files are avcodec and avformat.

All modern media file formats are actually just containers. AVI, MPEG, VOB, OGG, ... These don't encode anything, they just provide a way to interleave bits of encoded audio, video and potentially subtitles and other metadata into a single file. Each of these streams is tagged with the type of content they contain and the media player of your choice then handles the reassembly, decoding and synchronization of these streams. In ffmpeg's case, the containers are handled by the avformat library. Wikipedia has a great comparison of the most common containers. As you might expect, the avcodec library takes care of the actual encoding and decoding of streams.

A typical playback use-case involves having avformat open a file format and read out the codecs for the contained streams, then using avcodec to decode the streams before passing the audio and video off to something like SDL for rendering.

But it's not that simple

Both avcodec and avformat depend on avutil. This library provides logging, mathematics and pixel format helper functions to both the codec and format libraries. If you happen to use swscale (software scaling for converting between color spaces and resolutions), you'll also need structures like AVFrame, which are declared in avcodec.

Each container format has its own set of limitations as well. There is no standard way to store FLAC audio or Theora video in MPEG containers. AVI containers technically can't handle B-frames (a type of video frame commonly used in H.264 video) - at least not without some trickery.

To make matters worse, when you try to encode using avcodec, you may have to dodge a minefield of potentially bad encoder settings that will cause your encoder to crash, seemingly at random, with divide-by-zero errors or invalid memory accesses.

Microsoft Visual Studio and ffmpeg

ffmpeg is a complex beast. It was written for gcc, a C99 compatible compiler. Parts of it and its encoder libraries are optimised in assembly to squeeze every last drop of performance out of your CPU. If you happen to be building a Linux app then you have nothing to fear. The same goes for a Mac OS X app. Both these platforms support gcc and the toolchain required to build ffmpeg out-of-the-box. If you happen to be using Windows and want to integrate ffmpeg with an existing Visual Studio project, you're going to have to take a detour.

Because MSVC is not a C99 compatible compiler, ffmpeg won't compile with it. Instead you'll need to use the Windows port of the gcc tool-chain, mingw. You may also use cygwin, but cygwin depends on one very large DLL which tends to make redistributable cygwin-compiled software a bit on the fat side (although only about 1.8MB fatter in the latest version of cygwin, which is much better than previous versions). I'm by no means an expert on either cygwin or mingw but in my experience, mingw seems lighter in this regard.

So you'll first need mingw and probably also msys to make life easier. If you intend to use libx264, you'll want to download yasm as well, as this is needed to compile the optimised assembly routines in libx264. Build and install yasm, build and install libx264 and then build ffmpeg with the appropriate configure options. You should now find compiled libraries such as libavcodec/libavcodec.a in your ffmpeg source directory.

These .a files can be linked directly into your apps from MSVC as static code. Just include them like you would any other .lib file. Assuming you are not using a 64-bit operating system (e.g. Vista 64-bit), you should not have any symbol problems linking these two parts together with MSVC.

Logging

ffmpeg and the libraries it optionally uses, such as speex and libx264, support a nice set of logging functions. They provide a callback system whereby you can override the default callback (which outputs to stderr) to link it into your app's own logging functions. The problem with this approach when using MSVC, however, is that the callbacks that both libx264 and ffmpeg define use the va_list type from <stdarg.h> to handle an unknown number of additional arguments. Ordinarily this is quite convenient, except that the internal layout of va_list is not standardized and the GNU and MSVC C libraries don't implement it the same way. Both compilers define va_list as "char *" internally, which makes things worse - your code will compile, but the behavior of your logging functions is not going to be what you expect and may even cause your app to crash.

The solution is to patch ffmpeg and (optionally) libx264's logging functions if you intend to redirect this callback. This just involves rendering log messages to strings _before_ passing them to the callback. It isn't hard to do. I will post my patches after I post this article.
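To give a rough idea of what such a patch does, here is a minimal, self-contained sketch. It is not the actual ffmpeg or libx264 patch: the names string_log_cb, set_string_log_callback, log_formatted and capture_log are all made up for illustration, and the 1024-byte line buffer is an arbitrary assumption. The point is only that the formatting happens on the mingw-compiled side, so the callback receives a plain string and no va_list ever crosses between the two runtimes.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: it receives a fully formatted message,
 * so no va_list is ever handed to code built by the other compiler. */
typedef void (*string_log_cb)(int level, const char *message);

static string_log_cb g_log_cb = NULL;   /* installed by the host application */

void set_string_log_callback(string_log_cb cb)
{
    g_log_cb = cb;
}

/* Stand-in for the patched library-side log function: render the
 * message to a string first, then pass only the string along. */
void log_formatted(int level, const char *fmt, ...)
{
    char line[1024];                    /* assumed maximum message length */
    va_list args;

    assert(fmt != NULL);
    va_start(args, fmt);
    vsnprintf(line, sizeof(line), fmt, args);   /* truncates long messages */
    va_end(args);

    if (g_log_cb)
        g_log_cb(level, line);
    else
        fputs(line, stderr);            /* default behavior: print to stderr */
}

/* Example host-side callback: remembers the most recent message. */
static char last_message[1024];
static int last_level = -1;

void capture_log(int level, const char *message)
{
    last_level = level;
    strncpy(last_message, message, sizeof(last_message) - 1);
    last_message[sizeof(last_message) - 1] = '\0';
}
```

With capture_log installed via set_string_log_callback(), a call such as log_formatted(2, "opened %s with %d streams\n", "clip.avi", 2) reaches the host as a single finished string.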

File access and file pointers

Another problem related to the use of both mingw and MSVC C runtime libraries (libcrt) has to do with file handles. If linking with MSVC, the function calls open(), close(), read(), etc. will generally be linked to the Microsoft MSVCRT library. If you happen to be using libavformat and trying to read a file from disk, you're going to be out of luck. As soon as lseek() is called (which happens when you write out a file's header), your app will likely go *boom* and segfault. Thanks to incompatible debug symbol tables, you won't be able to see where it goes *boom* either!

The solution to this one is a little bit of a hack. Looking in the libavformat/ directory you'll see avio.c, avio.h, file.c and file.h. Reading through these will give you an idea of how ffmpeg wraps file access up so that HTTP, RTP and local file access can all be handled transparently with the same code. My solution was to copy the functionality of file.c into an MSVC-compiled C file in my Microsoft project. I then called av_register_protocol() to register my new protocol. Now, instead of referring to local files as file:c:/myfile.avi I would use msvsfile:c:/myfile.avi.
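The idea behind that hack can be sketched without ffmpeg at all. The code below is a toy model, not libavformat's API: demo_protocol, demo_register_protocol and demo_find_protocol are hypothetical names, and the real protocol struct and callback signatures should be taken from avio.h in your own ffmpeg tree. What it shows is the shape of the trick: a table of function pointers, all implemented in one translation unit so that every file call goes through a single C runtime, looked up by the prefix in front of the colon.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a protocol entry; NOT ffmpeg's own struct. */
typedef struct demo_protocol {
    const char *name;   /* URL prefix, e.g. "msvsfile" */
    void  *(*open)(const char *path, const char *mode);
    size_t (*read)(void *handle, unsigned char *buf, size_t size);
    int    (*seek)(void *handle, long offset, int whence);
    int    (*close)(void *handle);
} demo_protocol;

/* Every callback uses only this file's C runtime, so handles never mix. */
static void *crt_open(const char *path, const char *mode) { return fopen(path, mode); }
static size_t crt_read(void *h, unsigned char *buf, size_t size) { return fread(buf, 1, size, (FILE *)h); }
static int crt_seek(void *h, long offset, int whence) { return fseek((FILE *)h, offset, whence); }
static int crt_close(void *h) { return fclose((FILE *)h); }

static const demo_protocol msvsfile_protocol = {
    "msvsfile", crt_open, crt_read, crt_seek, crt_close
};

#define MAX_PROTOCOLS 8
static const demo_protocol *registry[MAX_PROTOCOLS];
static int registry_count = 0;

/* Add a protocol to the table; returns 0 on success, -1 if the table is full. */
int demo_register_protocol(const demo_protocol *p)
{
    assert(p != NULL && p->name != NULL);
    if (registry_count >= MAX_PROTOCOLS)
        return -1;
    registry[registry_count++] = p;
    return 0;
}

/* Split "name:path" and return the matching protocol, or NULL if the
 * prefix is unknown. On success *path points just past the colon. */
const demo_protocol *demo_find_protocol(const char *url, const char **path)
{
    const char *colon = strchr(url, ':');
    size_t len;
    int i;

    if (colon == NULL)
        return NULL;
    len = (size_t)(colon - url);
    for (i = 0; i < registry_count; i++) {
        if (strlen(registry[i]->name) == len &&
            strncmp(registry[i]->name, url, len) == 0) {
            *path = colon + 1;
            return registry[i];
        }
    }
    return NULL;
}
```

Once msvsfile_protocol is registered, looking up msvsfile:c:/myfile.avi yields that entry with the path c:/myfile.avi, while an unregistered prefix simply returns NULL.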

I must admit I am still not sure why this works. The function prototypes for open(), etc. are the same on both compilers, so I assumed that so long as you didn't mix and match (i.e. pass a mingw-generated file handle to MSVC code and try to use it), things would work. Turns out that isn't the case.



PCM vs MP2 audio

The ffmpeg audio encoding function looks like this:

int avcodec_encode_audio(AVCodecContext *avctx, uint8_t *buf, int buf_size, const short *samples);

When encoding a PCM audio stream (i.e. fixed-rate, no compression), buf_size must exactly match the size of the output data. For most cases using 16-bit audio, this is 2 * num_channels * frame_size bytes. This is clearly a historical hangup, as the majority of codecs, lossy or otherwise, don't work at fixed bitrates. In those cases, such as MP2, the sample rate and frame_size must be specified via the codec context, and buf_size is simply the allocated size of the output buffer, i.e. the maximum number of bytes the encoder is allowed to write.
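As a small worked example of the 16-bit PCM arithmetic, here is a helper that computes the exact byte count for one frame. The function name pcm_s16_frame_bytes is my own invention and is not part of ffmpeg; it assumes interleaved signed 16-bit samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one frame of interleaved signed 16-bit PCM:
 * 2 bytes per sample * channels * samples per channel. */
size_t pcm_s16_frame_bytes(int channels, int frame_size)
{
    assert(channels > 0 && frame_size > 0);
    return sizeof(int16_t) * (size_t)channels * (size_t)frame_size;
}
```

For stereo audio with 1024 samples per channel this gives 2 * 2 * 1024 = 4096 bytes, which is the value to pass as buf_size in the PCM case. For a codec like MP2 you would instead pass the capacity of whatever output buffer you allocated.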

There is no substitute for reading the code

It's not what anyone wants to hear, but for the time being at least, there is very little documentation regarding the inner workings of much of ffmpeg and there are a LOT of quirks. For instance, I discovered the hard way that encoding PCM audio streams (i.e. fixed bitrate, no compression) requires the provided output buffer length to exactly match the frame size.

There are a lot of other smaller things I've learned about ffmpeg while developing my app, but I've covered the main gotchas that I encountered. I am a self-confessed ffmpeg newbie though, so please feel free to let me know if I got something wrong above or if you have anything to add.