Wednesday, 3 February 2010

The state of sound in Win32

To flesh out my video-conferencing related posts I guess its only fair I have one on sound as well. Windows sound systems to be precise. As I've recently learned, there are *5* different systems that deserve a mention.

Windows Multimedia Extensions (WMME)

This has existed since the dawn of time (Win3.1) and provides a nice fail safe API for sound output. It generally only works for one device at a time unless its emulated (the case for WinXP and up).  The latency you get when using this API makes it completely impractical for video conferencing use.


This used to be part of DirectX and is now part of the platform API (DirectShow?). Apparently provides latency down to around 50ms. On Vista I have seen Speaker-to-Mic sampling latencies range from 60ms to 220ms.

WDM Kernel Streaming (WDMKS)

This API provided the closest access to "bare metal" as possible until Windows Vista. WDM KS gives much better latency than WMME and DirectSound but it generally means opening a device for exclusive access and since you can't steal access to a device from another running application. I don't believe the average user understands the concept that a paused You Tube video is still using their sound device so this API is written off for practical reasons.


Steinberg created this API to provide low latency access to audio for professional applications. It solves the latency issues but requires ASIO drivers which only high-end cards seem to have. There is a project called ASIO4ALL but it achieves ASIO support on regular devices through use of the WDM KS API, thus making it no better than accessing WDM KS directly.


This was added in Vista and seems to be the best sound API that Microsoft have come up with. WASAPI internally uses 32 bit floating point samples at a fixed sample rate (determined by the device drivers. typically 48khz). Mixing is done by the OS and tracked per application. In the general case, under Vista and Windows 7, this is the only API that directly accesses the hardware. This might not be the case if using legacy drivers but seems to be true of all systems I've tested audio code on.  Implementation of the other sound API's are emulated and running on top of WASAPI.

(Need to check this is the case for WDM KS.. It may run either/or with WASAPI.)