Chapter Five: Digital Audio

8. Digital Audio File Formats

Digital audio files are storable, editable collections of samples organized in a standard form. They can be stored on computer drives, transferred to other computers or samplers, shared on the Internet for download, added to video files, or played back in real time. They differ from audio CD or DAT tracks, which contain mostly raw sample data plus items of no use to a computer application, such as error-correction and subcode data that help guide the CD laser. That is why a CD track must be extracted, or "ripped," to an audio file format before a computer application can use it. A standard 16-bit, 44.1 kHz stereo file takes up about 10 megabytes of disk space per minute of sound.
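The figure of about 10 megabytes per minute follows directly from the sampling rate, bit depth, and channel count. The short sketch below works through that arithmetic; the function name pcm_bytes_per_minute is purely illustrative, and the result counts only uncompressed sample data, not any header.

```python
def pcm_bytes_per_minute(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bytes needed for one minute of uncompressed PCM samples (header excluded)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


# 44,100 samples/sec x 2 bytes/sample x 2 channels x 60 sec
cd_quality = pcm_bytes_per_minute(44_100, 16, 2)
print(cd_quality)              # 10584000 bytes
print(cd_quality / 1_000_000)  # 10.584 (decimal megabytes)
```

The same function shows how quickly storage grows with higher resolutions: doubling the sampling rate or the channel count doubles the file size.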

Audio files come in a huge variety of types. The format can determine a file's bit depth, multi-channel organization, compression scheme, sampling rate, byte order (whether the bytes of each sample are stored high to low or vice versa, called endianness), and the amount of non-sample information stored in an area called the header, in units called chunks. Some audio file formats may also contain MIDI data. Some may contain data from the editing session that created them, such as the boundaries and names of regions, or looping points and root keys for samplers. Many audio programs can open and convert several file formats, within limits. Some sound formats work only on a particular computer platform, for example, PC or Mac. Some additional terms you may see when looking at a sound file format relate to bit depth, and are often tied to how computers store numbers of different sizes. Common audio file sample sizes are often called 8-bit chars, 16-bit short integers, 24-bit signed and unsigned integers, 32-bit unsigned long integers, or 32-bit floating point (floats).
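To make byte order and sample sizes concrete, the sketch below uses Python's standard struct module to store one 16-bit sample both ways. The sample value is arbitrary, and the comments naming WAV as little-endian and AIFF as big-endian are added here for orientation rather than taken from the text above.

```python
import struct

sample = 0x1234  # an arbitrary 16-bit sample value (4660 in decimal)

little = struct.pack("<h", sample)  # low byte first (the order WAV files use)
big = struct.pack(">h", sample)     # high byte first (the order AIFF files use)

print(little.hex())  # 3412
print(big.hex())     # 1234

# Reading the bytes with the wrong byte order yields a different number.
(misread,) = struct.unpack(">h", little)
print(misread)  # 13330

# Storage size of the common sample types named above.
# (struct has no 24-bit code; 24-bit samples are handled as 3 raw bytes.)
for name, code in [("8-bit char", "b"), ("16-bit short", "h"),
                   ("32-bit long", "i"), ("32-bit float", "f")]:
    print(name, struct.calcsize("<" + code), "byte(s)")
```

A program that misjudges the byte order will still play the file, but the result is usually loud noise, because every sample value is scrambled in the way shown by the misread value.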

Stereo sound files can be organized as interleaved, where the samples of the respective channels alternate in a single stream (LRLRLR, etc.), or as two separate files called split stereo, where one file contains the LEFT channel samples and another file contains the RIGHT channel samples. By convention, these are usually labeled with the same name and a .L or .R suffix (e.g., myaudio.L, myaudio.R). Most programs will simultaneously open both files by default. Many programs, such as MOTU's Digital Performer and Digidesign's Pro Tools, work only with split stereo files; when importing an interleaved file, they will automatically split it into two files. However, some applications will play only interleaved stereo files, so the separate files must be "bounced to disk" and then exported as an interleaved file.
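The relationship between interleaved and split stereo can be sketched in a few lines. The function names below are illustrative only, and the example works on lists of sample values for readability; in an actual file each sample occupies one or more bytes, depending on the bit depth.

```python
def interleave(left, right):
    """Merge two equal-length channels into one LRLR... stream."""
    if len(left) != len(right):
        raise ValueError("left and right channels must be the same length")
    stream = []
    for l_sample, r_sample in zip(left, right):
        stream.extend((l_sample, r_sample))
    return stream


def deinterleave(stream):
    """Split an LRLR... stream back into separate left and right channels."""
    if len(stream) % 2:
        raise ValueError("stereo stream must hold an even number of samples")
    return stream[0::2], stream[1::2]


left = [1, 2, 3]
right = [10, 20, 30]
stereo = interleave(left, right)
print(stereo)                # [1, 10, 2, 20, 3, 30]
print(deinterleave(stereo))  # ([1, 2, 3], [10, 20, 30])
```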

In addition, some sound file formats accommodate multiple audio tracks beyond stereo in a single file, though when such a file is given to others it may be ambiguous which speaker each channel should be routed to. For that reason, many computer music festivals request multi-channel compositions as separate mono stems, one for each speaker needed, and well labeled as to which channels they belong to. If encoding audio for video in 5.1, 7.1 or 10.1 surround, however, the channel format is well established and players will automatically route the right tracks to the right speakers. Certain compression encoding schemes, such as the Dolby Digital AC-3 format used on DVDs and Blu-ray discs, vary according to the specific commercial encoder/decoders (Dolby Digital, THX, etc.). We discussed ambisonics earlier in this text as a special kind of surround recording and reproduction, and it too has its own interleaved file formats (WAV/.amb or AmbiX/.caf) that require ambisonic encoders and decoders based on the order and ambisonic format (B-format, etc.).
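Preparing labeled mono stems from an interleaved multi-channel file amounts to pulling every Nth sample out of the stream. The sketch below does this on raw sample bytes; the function name split_to_stems and the speaker labels are hypothetical, and a real workflow would also need to write a proper file header for each stem.

```python
def split_to_stems(frames: bytes, sample_width: int, labels: list[str]) -> dict[str, bytes]:
    """Split interleaved sample bytes into one mono byte stream per label.

    sample_width is the number of bytes per sample (2 for 16-bit audio).
    The number of channels is taken from the number of labels.
    """
    channels = len(labels)
    frame_size = sample_width * channels  # bytes for one sample of every channel
    if len(frames) % frame_size:
        raise ValueError("data does not contain a whole number of frames")
    stems = {label: bytearray() for label in labels}
    for start in range(0, len(frames), frame_size):
        for channel, label in enumerate(labels):
            offset = start + channel * sample_width
            stems[label] += frames[offset:offset + sample_width]
    return {label: bytes(data) for label, data in stems.items()}


# Two frames of 4-channel, 16-bit audio (dummy byte values 0..15).
quad = bytes(range(16))
labels = ["1_front_left", "2_front_right", "3_rear_left", "4_rear_right"]
stems = split_to_stems(quad, 2, labels)
print(stems["1_front_left"].hex())  # 00010809
print(stems["4_rear_right"].hex())  # 06070e0f
```

Putting the channel number and speaker position in each stem's name, as in the labels above, removes the routing ambiguity that a single multi-channel file can carry.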