Chapter Five: Digital Audio

8. Digital Audio File Formats | Page 2

Many of the file formats were originally designed to work with a specific processor chip. For example, the AIFF format was designed for the Motorola 680x0-based Apple Macintosh, which used big-endian byte order (the most significant byte, or MSB, comes first). With the advent of Intel chips, which are little-endian, it was later supplemented by AIFC, which allows little-endian byte order for the encoded audio, but not for the other file information. Microsoft WAV or WAVE format was designed for the Intel 80x86 processors, in which the LSB comes first (little-endian). Most high-resolution file formats are now bi-endian, which is great, because byte order gets really confusing to keep track of as formats change and evolve, and being bi-endian allows the file formats to be as cross-platform as possible.
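
As a small illustration of the byte-order difference, the sketch below packs one 16-bit sample both ways using Python's standard struct module. The sample value 0x1234 is an arbitrary choice for the example, not something taken from any real file.

```python
import struct

sample = 0x1234  # one arbitrary 16-bit PCM sample value (assumption for illustration)

big = struct.pack(">h", sample)     # most significant byte first (AIFF-style)
little = struct.pack("<h", sample)  # least significant byte first (WAVE-style)

print(big.hex())     # 1234
print(little.hex())  # 3412

# Reading the bytes with the wrong byte order yields a different number entirely.
misread = struct.unpack("<h", big)[0]
print(hex(misread))  # 0x3412
```

This is why a program must know (or be told by the file header) which byte order the audio data uses before it interprets the samples.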

Audio files have two main nomenclatures relating to their function. The first is the codec, or how the actual audio has been encoded and needs to be decoded. For example, we mentioned linear PCM encoding earlier on in the quantization section. The second designation for a file is its container format, sort of like the wrapper you put your ham sandwich in. Whether it is plastic wrap or a Ziploc bag, it still contains your ham sandwich. A container, such as WAVE, mp3, or FLAC, will not only contain the encoded audio data, but also metadata that specifies everything from the type of audio encoding that follows, to file length, sample or bit rate, endianness, compression scheme and depth, etc. Some containers only accept certain codecs, such as AIFF, which accepts only uncompressed linear PCM, while others accept a variety of codecs. For example, AIFF's sister format AIFC accepts both uncompressed linear PCM and compressed audio such as the μ-law encoding we mentioned earlier.
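
To see the container's metadata in action, here is a minimal sketch using Python's standard wave module. It writes one second of silence into an in-memory WAVE container and then reads the header information back. The particular settings (stereo, 16-bit, 44.1 kHz) are assumptions chosen for the example.

```python
import io
import struct
import wave

# Write one second of silence into an in-memory WAVE container.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)       # stereo
    writer.setsampwidth(2)       # 2 bytes per sample = 16-bit linear PCM
    writer.setframerate(44100)   # frames per second
    writer.writeframes(struct.pack("<2h", 0, 0) * 44100)

# Read the metadata back from the container header.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    info = reader.getparams()

print(info.nchannels, info.sampwidth * 8, info.framerate, info.nframes)
print(info.comptype)  # the wave module reports "NONE" for uncompressed PCM
```

The audio samples themselves never mention their own rate or channel count; everything printed here comes from the wrapper.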

You may encounter a third nomenclature when dealing with web audio and your web browser, which is a media type (formerly MIME type), such as audio/mpeg, which lets your browser know how to reproduce the file type and contents, calling on an installed helper app if necessary. If your browser can't play back a particular type of audio file, you may need to add its MIME type in your browser, or your browser might not support that MIME type at all.
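
As a rough illustration of the extension-to-media-type lookup, Python's standard mimetypes module can guess a media type from a filename. The filenames below are hypothetical, and the exact strings returned can vary with the operating system's own type tables, so no particular output is promised here.

```python
import mimetypes

# Hypothetical filenames; only the extension matters for the guess.
mp3_type, _ = mimetypes.guess_type("song.mp3")
print("song.mp3 ->", mp3_type)

for name in ("take1.wav", "loop.aiff", "mix.flac"):
    media_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", media_type)
```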

Most of the formats below, originating from the video game company Electronic Arts' EA Interchange File Format 85 concept, are structured in chunks. Each chunk begins with a header giving information about what follows: the chunk ID (ckID), a four-character code that identifies the type of chunk, and the chunk size (ckSize). After the header comes the chunk data itself, which can be the actual audio samples or other types of data (for example, number of channels, sampling rate, endianness, compression scheme, and so forth). Some containers combine the ckID and ckSize, or even put the size last (see .caf below). An IFF-style file can contain many chunks. If a chunk isn't needed for a particular use, it can be skipped and ignored. The ID3v2 tag chunk, for example, often found in mp3 files, may list extensive information about the music title, performer, beats per minute, etc., which appears in iTunes or streams across your car stereo, but can also be skipped. An AIFC file contains an entry in the COMM (Common) chunk that describes whether there is compression or not, and what type of compression it might be.
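
The chunk idea can be sketched in a few lines of Python. This is a simplified illustration, assuming the IFF convention of a 4-byte ID, a 4-byte big-endian size, and data padded to an even length. The helper names make_chunk and walk_chunks and the chunk contents are hypothetical and do not follow the real layout of any AIFF chunk.

```python
import io
import struct

def make_chunk(ck_id: bytes, data: bytes) -> bytes:
    """Build one IFF-style chunk: 4-byte ID, 4-byte big-endian size, data."""
    pad = b"\x00" if len(data) % 2 else b""  # pad odd-length data to an even length
    return ck_id + struct.pack(">I", len(data)) + data + pad

def walk_chunks(stream):
    """Yield (ckID, ckSize, data) for each chunk until the stream runs out."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        ck_id, ck_size = struct.unpack(">4sI", header)
        data = stream.read(ck_size)
        if ck_size % 2:
            stream.read(1)  # skip the pad byte
        yield ck_id.decode("ascii"), ck_size, data

# Two made-up chunks, for illustration only.
body = make_chunk(b"NAME", b"demo!") + make_chunk(b"SSND", b"\x00\x01\x00\x02")

wanted = {"SSND"}
for ck_id, ck_size, data in walk_chunks(io.BytesIO(body)):
    if ck_id not in wanted:
        print("skipping", ck_id, ck_size)   # skipping NAME 5
        continue
    print("reading", ck_id, ck_size, data.hex())  # reading SSND 4 00010002
```

Because every chunk announces its own size, a reader that does not recognize a ckID can jump straight past it, which is what makes unneeded chunks safe to ignore.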

Finally, file types can be classified as being either lossless uncompressed, lossless compressed, or lossy compressed (synonymous with "lousy compressed" at low bit rates). To stress again, keep your master copies in uncompressed format and then transcode to a lossy format when needed. Try not to transcode from one lossy format to another, which degrades the quality even further, but start again from your uncompressed copy. Below is a chart of the most common of each in current use in 2020, but for further detail, here is an excellent article on various codecs.

Most Common Audio File Types for Electronic Studios
Lossless Uncompressed: AIFF, WAVE
Lossless Compressed: FLAC, ALAC
Lossy Compressed: AAC, MP3

Perception-based Audio Compression

Video games and computer graphics programs have long saved computation overhead by not calculating portions of a scene that are out of view off-screen. In an effort to save memory space, speed up file transfers for sharing on the Web, etc., engineers developed similar strategies for removing portions of digital audio information that would not be perceived, based on psychoacoustic principles, primarily the masking phenomenon discussed in the Acoustics chapter. The MP3 encoding algorithm that was developed in the late 1980s could offer acceptable audio quality (for those more concerned about space/speed than quality) with a compression savings of approximately 10:1; a 10 MB uncompressed 44.1 kHz file might be a 1 MB MP3 file at 128 kbps (kilobits per second). The format also allowed for variable bit rates (VBR) in addition to constant bit rates (CBR), so more information could be included where needed, and less when not. AAC is the more recent perceptual encoding format and does a somewhat better job. It is the default for most mobile phones, game consoles, etc. as of this writing. The importance for composers is that compressed formats discard information that can never be retrieved from the compressed files, and therefore (one more time), do not master your music in compressed formats. For an excellent, more detailed description of the encoding process, explore this page.
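
The roughly 10:1 figure can be checked with a little arithmetic. The sketch below assumes the uncompressed file is 16-bit stereo at 44.1 kHz (the text gives only the sample rate) and compares one minute of it with one minute of 128 kbps MP3. The function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed linear PCM."""
    return sample_rate_hz * bit_depth * channels

def size_megabytes(bit_rate_bps: int, seconds: float) -> float:
    """File size in megabytes (10^6 bytes), ignoring header overhead."""
    return bit_rate_bps * seconds / 8 / 1_000_000

cd_rate = pcm_bit_rate(44_100, 16, 2)  # assumed 16-bit stereo: 1,411,200 bits per second
mp3_rate = 128_000                     # 128 kbps

seconds = 60
print(size_megabytes(cd_rate, seconds))   # 10.584
print(size_megabytes(mp3_rate, seconds))  # 0.96
print(round(cd_rate / mp3_rate, 1))       # 11.0
```

So one minute of CD-style audio is a bit over 10 MB, and the same minute at 128 kbps is just under 1 MB, in line with the approximate ratio quoted above.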

Some of the common sound file format types are (.xxx indicates common filename extensions used for these formats):

  • .aif or .aiff or AIFF (Audio Interchange File Format), the gold standard of 16-bit uncompressed PCM audio, travels well between almost all computers and software, and includes header information like file name, sampling rate, MIDI note number for samplers, loop points, and number of bytes in the file. It was developed by Apple, based on the IFF format, and is now bi-endian with the addition of AIFC. It is also capable of 24-bit, 32-bit and even 64-bit resolutions, but has a file size limitation of 4 gigabytes. It can theoretically contain an unlimited number of channels. In many cases, files that look like AIFF files are actually AIFC-formatted, though they may sport the .aif or .aiff file extension. Click here for original specs.

  • .aifc or AIFC or AIFF-C (potentially compressed version of AIFF; it does not have to be compressed). Supports little-endian PCM audio data using the AIFC/sowt ("twos" backwards) scheme, though the header data still uses the AIFF big-endian format. Some older AIFF-happy programs will choke on AIFC, particularly if the audio is compressed. Click here for a .pdf draft spec from Apple Computer.

  • .wav or Microsoft WAVE (designed for PCs and Windows, but now usable with most audio programs, Mac, PC or other). Similar to AIFF for bit depth and sample rates. As mentioned above, it uses MSBs and LSBs in the reverse order of AIFF files, so Microsoft (with IBM) developed RIFF, the Resource Interchange File Format, to support the "little-endian" scheme. Like AIFF, it has a 4 gigabyte file size limitation.

  • .m4a or .aac or AAC, the Advanced Audio Coding scheme developed by Sony, AT&T, Dolby Labs, and the original MP3 folks, which may encode compressed (lossy) multi-channel 5.1 surround files, as well as mono, stereo, or other massive multi-channel formats, at lower bit rates and at sample rates up to 96 kHz and 24-bit resolution. It is commonly used in compressed video files for the web. AAC bypasses the limitations of mp3 in that it is NBC (non-backward compatible), though it is based on the MPEG-2 standard and is thought to sound better than mp3s at similar bit rates. The newer MPEG-4 AAC adds even further quality for coding at low bit rates, and in fact it seems like AAC is the MPEG-4 audio standard, hence the .m4a extension. Click here for further information.

  • .mp3 (MPEG-1 Audio Layer 3 compression). In 1987, the Fraunhofer IIS-A started to work on perceptual audio coding in the framework of the EUREKA project. In a joint cooperation with the University of Erlangen, the Fraunhofer IIS-A finally devised a very powerful algorithm that is standardized as ISO-MPEG Audio Layer-3. With the proper codecs, compression rates of up to 24 times can be achieved with near- (but not) CD-quality. The beauty of MP3 is its size vs. perceived quality, and also its ability to be downloaded and then loaded into the flash memory of cell phones and MP3 players. It can also be streamed to MP3 client software, and is recognized by most Web browser audio helper applications. Files are encoded at certain bit rates for target download speeds; for example, very good quality can be attained with 160 to 192 kbps encoding, and even better up to 320 kbps. Would you want to master your music on MP3? No, but at least you can listen to it while you're jogging.

  • .ogg (Ogg, sometimes called Ogg/Vorbis for the Vorbis variable-bit-rate encoding, which has now been replaced by Opus encoding). This is a lossy, open-patent format that supports music, video and text, which makes it ideal for online multimedia content. It is a recommended fallback format for HTML5 web audio that uses mp3 or other first-line formats. First created in 1993 by the Xiph.Org Foundation.

  • Some legacy or less frequently used formats:

  • .sd2 or SD II (Sound Designer II—same as AIFF with added proprietary information such as markers and regions). Was developed for the Digidesign Sound Accelerator II DSP card in 1985 to edit samples. Was in long use for Pro Tools and Digital Performer and can still be opened by both. Not always portable to non-Mac computers.

  • .caf or Core Audio Format developed by Apple, and published here. Like several other formats, CAF is a wrapper for many different kinds of encoding, from high-resolution to very compressed. Apple enthusiastically tells us that unlike AIFF, AIFC and WAVE, CAF uses 64-bit offsets that allow for unlimited file sizes ("...hundreds of years of sound"), and that CAF appends its file size at the end of a file, so does not need to rewrite header when recording/editing and potentially hose the file if an error occurs while writing the new size to the header. If you ever installed Apple GarageBand or Logic Pro on your Mac, chances are you will find a whole library of CAF sample files in /Library/Audio/Apple Loops.

  • .ra or .ram (Real Audio). This used to be the be-all and end-all in web audio, and required the RealAudio app or a client plug-in. They are still in business in 2020, but audio streaming has moved on to better choices. Files could be streamed on the Internet from a Real Audio server, so sound starts playing before the file has fully downloaded. They were encoded at multiple bit rates to accommodate different user download speeds (modem (what's that?), DSL, T1 lines, etc.), which ranged from 8 kbps to 1.5 Mbps (don't try 1.5 Mbps on your grandmother's 28.8 modem). Could also be combined with video for Real Media streaming. It spreads compression artifacts across the spectrum so they are theoretically not as noticeable. See www.real.com. Very much a legacy format now.

  • WMA or Windows Media Audio includes proprietary codecs designed for use with Windows Media Player, with various compression ratios up to lossless WMA Pro, which supports up to 768 kbps. Needs proprietary players, such as Winamp.

  • .au (a-law, still used with telephony; previously used on Sun computers, or as .snd on NeXT machines). Non-linear sampling using companding, in which the bandwidth is limited for lower sample rates and amplitudes are encoded non-linearly so higher amplitudes require fewer values. .snd (SND) files, which were used on the NeXT computer, were essentially .au files, often referred to as NeXT/SUN (au) format.

  • .ul or μ-law (also mu-law, pronounced myew-law; US telephony, headerless, usually 8-bit, usually low quality). A similar approach to a-law, with non-linear encoding, band-limiting and companding.

  • .sf (IRCAM, Institut de Recherche et Coordination Acoustique / Musique in Paris). Was developed for much of IRCAM's proprietary software, running on the awesome computers of the time such as Sun, VAX, MIPS (DEC), and NeXT, with accommodations for the data types and endianness of each. We don't recommend you submit your next conference piece in this format. IRCAM also developed the SDIF (Sound Data Interchange Format) along with CNMAT, and that is still in use today for a variety of purposes, such as resonant modeling coding.
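
The companding used by the a-law and μ-law entries above can be sketched with the textbook continuous μ-law curve. Note the assumptions: real telephony codecs use a segmented approximation of this curve, and the quantize function and its 127-step scaling are hypothetical simplifications for illustration only.

```python
import math

MU = 255  # the value of mu conventionally used for 8-bit mu-law

def compress(x: float) -> float:
    """Map a linear sample in [-1, 1] onto the mu-law curve, also in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y: float) -> float:
    """Inverse of compress()."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantize(y: float, steps: int = 127) -> int:
    """Round a value in [-1, 1] to a simplified signed 8-bit code (assumption)."""
    return round(y * steps)

for x in (0.001, 0.01, 0.1, 0.5, 1.0):
    code = quantize(compress(x))
    restored = expand(code / 127)
    print(f"{x:6.3f} -> code {code:4d} -> {restored:.4f}")
```

Quiet samples land on many more distinct codes than they would with linear 8-bit quantization, while loud samples share fewer codes, which is the trade-off the companding descriptions refer to.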

There are many, many more sound formats out there currently in use, many coming into existence and those that have hit the dust bin. A Web search for audio codecs and audio file formats will yield many results.

Most audio programs and DAWs will save existing files in alternate formats, called transcoding, either interleaved or as separate mono files, including Digital Performer, Logic, Adobe Audition and Adobe Media Encoder, etc. Below are some of the many choices Adobe Audition gives for transcoding an audio file with "Save As..."

[Figure: Adobe Audition "Save As..." format choices]
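
The difference between an interleaved file and separate mono files can be shown with a short sketch. It assumes 16-bit little-endian PCM with the channels stored in alternation (left, right, left, right, ...); the deinterleave helper and the sample values are hypothetical.

```python
import struct

def deinterleave(frames: bytes, channels: int = 2):
    """Split interleaved 16-bit little-endian PCM into one list per channel."""
    count = len(frames) // 2  # number of 16-bit samples in the buffer
    samples = struct.unpack(f"<{count}h", frames)
    return [list(samples[ch::channels]) for ch in range(channels)]

# Four made-up stereo frames: L R L R L R L R
interleaved = struct.pack("<8h", 1, -1, 2, -2, 3, -3, 4, -4)
left, right = deinterleave(interleaved)
print(left)   # [1, 2, 3, 4]
print(right)  # [-1, -2, -3, -4]
```

Saving as separate mono files amounts to writing each of these per-channel lists into its own container.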

In addition, utility programs for creating, editing or fixing file headers can be extremely useful. SoundHack by Tom Erbe, a fantastic and free (thanks Tom) program for converting sound file formats and much more, can be found here (being updated for Catalina 64-bit now).