Localization refers to our ability to discern the origin of a sound source in space relative to ourselves, including its horizontal and vertical angles, its estimated distance, and any perceived movement. We use a number of monaural (one-ear) and binaural (two-ear) audible cues for these purposes. One such binaural cue is the interaural time difference, or ITD: the difference in the time it takes a sound to reach one ear compared to the other. Sounds located directly in front of or behind us reach both ears simultaneously. If the source is moved off-center until the difference is greater than 20 microseconds (millionths of a second), a difference in location can be perceived. As a source moves more directly to one side of the head or the other, our ability to discriminate its location using the ITD diminishes somewhat.
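To get a feel for the size of these time differences, here is a minimal sketch that estimates the ITD with Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ). This model is not part of the discussion above; it is swapped in for illustration. The head radius, the speed of sound, and the function names `itd_seconds` and `smallest_angle_exceeding` are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly room temperature)
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth spherical-head estimate of the ITD for a distant source.

    azimuth_deg is 0 straight ahead and 90 directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def smallest_angle_exceeding(threshold_s: float, step_deg: float = 0.1) -> float:
    """Scan outward from straight ahead until the ITD exceeds threshold_s."""
    steps = int(90.0 / step_deg) + 1
    for i in range(steps):
        angle = i * step_deg
        if itd_seconds(angle) > threshold_s:
            return angle
    raise ValueError("threshold is never exceeded between 0 and 90 degrees")


if __name__ == "__main__":
    for az in (0, 15, 45, 90):
        print(f"{az:>3} deg -> {itd_seconds(az) * 1e6:6.1f} microseconds")
    angle = smallest_angle_exceeding(20e-6)
    print(f"20 microseconds is first exceeded near {angle:.1f} deg")
```

Under these assumptions the ITD grows more slowly per degree as the source approaches 90 degrees, which is at least consistent with the reduced discrimination for sources far to one side.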
A second binaural mechanism, called the interaural intensity difference, or IID, uses the difference in amplitude caused by the head physically masking sounds coming from one side or the other.* This head shadowing is one component of the head-related transfer function (HRTF), which describes how the head, outer ears, and torso filter a sound on its way to each eardrum. You may see environmental recordists with head-shaped binaural microphones, such as the Neumann KU100 pictured below, trying to recreate this function. Because lower frequencies with longer wavelengths diffract more easily around objects, this mechanism is more effective for higher frequencies. The shape of the pinna (outer ear flap) also filters frequencies depending on their angle of incidence, and this filtering is responsible for our ability to place sounds in the vertical plane. Try folding your ear flap over and see how well you can still place sounds. Sound waves reflecting off the shoulder also provide some location cues. All of these mechanisms are ineffective below approximately 270 Hz, as witnessed by the often out-of-the-way placement of subwoofers in surround sound setups.
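The frequency dependence of head shadowing can be made concrete by comparing wavelength to head size. The sketch below is only a rough illustration: the head diameter, the speed of sound, and the helper names `wavelength_m` and `head_to_wavelength_ratio` are assumptions, and the ratio is a rule of thumb, not a model of the HRTF.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed
HEAD_DIAMETER = 0.175   # m, assumed average head width


def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in metres of a tone at frequency_hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_SOUND / frequency_hz


def head_to_wavelength_ratio(frequency_hz: float) -> float:
    """How many wavelengths fit across the head.

    Well below 1, the wave bends around the head with little shadowing;
    above 1, the head blocks a meaningful part of the sound.
    """
    return HEAD_DIAMETER / wavelength_m(frequency_hz)


if __name__ == "__main__":
    for f in (100, 270, 1000, 5000):
        print(f"{f:>5} Hz: wavelength {wavelength_m(f):.3f} m, "
              f"head/wavelength ratio {head_to_wavelength_ratio(f):.2f}")
```

At 270 Hz the wavelength comes out at well over a metre, several times the assumed head width, which fits the observation that low frequencies give few directional cues.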
As mentioned earlier, however, the ear canal itself resonates at frequencies between 2 and 5 kHz, depending on the angle of incidence of the sound. This resonance can provide a monaural cue: slightly turning one's head increases or decreases its intensity, and the brain registers the change. Turning the head may also change the phase of the many reflected signals entering the ear canal off the body and pinnae, and alter the constructive and destructive interference taking place as a result. In fact, a great deal of our ability to localize sound is a learned psychoacoustic mechanism. Sound stimuli that cannot be immediately placed, because different source positions produce the same interaural differences (for example, positions equidistant from the two ears), are said to lie in the cone of confusion. Prior experience and visual cues aid in resolving some of the ambiguity when it exists.
A psychoacoustic phenomenon to keep in mind when placing loudspeakers is the precedence effect (also known as the Haas effect), in which a listener receiving the same signal from multiple speakers will place it at the closest speaker, and not in between, as long as the signals arrive within about 35 milliseconds of one another. Over 35 milliseconds, the arrival of the second signal is perceived as an echo of the first. This is why you should try to sit in a central location at a multi-channel electronic music concert! In stadiums, churches, and other large areas with public address systems, signals are often delayed to loudspeakers placed farther away from the origin of the sound, so that listeners perceive the sound as coming from the location they expect.
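The arithmetic behind delayed loudspeakers is simple enough to sketch. The example below assumes a speed of sound of 343 m/s and takes the 35 millisecond figure from the paragraph above; the function names and the optional `extra_offset_s` parameter (a small extra delay some engineers add so the original source clearly arrives first) are hypothetical and only for illustration.

```python
SPEED_OF_SOUND = 343.0    # m/s, assumed
ECHO_THRESHOLD_S = 0.035  # about 35 ms, the figure quoted in the text


def propagation_delay_s(distance_m: float) -> float:
    """Time for sound to travel distance_m through air."""
    return distance_m / SPEED_OF_SOUND


def arrival_gap_s(distance_a_m: float, distance_b_m: float) -> float:
    """Gap between arrivals of the same signal from two speakers."""
    return abs(propagation_delay_s(distance_a_m) - propagation_delay_s(distance_b_m))


def perceived_as(gap_s: float) -> str:
    """Classify an arrival gap using the 35 ms rule of thumb."""
    return "echo" if gap_s > ECHO_THRESHOLD_S else "fused"


def fill_speaker_delay_s(source_to_listener_m: float,
                         fill_to_listener_m: float,
                         extra_offset_s: float = 0.0) -> float:
    """Delay to apply to a nearby fill speaker so its signal does not
    reach the listener before the sound from the original source."""
    gap = propagation_delay_s(source_to_listener_m) - propagation_delay_s(fill_to_listener_m)
    return max(0.0, gap) + extra_offset_s


if __name__ == "__main__":
    gap = arrival_gap_s(10.0, 25.0)
    print(f"speakers at 10 m and 25 m: gap {gap * 1000:.1f} ms, {perceived_as(gap)}")
    delay = fill_speaker_delay_s(40.0, 5.7)
    print(f"fill speaker delay: {delay * 1000:.1f} ms")
```

A listener 10 m from one speaker and 25 m from another is already past the 35 millisecond threshold, which is one way to see why an off-center seat at a multi-channel concert can be a problem.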
In judging the apparent size of an acoustic space, the aural cues depend on many factors, including the time elapsed from hearing the source sound to hearing the earliest reflections, the onset of reverberation, the intensity and duration of reverberation, diffusion of high frequencies, and the resonant frequencies of the reverberation. With multi-channel sound and control over artificial reverb, many interesting and novel spatial effects can be created. Some modern multi-speaker, multi-plane sound recording and reproduction systems, such as high-order ambisonics, are based on our ever-expanding knowledge of localization.
*The interaural intensity difference (IID) is sometimes referred to as the interaural level difference, or ILD.