The cocktail party effect is the phenomenon in which, in a crowded room with dozens of simultaneous conversations, our ear/brain duo is able to focus on the one person we really intend to listen to and ignore the extraneous conversational noise.
I was inspired to write this post while giving guitar-playing quizzes in a middle school. I allowed everyone in the class to keep practicing on their guitars while I walked to one student at a time and listened to their playing for a grade. Of course, when confronted by the teacher, students tend to become timid and play more quietly, while the aggregate sound of 25+ other guitars being vigorously strummed is very loud. In keeping with the cocktail party effect, I was still able to discern exactly the sounds coming from the quizzed student's guitar while ignoring the rest.
Let's consider some variables (some of which I discussed in Music and the Brain: A Primer, of course) and how they relate to this effect.
|I hope they don't mind me borrowing this image.|
If we reduce the complexity of this effect to a situation with one listener and two speakers, we can explore the importance of amplitude. If you were listening to two people talking to you and one person whispered while the other shouted, who would be easier to hear? The louder person, of course, because the amplitude of their voice is so much greater that it overwhelms the tiny changes in air pressure made by the other's whisper. I would hope that everyone over the age of 4 could follow that logic easily. Consider, then, that in a crowded room most people are speaking at about the same volume. Sound does dissipate over distance, so you could potentially have the benefit of proximity; the person you want to listen to is closer, and thus louder to your ears. Yet sometimes you may be listening to someone farther from you than another speaker. Our brains can still focus in on that person. Cool. Without the benefit of greater amplitude, though, the job gets harder.
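To put rough numbers on how much proximity helps, here is a minimal sketch. It assumes an idealized open space with no walls, where sound level falls by 20·log10 of the distance ratio; the 60 dB starting level and the distances are made-up illustrative values, and `level_at_distance` is a hypothetical helper name.

```python
import math

def level_at_distance(level_db, reference_distance, new_distance):
    """Estimate sound level (dB) at new_distance, given the level measured
    at reference_distance. Assumes an open space with no reflections."""
    return level_db - 20.0 * math.log10(new_distance / reference_distance)

# Hypothetical talker measured at 60 dB from 3 feet away.
for distance_ft in (3, 6, 12, 24):
    level = level_at_distance(60.0, 3.0, distance_ft)
    print(f"{distance_ft:>2} ft: {level:.1f} dB")
```

Under this assumption each doubling of distance costs about 6 dB. A real room full of reflecting walls (and people) will behave less neatly, so treat this as an upper bound on the advantage of standing close.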
|Image from USRA.|
In another situation, let us imagine you and a friend are inside a building when one of those really annoying, piercing, and loud fire alarms goes off. If you head toward the front door and your friend says to you, "wait, the fire exit is this way," you would likely be able to hear them. This is because the high-frequency hair cells in your cochlea would be reacting to the high frequencies at high amplitude (the alarm), but the middle-frequency hair cells would be reacting to the frequencies of your friend's voice, even though their amplitude is lower. This effect is much more dramatic if the background is a very loud but low-pitched sound and you need to listen to someone who speaks at a much higher frequency than that sound, because of our brain's tendency to respond more dramatically to higher pitches (this also explains why whistling is so audible in most places).
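The idea that separate frequency "detectors" can each pick out their own part of a mixed sound can be sketched in a few lines. This is only an illustration, not a model of the cochlea: the 3000 Hz "alarm", the 200 Hz "voice", their amplitudes, and the sample rate are all assumed values, and `amplitude_at` is a hypothetical helper that checks one frequency at a time by correlating the signal with a sine and cosine.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
NUM_SAMPLES = 800    # one tenth of a second of sound

def sine(freq_hz, amplitude, i):
    """Value of a pure tone at sample number i."""
    return amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)

# Hypothetical stand-ins: a loud 3000 Hz "alarm" plus a quiet 200 Hz "voice",
# added together the way they would be at a single eardrum.
mixed = [sine(3000, 1.0, i) + sine(200, 0.1, i) for i in range(NUM_SAMPLES)]

def amplitude_at(signal, freq_hz):
    """Estimate how strongly one frequency is present in the signal by
    correlating it with a cosine and a sine at that frequency."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

for freq in (200, 1000, 3000):
    print(f"{freq} Hz: {amplitude_at(mixed, freq):.3f}")
```

The 200 Hz detector still reports the quiet tone at roughly a tenth of the alarm's strength, and a detector tuned to 1000 Hz (where nothing is playing) reports essentially nothing, even though all three are looking at the same mixed signal.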
Timbre (tone color): I previously defined this as a musical element, but it is most accurately a characteristic of sound. Importantly, however, it is not a property of a wave in the way that frequency and amplitude are. Timbre is the "color" of a sound: it is not only the characteristic that enables us to distinguish a trumpet from a clarinet, it is also that which enables us to distinguish a father's voice from a brother's. As it happens, most sound sources do not produce only one frequency. That's right, no matter how hard you try to sing perfectly on pitch, your voice will be producing additional frequencies. These additional frequencies (overtones) are [essentially] higher versions of the fundamental pitch, sitting at roughly whole-number multiples of its frequency.
|Because of additional frequencies, the waveform of a voice looks more like this than it does the examples above.|
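Here is a small sketch of how the same pitch can carry different timbres. It assumes overtones at exact whole-number multiples of the fundamental, and the overtone "recipes" (`PURE`, `BRIGHT`, `MELLOW`) are invented for illustration rather than measured from any real voice or instrument.

```python
import math

def tone(t, fundamental_hz, harmonic_weights):
    """Air-pressure value at time t (seconds) for a tone built from a
    fundamental plus overtones at whole-number multiples of it.
    harmonic_weights[0] is the strength of the fundamental itself."""
    return sum(
        weight * math.sin(2 * math.pi * fundamental_hz * n * t)
        for n, weight in enumerate(harmonic_weights, start=1)
    )

# Hypothetical recipes: same pitch (220 Hz), different overtone mixes.
PURE = [1.0]
BRIGHT = [1.0, 0.5, 0.33, 0.25]
MELLOW = [1.0, 0.2, 0.05]

period = 1 / 220  # every recipe repeats once per fundamental period
for name, weights in (("pure", PURE), ("bright", BRIGHT), ("mellow", MELLOW)):
    samples = [tone(i * period / 8, 220, weights) for i in range(8)]
    print(f"{name:>6}:", [round(s, 2) for s in samples])
```

All three recipes repeat at the same rate, which is why they would be heard as the same note, but the shape traced out within each cycle differs. That difference in shape is what we hear as timbre.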
Consideration of this characteristic almost sounds like the nail-in-the-coffin for why we can hear one person over so many others at a cocktail party. People sound different, so if you are listening for a particular timbre, you'll be able to disregard others as meaningless noise. That, however, is not the end of this analysis. Recall my guitar-quizzing story. The differences in timbre from one guitar to another are extremely small. If I had to rely only on timbre to distinguish who I was listening to, I wouldn't have a chance.
Direction: We have two ears! How cool is that?!?! OK, don't get too excited. We also have two eyes, and their separation enables our brains to combine two two-dimensional images to determine depth and therefore form one three-dimensional perspective. Our ears do the same thing for sound. The auditory "picture" from each ear is compared to the other to determine where a sound originated. However, there is a pretty significant difference between the eyes and ears where this analogy breaks down. The brain has to measure the time between the reception of a sound in one ear and the subsequent reception of that sound in the other, whereas that sort of direct measurement of timing isn't used to combine our eyes' visual images (at least not in the way that I understand it). The speed of sound is about 1100 feet/second. The distance between my ears is about 6.5 inches (yes, I just measured). That means it takes sound about half of a millisecond to get from one ear to the other, and that's the best-case scenario, with the sound coming from directly to one side! The closer a sound gets to being straight in front of or behind a person, the smaller that delay becomes, and the more precise the measurement has to be in order to distinguish which ear the source is closer to.
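The half-millisecond figure, and how quickly it shrinks as a sound moves toward straight ahead, can be checked with a little arithmetic. This sketch uses the post's own numbers (1100 feet/second and 6.5 inches) and assumes a deliberately simple model in which sound travels in a straight line from one ear to the other, ignoring the fact that a real head is in the way; `ear_to_ear_delay_ms` is a hypothetical helper name.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1100.0  # figure used in this post
EAR_SPACING_IN = 6.5              # the measurement quoted in this post

def ear_to_ear_delay_ms(angle_deg):
    """Delay between the two ears, in milliseconds, for a sound arriving
    angle_deg away from straight ahead (0 = dead ahead, 90 = directly to
    one side). Assumes a straight-line path between the ears."""
    extra_path_ft = (EAR_SPACING_IN / 12.0) * math.sin(math.radians(angle_deg))
    return extra_path_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0

for angle in (90, 45, 10, 1):
    print(f"{angle:>2} degrees off center: {ear_to_ear_delay_ms(angle):.4f} ms")
```

At 90 degrees the delay comes out just under half a millisecond, matching the estimate above. At 1 degree off center it drops below a hundredth of a millisecond, which gives a sense of how fine the timing comparison has to be for sounds that are nearly straight ahead.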
So in the cocktail party situation, we can focus our listening on a certain direction. Sure, it's easy to say that the person speaking to you is also the person you intend to listen to, and that that person is standing directly in front of you. However, you could prefer to listen to the gossip going on behind you and a bit to your left, and ignore the person in front of you while smiling and nodding anyway. In this situation, your brain has to identify the ideal ear-to-ear difference in sound and focus only on frequency/amplitude pairings that match the needed timing difference! This is even more incredible when one considers the "mixing" of sounds when they reach the ear. We only have one eardrum per ear, so it's not like we can devote eardrum A to speaker A, B to B, and C to C. All of the various frequencies, amplitudes, and timbres get mixed together in the air before they reach our eardrums. This means that in order to focus on the sound of one person's voice, the brain must separate that mixed-together signal into at least one proper combination of frequency, amplitude, timbre, and direction.
|Surround Sound diagrams represent this idea well. Image from Wikipedia.|
Therefore, in order for me to properly listen to one guitar out of 30, my brain has some serious work to do. The amplitude of the guitar I want to hear isn't worth much; I have to rely on proximity to benefit from this variable at all. The timbre is almost worthless; the biggest factor there seems to be how clearly the student creates sounds. Direction is vital, because there could easily be other guitars equally close to me, and only the location of the one I am grading matters. Frequency in this situation is significantly beneficial; I am listening for a melody or chord progression that I know well, so my expectation of certain frequencies enables me to block out those that don't match.
I was going to follow the above analysis with an examination of current research, but I've already met the intended limit of length for this post. All of the above is rudimentary in the study of the cocktail party effect, and there is a lot of research still to do — research that has beneficial impacts on subjects ranging from cognitive psychology to sound engineering.
The questions that scientists are left asking center on how we are able to focus on these characteristics, which leads specifically to the study of auditory masking (and, to a lesser degree, cognitive sequencing). If you are interested in knowing more, I recommend a review of such studies that was published in 2000: The Cocktail Party Phenomenon: A Review of Research on Speech Intelligibility in Multiple-Talker Conditions. (To be completely honest, I haven't read it yet; I just really want to publish this post, and it looks like good research.)