Bhatara A, Tirovolas AK, Duan LM, Levy B, & Levitin DJ (2011). Perception of emotional expression in musical performance. Journal of experimental psychology. Human perception and performance PMID: 21261418
New research by Bhatara, Tirovolas, Duan, Levy, and one of my favorites, Daniel Levitin, was just published under the title Perception of Emotional Expression in Musical Performance [abstract]. The experiments were all very well done and moderately thought-provoking.
These researchers sought to learn more about what acoustic/physical aspects of musical performance affect listeners' perception of emotional expression. We're all critics to some degree, but how do those who haven't had decades of musical training judge whether one performance is more or less emotional than another?
To pursue knowledge on this matter, the researchers decided to use piano music. They did so because of how familiar it is to Western ears and because of the limitations it places on the performer. The piano is a percussive instrument; the strings that create sound are struck by hammers. This means that the only characteristics of a note that a pianist can change are its timing (how long it is held, and how much space or overlap there is between it and its neighbors) and its volume (more strictly called "amplitude," and controlled by the speed at which a key is pressed). The pedals of the piano do complicate this somewhat, but those are still fairly stringent limitations. Also, both of these characteristics can be modified by a computer. By using piano, the researchers could have very good control over as few variables as possible.
A fancy MIDI piano and ProTools made this research possible. A professional pianist played segments of four of Chopin's nocturnes (Op. 15 No. 1 and Op. 32 No. 1, both major; Op. 55 No. 1 and KK IVa No. 16, both minor) as expressively as he would play them in a concert setting, and the performances were recorded in MIDI format. The researchers then modified these recordings to play for the participants in their experiments. First, they created a mechanical version of each recording by making all notes the same amplitude and giving them technically perfect timing. Then they created versions in between the mechanical and the expressive (the unchanged recording of the pianist) by setting the timing and amplitude somewhere between the mechanical values and the pianist's expressive choices. They created 25%, 50%, and 75% versions (and eventually 87.5%, 125%, and 150%). Importantly, they also created random versions of each recording: they assessed the degree of difference between the expressive and mechanical versions and randomly modified the notes so that random and expressive were equally different from mechanical. The difference is that the expressive version was shaped by a musician, while the random version had its "expressive elements" (the timing and amplitude) applied non-musically.
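For the programmers in the audience, here is roughly how I picture that morphing step. This is only a sketch under my own assumptions, not the authors' actual procedure: I assume each note boils down to an onset time and a MIDI velocity, that the in-between versions are straight linear blends (or extrapolations past 100%), and that the random version is made by shuffling the pianist's deviations among the notes. The names `Note`, `morph`, and `randomize` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Note:
    onset: float     # when the note starts, in seconds
    velocity: float  # MIDI key velocity, standing in for amplitude


def morph(mechanical, expressive, level):
    """Blend two versions of the same notes.

    level 0.0 gives the mechanical version, 1.0 the expressive one,
    and values above 1.0 exaggerate the pianist's deviations.
    """
    return [
        Note(
            onset=m.onset + level * (e.onset - m.onset),
            velocity=m.velocity + level * (e.velocity - m.velocity),
        )
        for m, e in zip(mechanical, expressive)
    ]


def randomize(mechanical, expressive, seed=0):
    """Assumed approach: hand the pianist's deviations to the wrong notes.

    Shuffling keeps the same set of deviations, so the overall amount of
    variation matches the expressive version, but it lands non-musically.
    """
    rng = random.Random(seed)
    onset_devs = [e.onset - m.onset for m, e in zip(mechanical, expressive)]
    velocity_devs = [e.velocity - m.velocity for m, e in zip(mechanical, expressive)]
    rng.shuffle(onset_devs)
    rng.shuffle(velocity_devs)
    return [
        Note(m.onset + d_onset, m.velocity + d_velocity)
        for m, d_onset, d_velocity in zip(mechanical, onset_devs, velocity_devs)
    ]
```

Calling `morph(mechanical, expressive, 0.5)` would give the 50% version and `morph(mechanical, expressive, 1.5)` the 150% one. A real implementation would also have to handle note durations, keep velocities inside the valid MIDI range, and stop shuffled onsets from reordering notes, all of which I've ignored here.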
Though the preparation for these experiments is technically involved, the implementation was rather elegant. Participants would listen to the recordings and rate how emotional they found each one to be.
Psh. Science is easy.
Analyses of the data they gathered were used to attempt to answer the four questions they outline at the beginning of the paper. These are listed here with my added summaries of their hypotheses in parentheses.
1. To what extent do variations in timing and amplitude affect the perception of a performance? (Well enough that we expect listener ratings to descend in order of the decrease in expressive variability.)
2. What is the nature of the psychophysical function that relates changes in these acoustic parameters (timing and amplitude) to the perception of those changes? (Probably sigmoidal. We think that there are upper and lower limits to how much change listeners can detect.)
3. Are musicians more sensitive than nonmusicians to such changes? (We suppose it's possible for sensitivities to be equal...but seriously, look at all of the citations we have that suggest otherwise. We're not dumb, we just love data.)
4. What are the relative contributions of timing versus amplitude variation? (We have no idea. Nobody else has varied them separately.)
Question 1 was pretty solidly answered, and is even pretty clear to those without much experience with statistics. The best part about this, statistically, is that the average ratings did indeed match the sequence of expressivity. In other words, as a group, the participants did not rate mechanical higher than the 50% expressive version, or cause any other such mixup.
As a musician, I was relieved to find that random was not only rated as less emotional than expressive, but that it was rated as even less emotional than mechanical. This means that my years of training on how to express emotion through music were worth it; the subtleties of timing and dynamics really matter (don't worry, professors, I'm not surprised).
Question 2 was answered fairly well, especially in the experiment that included versions of recordings that had exaggerated the expressivity of the pianist, the 125% and 150% versions. The data suggest that there are limits to how listeners discern emotional expression.
The differences between the ratings of the 100%, 125%, and 150% versions here are not statistically significant, but graphically suggestive (that sounds dirty) of a plateau of peak emotional expressiveness. The researchers like the idea of further research into this. At what point would listeners find expressive exaggeration to be as unemotional as an equally random version?
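If "sigmoidal" doesn't ring a bell, here is a toy version of the shape the researchers had in mind. Everything in it is made up for illustration: the function name `predicted_rating`, the 1-to-7 rating scale, and the midpoint and steepness values are my assumptions, not numbers fitted to the paper's data.

```python
import math


def predicted_rating(level, floor=1.0, ceiling=7.0, midpoint=0.5, steepness=6.0):
    """Toy logistic curve: rating as a function of expressivity level.

    level is the fraction of full expressiveness (0.0 = mechanical,
    1.0 = the pianist's performance, 1.5 = the 150% version).
    All default parameter values are hypothetical.
    """
    return floor + (ceiling - floor) / (1.0 + math.exp(-steepness * (level - midpoint)))
```

With a curve like this, going from 25% to 75% moves the predicted rating a lot, while going from 100% to 150% barely moves it at all, which is the kind of plateau the ratings hint at.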
Question 3 sounds simple, but required a couple of creative statistical steps to properly determine. I'll avoid ranting about math and just reveal the answer as, "yes."
In the relevant figure above, notice that the most sensitive musician was sensitive to a degree of .54 (out of 1). They had to ignore one participant's data because he had dramatically more musical experience than the rest, and I found it interesting that his sensitivity was .91. This serves as yet another lead toward further research.
Question 4 is a tricky one, but the researchers did what they could with the methods at hand. For their final experiment they modified selections so that either timing or amplitude would be a certain percentage of full expressiveness while the other would remain at the mechanical level. The results of this were more subtle than some others, with researchers making comments such as, "Musicians tended to give higher ratings than nonmusicians when timing was varied, but there was no significant difference between groups when amplitude was varied." Ultimately, it is suggested that while "timing and amplitude variations both affect emotionality judgements," "timing variations alone are more effective in communicating emotion than amplitude variations alone."
To muddy the waters of this question further, consider the importance of the combination of timing and amplitude; musicians tend to speed up as they play louder and slow down when they play softer (this is good to a small degree, but a tendency that musicians must often inhibit), suggesting that in order for one variable to matter, it needs the support of the other. In the authors' words, "A possible explanation for this is that the reduced sensitivity participants show when only amplitude is varied (...) also causes them to be less sensitive to the 'correctness' of the placement of the amplitude variations." To their credit, they offer a great idea for further research: "How would a piece with 25% of timing variation and 75% of amplitude variation compare with a piece with 50% variation of each?"
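Varying timing and amplitude separately, including the mixed-proportion follow-up the authors propose, comes down to giving each dimension its own blend level. Here is a minimal sketch of that idea, assuming linear blending and notes represented as parallel lists of onset times and MIDI velocities; `blend` and `make_version` are hypothetical names, not anything from the paper.

```python
def blend(mechanical, expressive, level):
    """Linear blend of two equal-length lists of numbers."""
    return [m + level * (e - m) for m, e in zip(mechanical, expressive)]


def make_version(mech_onsets, expr_onsets, mech_velocities, expr_velocities,
                 timing_level, amplitude_level):
    """Build a version with independent timing and amplitude levels.

    timing_level=1.0 with amplitude_level=0.0 keeps the pianist's timing
    but flattens the dynamics to the mechanical level, and vice versa.
    """
    onsets = blend(mech_onsets, expr_onsets, timing_level)
    velocities = blend(mech_velocities, expr_velocities, amplitude_level)
    return onsets, velocities
```

Under these assumptions, the proposed comparison would be `make_version(..., timing_level=0.25, amplitude_level=0.75)` against `make_version(..., timing_level=0.5, amplitude_level=0.5)`.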
Even if this research had nothing to do with music, I would have to say that it is some of the best research I've read. It isn't perfect, but it does many things very well. For each experiment, the authors explicitly define which variables are independent and dependent; whenever necessary, they performed pilot studies to ensure the quality of their data-collecting procedures (they even did a pilot study for how they would ask participants to rate the music and reported those findings); they do an excellent job of identifying where further research would be beneficial; and they provide elegantly structured methods and techniques that would be easily replicable. If I were a science teacher, I would use this research as a demonstration of these good qualities. All of these things should be present in all research, but I have truthfully seen recent research that does not meet these standards.