Most of us do it every day without even thinking about it, yet talking is a uniquely human ability. Not only do humans have evolved brains that process and produce language and syntax, but we also can make a range of sounds and tones that we use to form hundreds of thousands of words.
To make these sounds -- and talk -- humans use the same basic apparatus that chimps have: lungs, throat, voice box, tongue and lips. But we're the ones singing opera and talking on the phone. That is because over tens of thousands of years, humans have evolved a longer throat and smaller mouth better suited for shaping sound.
Vocal Acrobatics
Humans have flexibility in the mouth, tongue and lips that lets us form a wide range of precise sounds that chimps simply can't produce, and some have developed this complex voice instrument more than others. Take opera tenor Gran Wilson. He has toured the world singing and now teaches at the University of Maryland at College Park and at Towson University. In a split second, Wilson can go from his talking voice to full vibrato, enunciating each sound with graceful clarity as his voice fills the room.
He can do that because of his exceptional control of the Rube Goldberg-like apparatus that makes speech -- from lungs to larynx to lips. It works like this: When we talk or sing, we release controlled puffs of air from our lungs through our larynx, or voice box. The larynx is about the size of a walnut. In men, you can see it -- it's the Adam's apple. It's mostly made up of cartilage and muscle.
Stretched across the top are the vocal cords, which are two folds of mucous membrane. When we expel air from the lungs and push it through the larynx, the vocal cords vibrate, making the sound.
"The surface area of the cords that's actually vibrating is probably half of your smallest fingernail -- a very small amount of flesh buzzing," Wilson says.
The frequency of this buzzing is what gives the sound its pitch. We change the pitch by tightening the vocal cords to make our voice higher and loosening them to make a lower sound.
"If you take a balloon and blow it up, you can manipulate the pitch by pulling the neck," Wilson says. The same principle applies to our vocal cords.
The vibrating air gets made into a specific sound -- like an ee or ah or tuh or puh -- by how we shape our throat, mouth, tongue and lips. Fusing these sounds together to form words and sentences is a complex dance. It requires an enormous amount of fine motor control.
"Speech, by the way, is the most complex motor activity that any person acquires -- except [for] maybe violinists or acrobats. It takes about 10 years for children to get to the adult levels," says Dr. Philip Lieberman, a professor of cognitive and linguistic science at Brown University who has studied the evolution of speech for more than five decades.
How We Got Here
Lieberman says that, looking back at human evolution, it's evident that after humans diverged from an early ape ancestor, the shape of the vocal tract changed. More than 100,000 years ago, the human mouth started getting smaller and protruding less. We developed a more flexible tongue that could be controlled more precisely, and a longer neck.
The reason the neck started getting longer, Lieberman says, is that the tongue moved down, pulling the larynx lower, requiring more room for it all in the neck. "The first time we see human skulls -- fossils -- that have everything in place is about 50,000 years ago where the neck is long enough, the mouth is short enough, that they could have had a vocal tract like us," he says.
But with these important changes came a new risk.
"The downside of this was that because you're pulling the larynx all the way down, when you eat, all the food has to go past the larynx -- and miss it -- and get into the esophagus," Lieberman says. "That's why people choke to death."
So we evolved this crazy airway that allows us to choke to death more efficiently -- all to further our ability to make more sounds and speak.
Controlled Breath
These changes didn't evolve overnight, but it's hard to pinpoint when we moved beyond primitive grunts and started talking. Fossils can only tell us so much about the shape of the vocal tract because much of it is soft tissue. But we can see what the human vocal tract shape has allowed us to do that our primate relatives can't.
"[Humans] have a number of vowels, a number of consonants. A monkey will just say 'uh, uh, uh,' " says Lieberman, mimicking a monkey's breathy vocalizations.
Not only can humans make more sounds, but we also can control how we string them together. And that is because of our amazing and precise breath control. Monkeys can't control their inhaling and exhaling the way we can -- they can only make short sounds a few seconds long before they have to take another breath.
But we humans can control our breath to an astonishing degree, Lieberman says.
"One of the interesting things about speech -- and singing -- is we go through a very complicated process so that we have an even air pressure in our lungs," Lieberman says. Our lungs are like a set of balloons, he says. Except, unlike a balloon, which gets very low pressure when it's nearly deflated, we can control how quickly -- or slowly -- our lungs release air.
"When we talk, we first guess the length of the sentence we are going to produce," Lieberman says. "This is quite amazing -- and we hold back on the lungs with the muscles. And they have this complicated function where as the lungs deflate, you hold back less and less and less. So you end up with a more or less even air pressure."
If we didn't do that, the pitch would rapidly descend as we got to the end of the lung balloon, and we'd blow our vocal cords apart with high pressure.
Copyright 2022 NPR. To see more, visit https://www.npr.org.