February 11, 2022
By Pavel Jiřík in Blog
Voice biometrics technology has made tremendous progress in the last few years. Now that practically all industries have started to adopt voice identification, we soon might not need to remember a password or customer number when calling a bank or support center. A caller would only need to speak to an Interactive Voice Response (IVR) system or a support agent, which could identify them immediately. How? By comparing the speaker's voice to voiceprints stored in the organization's database.
What's more, this is also a more secure way of verifying someone's identity. As every human's voice is unique, their identity can be checked seamlessly and with extreme accuracy while they are naturally talking to an agent or IVR.
So is it really that surprising that demand for voice biometrics technology keeps growing? The industry was worth around $1 billion in 2020 and is expected to reach almost $6 billion by 2027.
But voice biometrics technology isn't only useful for quick and smooth authentication processes. It can also be a valuable source of information about a person, as our voices contain many useful and sometimes surprising facts.
In this article, we'll examine in detail what unique identity and demographic information companies could discover just by analyzing the voices of their callers.
What Surprising Facts About Us Could Companies Learn From Analyzing Our Voices?
A Person's Identity
Many businesses have become interested in voice biometrics mainly because it can help them quickly and securely verify the identity of their customers. How does that process work without this technology? Agents must ask for the caller's number, login information, or password to find out who they are. People rarely remember all of those details though, which usually leads to either:
- the caller frantically searching for their login information, or
- the support agent asking a series of questions to make sure they’re talking to the person who the caller claims to be.
The verification process can get especially lengthy if the agent has to ask for sensitive information. But when so many fraudsters are trying to impersonate customers nowadays, companies simply can't ignore such security measures.
Using voice biometrics as an identification method both speeds up the verification process and makes it much easier for the caller, who doesn't have to remember any account information. Instead, they simply have to say a given phrase to the agent or IVR system in order to be immediately recognized.
Passive voice biometric identification can make the identification process even faster, since it can identify the caller while they are speaking naturally to an agent or the virtual assistant. In this case, recognition can take place without the customer having to repeat a certain phrase.
Voice biometrics technology is also highly secure as a verification method, as each voiceprint is stored as a unique mathematical pattern that can't be reversed back to the original voice sample. This means that using voice identification may be one of the best ways for companies to further protect themselves (and their data) from a fraudulent attack while making it more convenient for callers to get assistance.
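As a simplified illustration of how such matching might work, the sketch below compares a stored voiceprint embedding against an embedding extracted from live audio using cosine similarity. The vectors, the threshold value, and the function names are all illustrative assumptions, not any particular vendor's implementation; real systems derive these embeddings from dedicated speaker-recognition models.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two voiceprint embeddings; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_caller(enrolled_voiceprint: np.ndarray,
                  live_embedding: np.ndarray,
                  threshold: float = 0.75) -> bool:
    """Accept the caller if the live sample is close enough to the enrolled voiceprint.

    The 0.75 threshold is a made-up value; production systems tune it to
    balance false accepts against false rejects.
    """
    return cosine_similarity(enrolled_voiceprint, live_embedding) >= threshold
```

Because only the embedding is stored, the original audio cannot be recovered from it, which is what keeps the voiceprint useful even if the database leaks.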
A Person’s Face
The idea of being able to identify a caller just by analyzing their voice sounds like something out of science fiction. How about being able to predict how a person's face might look just from their speech, then?
The Computer Science and Artificial Intelligence Laboratory at MIT has developed an artificial intelligence system, Speech2Face, that can reconstruct people's faces using only audio clips of their voices for reference. The MIT researchers wanted to find out how much information about a speaker's face can be learned from their voice, so they built a neural network and trained it on millions of video clips to teach it the relationship between faces and voices. Then, they asked it to generate portraits of several people based on recordings of their voices.
When they compared the AI-generated portraits with real photos of the speakers, they were struck by how close the two were. From voice recordings only six seconds long, the AI could correctly infer the speaker's ethnicity and even approximate their facial features.
Obviously, this voice-based face reconstruction is still far from perfect. Some of the generated portraits from those experiments do bear an eerie resemblance to the real people, but others do not look quite right. The results are already impressive enough, though, that the technology could become widely adopted within a few years.
A Person’s Gender
Now for something a little bit different. Can voice biometrics systems powered by artificial intelligence recognize the gender of a speaker? Yes, indeed they can.
Research on "Gender Determination Using Voice Data" sought to determine how accurately AI can distinguish men from women based on recordings of their speech. The result: the technology could identify male and female voices with 97.9% accuracy. But how?
A person's voice has a characteristic fundamental frequency that depends on the physical properties of the vocal cords, such as their length and thickness. The thicker and longer the vocal cords, the lower the frequency of the speech. Women's voices typically have a fundamental frequency of around 210 Hz, while men's sit around 120 Hz. By analyzing frequency, pitch, and many other traits that differ between genders, AI can identify whether a speaker is male or female.
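To make the idea concrete, here is a toy sketch that estimates a signal's fundamental frequency with a simple autocorrelation method and then applies a single pitch threshold. The 165 Hz midpoint and the one-feature approach are my simplifications for illustration; the research that reached 97.9% accuracy used far richer features and a trained model.

```python
import numpy as np

def estimate_f0(signal: np.ndarray, sample_rate: int) -> float:
    """Estimate the fundamental frequency (Hz) via autocorrelation (toy method)."""
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    first_rise = np.where(np.diff(corr) > 0)[0][0]  # skip the trivial peak at lag 0
    best_lag = first_rise + np.argmax(corr[first_rise:])
    return sample_rate / best_lag

def classify_gender_by_pitch(f0_hz: float, midpoint_hz: float = 165.0) -> str:
    """Single-feature toy classifier: typical male F0 ~120 Hz, female ~210 Hz."""
    return "female" if f0_hz >= midpoint_hz else "male"
```

On clean sine waves at the typical frequencies (120 Hz and 210 Hz) this threshold works, but as the next paragraph notes, real voices near the midpoint are exactly where such simple rules break down.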
Nevertheless, even the best AI systems used for gender recognition can still make mistakes, because their accuracy relies heavily on the data fed into them. For example, AI is known to struggle with voices whose frequency and pitch fall into the gender-neutral middle range. But since research is constantly progressing, even such cases will likely become easier to distinguish.
A Person’s Height
Can you guess someone's height based on their voice? Maybe not to the extent of pinpointing it exactly, but yes. According to a study by the Acoustical Society of America, we can tell if someone is short or tall just by listening to them speak.
During the research, participants were asked to listen to the recorded voices of two people saying a few random words and then guess which one was taller. Perhaps surprisingly, participants could determine their heights with ease and were even able to place a group of five people in the correct order.
This led scientists to conclude that there must be something (beyond the belief that "tall people have deeper voices") giving people clues about a speaker's height. They did indeed discover the reason: an acoustic feature called subglottal resonance, which deepens with height and lets listeners gauge someone's stature just by hearing them speak.
John Morton, a psychologist at Washington University, explained the discovery with an analogy: "The best way to think about subglottal resonances is to imagine blowing into a glass bottle partially full of liquid: the less liquid in the bottle, the lower the sound. The frequency of the subglottal resonance differs depending on the height of the person generating it, with resonances becoming progressively lower as height increases."
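As a rough back-of-the-envelope version of that bottle analogy, the subglottal airway can be treated like a tube closed at one end, whose first resonance is f = c / (4L). This is a simplification chosen for illustration, not the acoustic model used in the study, but it shows why a longer airway (a taller person) produces a lower resonance.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def quarter_wave_resonance_hz(tube_length_m: float) -> float:
    """First resonance of a tube closed at one end: f = c / (4 * L)."""
    return SPEED_OF_SOUND / (4.0 * tube_length_m)

# A longer air column resonates lower, just like a bottle with less liquid
# in it sounds deeper when you blow across it.
shorter_person = quarter_wave_resonance_hz(0.18)  # ~476 Hz
taller_person = quarter_wave_resonance_hz(0.22)   # ~390 Hz
```

The tube lengths above are illustrative placeholders, not measured anatomy; the point is only the inverse relationship between length and resonant frequency.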
A Person’s Age
People are also quite good at determining someone's age just by listening to their voice. As part of a 2010 study detailed in the Journal of Social, Evolutionary, and Cultural Psychology, researchers asked 97 people to listen to 100 samples of speakers ranging from 2 to 67 years old and then guess their ages.
The participants had little trouble identifying the voices of children, teenagers, and older people. With speakers between 45 and 65 years old, though, accuracy dropped, and guesses were at least 10 years off the mark. Given how similar adult voices can sound across middle age, it may not be surprising that adults are the hardest group to place.
AI, on the other hand, can estimate an individual's age by analyzing the changes in the vocal cords as we age. How could this be useful? For example, businesses would be able to quickly identify senior citizens based on how they speak and then move them into a priority queue. What's more, it would also be possible to help prevent all types of attempted fraud when someone tries to imitate an elderly person. If a bank employee or support agent notices that the speaker's voice doesn't match the data they have, they could ask for more information to verify the caller or simply contact the original owner of the number.
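A call-center routing rule built on such an estimate could look like the sketch below. Every threshold and queue name here is a made-up assumption for illustration; the age estimate itself would come from a separate voice-analysis model.

```python
def route_caller(estimated_age: int, matches_account_profile: bool) -> str:
    """Toy routing rule using a voice-estimated age (hypothetical thresholds)."""
    if not matches_account_profile:
        # Voice traits don't match the account data: ask for extra verification.
        return "extra_verification"
    if estimated_age >= 65:
        # Senior callers are moved into a priority queue, as described above.
        return "priority_queue"
    return "standard_queue"
```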
A Person's Health
Detecting illness from someone's voice might sound like something out of the Black Mirror universe, but the technology is already in development. In 2018, Amazon patented a feature for Alexa that could recognize signs of illness and respond accordingly. In the patent's example, a woman coughs and sniffles as she gives commands to her Amazon Echo device, so Alexa responds by first suggesting chicken soup for her cold and then asking whether it should order cough drops on Amazon.
Amazon's patent also suggests that the Alexa of the future might be able to detect emotional states from users like joy, anger, sadness, boredom, fear, and happiness. In combination with IoT wearables, virtual assistants could gain the potential to make it easier for doctors and hospitals to monitor a patient's mental, emotional, and physical health so they can alert the right people in case of abnormalities. But since many users may find this invasive, it is hard to predict if and when this feature will be available with Alexa.
People can already determine a lot about a speaker just based on the tone, accent, and pitch of their voice. What if we add AI and voice biometrics to the mix?
Using these technologies, businesses can quickly and efficiently identify who they are talking to and tailor the conversation to match. Even better, callers benefit as well: they spend less time answering verification questions while knowing their account remains secure.
What a fantastic thing our voice is, right?