April 5, 2022
By Pavel Jiřík in Blog
Spring is finally upon us here in Europe, which means that, besides more hours of daylight to look forward to, Phonexia’s latest release of Speech to Text and Voice Biometrics software is now available for the next generation of cutting-edge commercial projects!
The Phonexia Spring 2022 Product Release brings many accuracy improvements, so let’s dive right into them.
Speaker Change Detection in Voice Enrollments
Our speaker identification technology for instant voice verification can verify a caller’s voice in just three seconds of free speech with over 96% accuracy out of the box.
Nevertheless, to deliver such performance, it is vital to extract a high-quality voiceprint during voice enrollment, the step in which a person’s voiceprint is saved into a database for future voiceprint comparisons.
It is especially crucial to ensure that only one person is speaking during the voice enrollment. If another person’s voice is analyzed during the process, the enrollment becomes invalid.
Therefore, the latest version of Phonexia Speaker Identification now offers functionality that detects a speaker change during voice enrollment. When turned on, it can be used to alert a contact center agent when more than one voice is detected during voice enrollment and the extracted voiceprint might not meet the necessary requirements (for example, due to multiple voices being audible in the caller’s background).
Improved Voice Activity Detection
Speech to Text performance is highly dependent on reliably distinguishing speech from silence. In challenging audio conditions, however, this can be quite hard to achieve.
Therefore, we have trained our Voice Activity Detection (VAD) on a much larger variety of data and managed to increase its ability to recognize silence and speech in noisy audio conditions with greater confidence.
Real-Time Addition of Unknown Words
All our sixth-generation Speech to Text (STT) languages now support adding unknown words in real time.
Custom words, such as product names, slang words, and industry-specific words, can be added to the custom STT dictionary on the fly without the need to recompile the STT model first.
It is especially useful for voicebot scenarios, where the STT dictionary can be adjusted easily during runtime, enhancing Speech to Text functionality and providing more accurate speech transcription.
Improved Preferred Phrases
In the Phonexia Fall 2021 Product Release, we introduced the brand-new Preferred Phrases functionality, which enables conversation designers to prefer specific phrases during a given speech transcription.
It is very helpful in voicebot use cases where a specific set of phrases is expected to be said by a person, such as “I lost my credit card”.
By defining a set of preferred words (phrases), Phonexia’s Speech to Text is able to achieve much higher speech transcription accuracy.
However, our latest product release takes this functionality one step further. Now, you can prefer even a single word and greatly improve STT accuracy in situations where your customers are expected to say a specific word (e.g., your company’s name) in their multi-word response.
Once again, conversational AI deployments are likely to benefit the most from this enhancement.
Classes in the Czech Speech to Text
Czech is our first Speech to Text language that comes with the Classes functionality.
To explain it simply, the Czech STT model is enhanced to recognize Czech first names, cities, and other words (classes) specific to the Czech Republic.
Not only does it improve Czech Speech to Text accuracy, but it also simplifies work with Preferred Phrases. Rather than listing multiple variations of a preferred phrase, such as “My name is Paul”, “My name is Peter”, and “My name is Charles”, you can use the relevant class, such as “My name is ‘first_name’”.
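To make the idea concrete, here is a minimal Python sketch of what a class placeholder saves you from writing by hand. It is an illustration only: the CLASSES dictionary, the expand_phrase function, and the quoted-placeholder syntax are assumptions made for this example and are not part of the Phonexia API, where the STT model resolves classes internally.

```python
# Hypothetical illustration: none of these names come from the Phonexia API.
# A "class" is modeled here as a named list of words that can fill one slot.
CLASSES = {
    "first_name": ["Paul", "Peter", "Charles"],
}


def expand_phrase(template: str, classes: dict[str, list[str]]) -> list[str]:
    """Expand every 'class_name' placeholder into all of its member words."""
    phrases = [template]
    for name, members in classes.items():
        placeholder = f"'{name}'"
        expanded = []
        for phrase in phrases:
            if placeholder in phrase:
                # One phrase per class member replaces the single template.
                expanded.extend(phrase.replace(placeholder, m) for m in members)
            else:
                # Phrases without this placeholder pass through unchanged.
                expanded.append(phrase)
        phrases = expanded
    return phrases


if __name__ == "__main__":
    for phrase in expand_phrase("My name is 'first_name'", CLASSES):
        print(phrase)
```

A single template stands in for the three phrases listed above, and the list of phrases grows automatically whenever the class gains a new member.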
Once we gather enough customer feedback on the Classes functionality in the Czech STT, we will consider bringing this functionality to our other Speech to Text languages in the future.
Spanish Speech to Text
We already added Spanish Speech to Text to our previous product release via a hotfix, but this release is the first to contain this language out of the box.
As all our sixth-generation STT languages have been enhanced with a new VAD, new decoder, and preferred phrases, they offer a significant increase in accuracy.
Leveraging all the above, our Spanish STT achieved a word accuracy of up to 95%, depending on the chosen evaluation dataset.
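For readers who want to run a comparable measurement on their own recordings, the following Python sketch shows one common way to compute word accuracy: one minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the STT output, divided by the number of reference words. This post does not specify the exact metric or text normalization behind the figures quoted here, so treat the definition and the function names below as assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy under the assumed definition 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    reference = "i lost my credit card"
    hypothesis = "i lost my credit cart"
    print(f"{word_accuracy(reference, hypothesis):.0%}")
```

With one wrong word out of five, the example reports 80%. Under this definition the value can drop below zero when the output contains many inserted words, and the result depends heavily on the evaluation dataset, which is why the accuracy figures in this post are quoted per dataset.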
Slovak Speech to Text
We have also updated the Slovak language from the fifth to the sixth STT generation.
Applying the latest enhancements resulted in a Slovak Speech to Text word accuracy of up to 88%, depending on the chosen evaluation dataset.
It is an impressive result, and we are looking forward to having the Slovak STT utilized for advanced voicebot and other conversational AI and speech analytics scenarios.
Farsi Speech to Text
Our sixth generation of STT languages received yet another expansion: Farsi Speech to Text.
This language was trained using both Iranian Farsi and Afghan Farsi dialects, and after the update to the sixth generation, it achieved a word accuracy of up to 59%, depending on the chosen evaluation dataset.
As a sixth-generation Phonexia STT language, its accuracy can be improved further by using the Preferred Phrases functionality.
Turkish Speech to Text
Last but not least, Turkish has also made it into this Spring’s product release.
Leveraging the latest enhancements of the sixth generation of STT languages, the Turkish Speech to Text has achieved a word accuracy of 58%.
However, it was tested on only one evaluation dataset, so its accuracy in real-world deployments could be even greater, especially with the help of the Preferred Phrases functionality.
We are working relentlessly on expanding our most advanced, sixth generation of Speech to Text with new languages and capabilities.
As part of that effort, in just a few weeks, we will bring these STT languages directly to the world of private cloud and seamless API integrations. Brace yourself for an easier experience with our STT technology, which we will be announcing soon!
Plus, in the next product release, the Phonexia Fall 2022 Product Release, you can expect further STT and voice biometrics enhancements that will not only help you verify customers instantly and easily by their voice but also mitigate fraud with the latest AI-powered Phonexia technologies.