Products

Solutions

Resources

About Us

Contact

Try Online Demo

<- All posts

Mar 27, 2019

Miroslav Jirků

New Phonexia Voice Biometrics More Than Doubles its Accuracy

Phonexia releases the fourth generation of its voice biometrics technology called Phonexia Deep EmbeddingsTM. Deep EmbeddingsTM for Speaker Identification, now released in its production version, is the world’s first commercially available voice biometrics engine based exclusively on deep neural networks (DNN). The fourth generation uses deep neural networks together with more robust speaker models, leading to major improvements compared to the previous third generation.

Use of voice biometrics

Every performance improvement in voice biometrics makes the technology more usable in existing and new scenarios. Voice biometrics can be used in criminal investigations, financial services, virtual personal assistants, smart homes, IoT, automotive, industry 4.0, embedded devices (devices with no permanent connection to the Internet) and much more.

Over 99% accuracy on NIST SRE data set

The latest fourth generation of Phonexia voice biometrics technology has significantly improved its accuracy and has broken through the 99% accuracy barrier to a 0.96% Equal Error Rate, while the previous generation’s Equal Error Rate was 1.24% and was already considered as one of the most accurate on the market.

Phonexia Deep EmbeddingsTM achieved even more significant accuracy improvements when tested on multiple clients’ datasets. In almost all cases the accuracy was more than doubled compared to the previous iVector based generation. The chart below shows the accuracy improvements between Phonexia’s third and fourth generation Speaker Identification Equal Error Rate measurements.

Equal Error Rate decrease between third and fourth generations of Speaker Identification

Required Identification Time cut by one third

In some use cases a short required identification time is extremely important. The latest Phonexia voice biometrics technology lowers the required identification time from 10 to 7 seconds, and the required enrolment time from 35 to 20 seconds. In specific use cases the time can be even shorter. Besides existing use cases in criminal investigations, this fact simplifies the use of voice biometrics in a host of other segments, amongst others, speaker authentication (speaker verification by voice) or fraud detection in voice-based client verification scenarios. The other use case worth mentioning is speaker identification in communication with voice bots or in industry 4.0.

More cross-language and cross-channel independence

Thanks to the use of deep neural networks together with more robust speaker models, the new voice biometrics technology performs much better for speaker identification even in conditions where the recordings for identification are captured from a different audio channel than that used for enrolment. The same positive results are also achieved in cross-language scenarios.

Four times higher processing speed and twenty-five times lower RAM consumption

Together with higher accuracy, less required time for identification and enrolment, and better cross-channel and cross-language dependency parameters, Phonexia Deep EmbeddingsTM also brings fundamentally faster processing speed and radically lower requirements on computer RAM. While the previous generation of speaker identification would process audio recordings five times faster than real time, the new generation processes audio recordings twenty times faster than real time which makes the new system four times faster. At the same time the new system consumes just 0.08GB of system RAM while the previous one, with lower accuracy, slower processing speed etc. consumed 2.07GB of system random-accessed memory.

All the above mentioned improvements are achieved concurrently. There is no trade-off between them.

Technology Model	3rd Generation of Speaker Identification	4th Generation of Speaker Identification - Deep Embeddings
Equal Error Rate (EER) measured on NIST SRE data set	1.24%	0.96%
Equal Error Rate (EER) measured on client data set	5.61%	2.35%
Minimum speech signal for identification	10 seconds	7 seconds
Processing speed (ftRT = faster than real time)	5 ftRT	20 ftRT
RAM consumption	2.07 GB	0.08 GB

Major improvements between third and fourth generations of Phonexia Speaker Identification Deep EmbeddingsTM is currently offered in Phonexia Voice Biometrics, which is part of the Phonexia Speech Engine. For more information on this release please contact the Phonexia experts at info@phonexia.com

Stay Close to Phonexia's Innovation

Stay Close to

Phonexia's Innovation

Join our newsletter for exclusive product news, events, case studies,

and breakthroughs in voice biometrics and speech recognition.

Join our newsletter for exclusive

product news, events, case studies,

and breakthroughs in voice biometrics

and speech recognition.

By subscribing, you agree to our Privacy Policy. You can unsubscribe anytime.

By subscribing, you agree to our Privacy Policy.

You can unsubscribe anytime.

+420 511 205 265

info@phonexia.com

Chaloupkova 3002/1a

612 00 Brno

Czech Republic

Company ID: 27680258

VAT ID: CZ27680258

About Us

Our Story

Our Team

Careers

Events

Partners

Resources

Blog

Brochures

Case Studies

White Papers

Documentation

Legal

End User License Agreement

Ethical Code

Whistleblowing Policy

Ethics Line

Projects and Grants

Company Transformation

+420 511 205 265

info@phonexia.com

Chaloupkova 3002/1a

612 00 Brno

Czech Republic

Company ID: 27680258

VAT ID: CZ27680258

About Us

Our Story

Our Team

Careers

Events

Partners

Resources

Blog

Brochures

Case Studies

White Papers

Documentation

Legal

End User License Agreement

Ethical Code

Whistleblowing Policy

Ethics Line

Projects and Grants

Company Transformation

+420 511 205 265

info@phonexia.com

Chaloupkova 3002/1a

612 00 Brno

Czech Republic

Company ID: 27680258

VAT ID: CZ27680258

About Us

Our Story

Our Team

Careers

Events

Partners

Resources

Case Studies

White Papers

Brochures

Documentation

Blog

Legal

End User License Agreement

Ethical Code

Whistleblowing Policy

Ethics Line

Projects and Grants

Company Transformation

+420 511 205 265

info@phonexia.com

Chaloupkova 3002/1a

612 00 Brno

Czech Republic

Company ID: 27680258

VAT ID: CZ27680258

About Us

Our Story

Our Team

Careers

Events

Partners

Resources

Blog

Brochures

Case Studies

White Papers

Documentation

Legal

End User License Agreement

Ethical Code

Whistleblowing Policy

Ethics Line

Projects and Grants

Company Transformation

New Phonexia Voice Biometrics More Than Doubles its Accuracy

Use of voice biometrics

Over 99% accuracy on NIST SRE data set

Required Identification Time cut by one third

More cross-language and cross-channel independence

Four times higher processing speed and twenty-five times lower RAM consumption

Recent Posts

Stay Close to Phonexia's Innovation