Deep Embeddings™ Acknowledged as the Most Accurate Speaker Identification System
June 28, 2019
By Jakub Bortlik in Blog
As a forensic expert, you work with a variety of advanced tools to provide courts and law enforcement agencies with the most accurate forensic analysis possible. When you rely on automatic speaker identification systems, their accuracy is therefore crucial. That's why we were excited to learn that forensic experts were about to test our AI-powered speaker identification system against other speaker recognition systems on the market.
Evaluation Process
Geoffrey Morrison and Ewald Enzinger have organized an evaluation of voice comparison systems that aims to reflect the highly variable conditions of forensic casework: "relatively poor recording conditions and mismatches in speaking style and recording conditions between the known- and questioned-speaker recordings." They call it forensic_eval_01, and they invite forensic practitioners and researchers to test the systems they use, so that they can not only validate the performance of those particular systems but also compare them with each other. Other evaluations, e.g., those organized by NIST, often include experimental systems that achieve extremely good results but cannot be used in real deployments. In contrast, most systems tested in forensic_eval_01 are production systems and mature applications for forensic voice comparison.
The forensic experts at the Bundeskriminalamt (the German Federal Criminal Police Office) use Phonexia Voice Biometrics in their casework, so they needed to validate the system. Together with forensic scientists from the Israeli police, they put our Speaker Identification (SID) technology through forensic_eval_01 and published the results in the scholarly journal Speech Communication. Members of the Phonexia team contributed a detailed description of the tested systems to the paper.
So, How Did We Fare?
The evaluation covered the then-current XL3 model (the 3rd generation of Phonexia Speaker Identification technology), an i-vector PLDA system used in various voice biometrics applications around the globe. It also covered what was then the Beta version of the Phonexia Deep Embeddings™ model, the 4th generation of Phonexia Speaker Identification technology, which relies exclusively on Deep Neural Networks (DNN) for speaker identification.
While the older XL3 model achieved good results, the impressive results of the Deep Embeddings™ model clearly showed that DNN technology is the way to go. Deep Embeddings™ improved on every metric used in forensic_eval_01 and nearly halved the Equal Error Rate (EER) compared with the XL3 model. What's more, while the evaluation paper was under review, we released the production version of the Deep Embeddings™ system. In our experience and that of our partners, it outperforms the already excellent Beta system, and it includes innovative, easy-to-implement ways to further enhance the results.
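For readers less familiar with the metric: the EER is the error rate at the decision threshold where the false acceptance rate (different-speaker comparisons wrongly accepted) equals the false rejection rate (same-speaker comparisons wrongly rejected), so a lower EER is better. The following is a minimal, hypothetical Python sketch of how an EER can be estimated from two lists of comparison scores. It is not Phonexia's or forensic_eval_01's scoring code, and the function name and toy scores are invented for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    target_scores: scores from same-speaker comparisons (hypothetical input)
    nontarget_scores: scores from different-speaker comparisons (hypothetical input)
    A comparison is "accepted" when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials scored at or above it
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy example: one same-speaker trial (2.0) scores below one different-speaker trial (2.5)
print(equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 1.5]))  # 0.25
```

Halving the EER means that, at this balanced operating point, roughly half as many comparisons are misclassified. Sweeping only the observed scores gives an approximation; more careful implementations interpolate between thresholds.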
A comparison with other systems, such as Nuance Forensics and BATVOX, which were tested using the same evaluation approach, shows that Phonexia Deep Embeddings™ is a state-of-the-art system, currently the most accurate speaker recognition system available on the market, and that it can be successfully employed for forensic voice comparison.
We invite everybody not only to look up the evaluation results for the different systems but also to evaluate the system on their own data. Several of the papers are freely available on the scholarly portal researchgate.net; just search for the keyword "forensic_eval_01". In this blog post, you can watch our tutorial on how to run such an evaluation easily with Phonexia Browser.