Introducing the Fourth Generation of Phonexia Speech Platform

May 22, 2024

We are pleased to announce the fourth generation of the Phonexia Speech Platform, which brings numerous enhancements and new features that make voice biometrics and speech recognition more efficient.

The latest generation combines an easy-to-use graphical interface, high-performance technologies, and seamless integration capabilities in a single package.

Intuitive Graphical User Interface

Phonexia Speech Platform 4 comes with a brand-new graphical user interface (GUI) that makes it extremely easy for users to utilize Phonexia's cutting-edge technologies with just a few clicks from the convenience of their web browser.

This intuitive GUI is currently available for our flagship technologies, Speaker Identification and Speech to Text, with more technologies to follow.

For example, searching for a person's voice in a list of unknown audio recordings is now a matter of just a few clicks.

Feel free to try it yourself in our online demo.

High-Performance Microservices

When processing vast amounts of audio data, efficiency is crucial.

The fourth generation of the Phonexia Speech Platform now offers, alongside a standard REST API, a high-performance gRPC API designed to handle extensive audio data volumes while enabling effortless scalability.

Available as Docker containers, these easy-to-scale microservices mark a new era in efficiently managing large-scale audio processing with Phonexia technologies.

Currently, the Docker-based microservices are available for our Speaker Identification, Speaker Diarization, Gender Identification, Speech to Text, and Language Identification technologies, with more to come.
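To illustrate why easily scaled services matter for batch workloads, here is a minimal Python sketch that fans a list of recordings out to a pool of concurrent workers. It does not use Phonexia's actual API: `process_batch` and `fake_process` are hypothetical names, and `fake_process` merely stands in for whatever REST or gRPC call a real integration would make.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_batch(
    paths: Iterable[str],
    process_one: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run process_one on every recording concurrently; return results keyed by path."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with path_list.
        results = list(pool.map(process_one, path_list))
    return dict(zip(path_list, results))


def fake_process(path: str) -> str:
    """Hypothetical stand-in for a real request to a speech microservice."""
    return f"processed:{path}"


if __name__ == "__main__":
    print(process_batch(["a.wav", "b.wav", "c.wav"], fake_process))
```

In a setup like this, raising `max_workers` only helps if the service behind `process_one` can absorb the extra load, which is where running additional container replicas would come in.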

Versatile Virtual Appliance

The Phonexia Speech Platform's fourth generation is available as a virtual appliance, ensuring seamless deployment in virtualized environments.

It includes the intuitive GUI and REST API, making the utilization of Phonexia technologies effortless for both non-technical users and technical integrators.

Whether you need to identify a speaker in just a few clicks or integrate voice biometrics and speech recognition technologies into complex on-premises or cloud solutions, the virtual appliance makes it a breeze.

Enhanced Speech to Text Built on Whisper

Our R&D team has enhanced OpenAI's Whisper model with our proprietary speech processing technologies and expertise, making its speech-to-text performance more robust, more accurate, and, most importantly, much faster.

The Phonexia Speech Platform's latest generation enables you to transcribe speech using the Enhanced Speech to Text Built on Whisper, capable of automatically detecting the language of conversation and converting speech to text in more than 50 languages.

Want to try it yourself? Check out our free online demo.

GPU Processing Support

In addition to CPU processing, Phonexia Speech Platform 4 supports GPU processing, significantly boosting processing speed, especially for the Enhanced Speech to Text Built on Whisper.

New Generation of Speaker Diarization

Phonexia Speech Platform 4 also comes with Phonexia's latest generation of Speaker Diarization technology, which can process large audio recordings much faster than the previous generation.

This is particularly useful for lengthy mono-channel audio recordings, enabling quick labeling, segmentation, and separation of speakers based on voice.

The technology is also more accurate and robust across different audio channels, and it supports GPU processing for even faster results.

New Generation of Language Identification

Last but not least, the latest Phonexia Speech Platform comes with our newest generation of Language Identification technology, which can recognize 140 languages, up from the 83 languages supported by the previous generation, an increase of nearly 70%.

Additionally, it now also supports GPU processing.

What's Next?

With the rise of generative AI and the security challenges posed by realistic audio- and video-based fraud, our R&D team is currently developing advanced deepfake detection technologies that will help distinguish real audio from artificially generated deepfakes.

Furthermore, we plan to expand the Phonexia Speech Platform's intuitive GUI to include Language Identification and Speaker Diarization technologies, introduce a translation technology, and incorporate many other improvements.

Check It Out for Yourself

Want to test it yourself? Try our free demos for Speaker Identification or Speech to Text.

Would you prefer a live demo? Let's schedule one!
