Apr 23, 2019

Viktorie Petříková

How to Teach Speech Technologies to Understand Klingon – Data Collection

Did you know that speech technologies can learn to understand any language on the planet (and possibly even in outer space)? You just have to teach them. That is why we have chosen Klingon, the constructed language created for Star Trek, to show you how today’s artificial intelligence technologies, which can still look like science fiction, can learn almost anything.

Artificial intelligence can be described as the ability of a computer to think and learn. But as we know, learning is not always easy, and we still do not have Star Trek’s amazing universal translator, which can communicate with every species in the galaxy. Although we are getting close to the level of technology seen in Star Trek, and high-quality speech technology is easier to develop than ever before, there are still challenges we have to overcome.

A Piece of Cake? Not at All!

It is surprising how simple machine learning sometimes seems to be. Machine learning is a branch of artificial intelligence in which a system learns and improves from experience without being explicitly programmed. It can look as simple as feeding a bunch of data into an algorithm, and suddenly there it is: world-class artificial intelligence. That might be true in many cases, but it definitely doesn’t work like this for speech recognition. Recognizing speech is much tougher because there are almost limitless sources of variation to cope with, such as background noise, echo, different accents, recording quality… and the list goes on. All of these variations have to be represented in the training dataset so that the neural network, which is loosely inspired by the way the human brain works, performs reliably in real conditions. You have probably noticed that when you talk in a noisy room, you unknowingly raise the volume and pitch of your voice to talk over the clamor. Now imagine how noisy it must be on a Klingon starship. Klingons, like us humans, have no trouble understanding each other in such an environment. Neural networks, however, have to be trained on similar data to cope with it.
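
One common way to get such variation into a training set is to mix clean recordings with noise at several signal-to-noise ratios (SNR). The following is only a minimal sketch of that idea, not the pipeline used by the authors: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the "speech" and "noise" signals are synthetic stand-ins generated on the spot.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaling the noise to reach the requested SNR in dB.

    Hypothetical helper for illustration. Both inputs are equal-length
    lists of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise signal is silent")
    # SNR(dB) = 20 * log10(rms_speech / rms_noise); solve for the noise RMS.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mix, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixed)]
    return 20 * math.log10(rms(speech) / rms(residual))


# Synthetic stand-ins (assumptions): a 220 Hz tone as "speech" and
# Gaussian noise as "starship clamor", one second at 16 kHz.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# One clean recording becomes several training examples of varying difficulty.
noisy_versions = {snr: mix_at_snr(speech, noise, snr) for snr in (20, 10, 0)}
```

In practice the noise would come from recordings of the target environment, and echo or device characteristics would be simulated in a similar spirit.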

It All Starts with the Right Data

When we see Star Trek’s universal translator at work, we notice how it learns a new language by listening to conversations. The more speech it hears, the better it learns the language, and building a speech recognition system that understands Klingonese works in a similar way. To perform at a high level, the system needs a lot of training data, and there is no way to skip this step.

Voice data collection starts by bringing in real Klingons, or non-Klingon characters from Star Trek who have learned to speak Klingon (for example, Jean-Luc Picard), to record conversations in different environments and in different dialects and accents. The recordings are then transcribed manually, so the computer has an exact text representation of what was said to learn from. By recording the conversations, we get a range of sounds in a variety of voices. From there, an acoustic model is built that represents the relationship between an audio signal and the phonemes that make up speech. To complete the learning, we need a good language model, which requires accurate transcriptions, a vocabulary list, and additional text. The language model provides the context needed to tell apart words and phrases that sound similar. Since Klingons are known for their passion for opera, their librettos, for example, could be used as one of the text sources.
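
To illustrate how a language model can separate similar-sounding phrases, here is a toy bigram model with add-one smoothing. It is a sketch under stated assumptions rather than the model the authors build: the three-sentence English corpus, the two candidate transcriptions, and the function names are all made up for the example, and a real system would combine such scores with acoustic model scores.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count word contexts and word pairs in a list of sentences."""
    contexts, pairs, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])          # every token that has a successor
        pairs.update(zip(tokens, tokens[1:]))
    return contexts, pairs, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the bigram model (add-one smoothing)."""
    contexts, pairs, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((pairs[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


# Made-up text sources standing in for transcriptions and librettos.
corpus = [
    "we recognize speech in noisy rooms",
    "machines recognize speech from recordings",
    "the beach was nice today",
]
model = train_bigram(corpus)

# Two hypotheses that sound alike; the language model prefers the likelier one.
candidates = ["we recognize speech", "we wreck a nice beach"]
best = max(candidates, key=lambda c: log_prob(c, model))
```

The more text the model sees, the better its estimates of which word sequences are plausible, which is why extra text sources are worth collecting.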

The More Real the Better

A huge part of the success is having training data that is as similar as possible to the real audio that will later be transcribed. That is why, in teaching the technology to fully understand Klingon, we could record with, for example, a tricorder, another Star Trek invention that is also used for recording. Different recording devices have different characteristics, of course, so it is key to collect data with recorders similar to the ones our customers will use. The training dataset also has to fulfill other requirements. To build a robust model, we need to collect around 1,000 hours of speech, ideally from thousands of Klingons of various ages and genders holding spontaneous conversations with each other.
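
Requirements like these can be tracked with a simple manifest of the collected recordings. The sketch below assumes a hypothetical `Recording` record and hypothetical field names. The 1,000-hour target comes from the text above, while the default of 1,000 speakers is only an assumed stand-in for "thousands".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    """One recorded conversation side (hypothetical manifest entry)."""
    speaker_id: str
    duration_s: float
    gender: str
    age_group: str


def summarize(recordings):
    """Total hours, distinct speakers, and speaker counts by gender and age group."""
    speakers = {r.speaker_id: (r.gender, r.age_group) for r in recordings}
    return {
        "hours": sum(r.duration_s for r in recordings) / 3600,
        "speakers": len(speakers),
        "by_gender": Counter(gender for gender, _ in speakers.values()),
        "by_age_group": Counter(age for _, age in speakers.values()),
    }


def meets_targets(summary, min_hours=1000, min_speakers=1000):
    """Check the collection against the size targets (speaker minimum is assumed)."""
    return summary["hours"] >= min_hours and summary["speakers"] >= min_speakers


# A tiny made-up manifest: two speakers, three half-hour recordings.
manifest = [
    Recording("spk-001", 1800.0, "female", "adult"),
    Recording("spk-001", 1800.0, "female", "adult"),
    Recording("spk-002", 1800.0, "male", "senior"),
]
summary = summarize(manifest)
ready = meets_targets(summary)
```

The per-gender and per-age counts make it easy to spot groups that are still underrepresented before more recording sessions are scheduled.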

A short lesson to finish: if you hear a Klingon say ’uH, he had a hard night and is telling you he has a hangover, so watch out.

Live long and prosper!

+420 511 205 265

info@phonexia.com

Chaloupkova 3002/1a

612 00 Brno

Czech Republic

Company ID: 27680258

VAT ID: CZ27680258
