January 8, 2021
By Pavel Jirik
The Oriental Language Recognition 2020 Challenge
Phonexia’s team of speech recognition experts took part in the Oriental Language Recognition 2020 Challenge, co-organized by Xiamen University, Tsinghua University, Duke-Kunshan University, Northwestern Polytechnical University, and Speechocean.
The Oriental Language Recognition (OLR) Challenge series focuses on improving language recognition technology for oriental languages. Oriental languages have a unique set of characteristics that make them harder to recognize, and recognizing oriental dialects is harder still.
In total, 58 teams from all around the world registered for the challenge.
Two Multilingual Databases
Each team received two multilingual databases to create their language recognition solutions:
- The first database contained 71 hours of speech in seven languages, recorded through a mobile channel, along with all necessary transcriptions and lexica.
- The second database contained 35 hours of speech in three languages (Tibetan, Uyghur, and Kazak), recorded through a mobile channel, along with all necessary transcriptions and lexica.
Three Identification Challenges
The OLR 2020 Challenge required every competing team to complete the following three language and dialect identification tasks:
- Cross-channel oriental language identification
- Oriental dialect identification
- Oriental language identification in a noisy environment
Cross-channel Oriental Language Identification
This task required identifying six languages in a test set, provided by the OLR 2020 Challenge, of speech recorded in various environments with different recording equipment.
Oriental Dialect Identification
This second task required the identification of three different dialects (Hokkien, Sichuanese, and Shanghainese) in a test set provided by the OLR 2020 Challenge.
Oriental Language Identification in a Noisy Environment
The third and final task was designed to test language identification technology in a noisy environment. It required identifying five languages in a test set of noisy speech recordings provided by the OLR 2020 Challenge.
Phonexia Team Won the Oriental Dialect Identification Task!
Our research team did well in all three tasks and performed best in the oriental dialect identification task, where they took 1st place!
Out of all competing teams, the Phonexia team delivered the most accurate oriental dialect identification solution with an 11.97% equal error rate (EER) and 0.0738 average cost (Cavg).
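For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate (non-target trials accepted) equals the false rejection rate (target trials rejected); lower is better. The following is only a minimal sketch of how an EER can be approximated from a list of trial scores. It is not the challenge's official scoring tool, the function name `equal_error_rate` and the toy scores are hypothetical, and it does not cover Cavg.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Illustrative helper (an assumption, not the OLR scoring script).
    A trial counts as accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: share of target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: share of non-target trials at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest,
            # and report their midpoint as the EER estimate.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, made up for illustration only.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # prints 0.25
```

On real evaluations the sweep runs over many thousands of trials, so the two error rates can be matched much more closely than in this four-trial toy example.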
The team also placed 5th in the noisy language identification task and 8th in the cross-channel language identification task.
Feel free to check out the complete results on the OLR 2020 Challenge page.