October 12, 2020
By Marek Klimes in Blog
Currently, the market is full of solutions that offer voice-based automation. I am sure you have encountered virtual assistants like Siri and Cortana, or you've heard of virtual assistants helping customers of insurance companies, banks, and mobile operators. The suppliers of these solutions promise great things: “Our voicebots will handle all your calls, increase your NPS, and reduce costs.” But how smart are these voicebots in reality?
The Core Components of Voicebots
First of all, it is essential to understand what components make up a typical voicebot and how the voicebot works. Whether we're talking about Siri, Google Home, or a domain voicebot for a mobile operator, they all work on the same principles.
A key part is the Speech to Text (STT) technology, which allows the spoken word to be transcribed into text.
This text is then passed to the natural language processing component, referred to as either the Natural Language Processing (NLP) unit or the Natural Language Understanding (NLU) unit. This component is responsible for correctly evaluating the meaning of the text, a task the voicebot industry calls "intent detection." The better the NLP layer is trained, the better the voicebot understands what a person wants.
After recognizing the meaning, the voicebot must reply to the person—this is done either through the Text to Speech (TTS) technology or a set of pre-recorded messages.
The voicebot solution then must be connected to an input source of voice (e.g., a PBX phone system) and a source of information about the clients (typically a CRM system).
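To make the flow concrete, here is a minimal sketch of one conversation turn passing through the three components. It is only an illustration: the `Voicebot` class, the `fake_*` functions, and the intent names are hypothetical stand-ins, not any vendor's API, and a real deployment would call actual STT, NLP, and TTS services in their place.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Voicebot:
    """Hypothetical wiring of the three components described above."""
    stt: Callable[[bytes], str]   # audio -> transcribed text
    nlu: Callable[[str], str]     # text -> intent name
    tts: Callable[[str], bytes]   # reply text -> audio
    replies: Dict[str, str]       # intent name -> reply text

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt(audio)                    # 1. Speech to Text
        intent = self.nlu(text)                   # 2. intent detection
        reply = self.replies.get(intent, self.replies["fallback"])
        return self.tts(reply)                    # 3. Text to Speech


# Stand-ins so the sketch runs without any real speech engine.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlu(text: str) -> str:
    return "talk_to_agent" if "speak" in text.lower() else "unknown"


def fake_tts(reply: str) -> bytes:
    return reply.encode("utf-8")


bot = Voicebot(
    stt=fake_stt,
    nlu=fake_nlu,
    tts=fake_tts,
    replies={
        "talk_to_agent": "Connecting you to an operator.",
        "fallback": "Sorry, could you rephrase that?",
    },
)
```

Calling `bot.handle_turn(b"I want to speak with somebody competent")` runs the three steps in order. The PBX and CRM connections are left out of this sketch.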
Factors That Affect (Complicate) the Solution’s Success
Virtually every company that buys a voicebot solution will need to consider the following aspects affecting the overall success of the voicebot:
The Use Case
A vital aspect of the solution’s success is the selected use case.
If a voicebot is expected to call the clients who have just gotten their mobile Internet installed and ask them how satisfied they are with it, this is a simpler use case.
On the other hand, if a voicebot is to become the first layer in an IVR (Interactive Voice Response) system, the entire use case’s complexity and sophistication are much greater.
Even in a relatively simple use case, such as reaching out to customers and asking them about their satisfaction with the company’s products, you should expect to improve the NLP layer iteratively. This is done by training the NLP component with relevant text data. The more specific to the use case the training text is (that is, the closer it is to the sentences customers may actually say), the more successful the intent detection will be.
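The effect of use case-specific training data can be illustrated with a toy intent detector. The sketch below is an assumption made for illustration only: it matches an utterance against example sentences by word overlap (Jaccard similarity), which is far simpler than a production NLP layer. The intent names, example sentences, and the 0.5 threshold are all hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(sentence: str) -> Set[str]:
    """Lowercase the sentence and split it into a set of words."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def detect_intent(
    utterance: str,
    examples: Dict[str, List[str]],
    threshold: float = 0.5,
) -> Tuple[str, float]:
    """Return the intent of the most similar example, or 'unknown'."""
    words = tokens(utterance)
    best_intent, best_score = "unknown", 0.0
    for intent, sentences in examples.items():
        for sentence in sentences:
            example_words = tokens(sentence)
            union = words | example_words
            score = len(words & example_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return "unknown", best_score
    return best_intent, best_score


# Generic training sentences only.
generic = {
    "satisfied": ["I am happy"],
    "dissatisfied": ["I am not happy"],
}

# The same data plus one sentence a mobile Internet customer might say.
specific = {
    **generic,
    "dissatisfied": generic["dissatisfied"] + ["my connection keeps dropping"],
}
```

With the generic data, `detect_intent("the connection keeps dropping", generic)` finds no overlapping words and returns `"unknown"`. With the added domain sentence, the same utterance is matched to `"dissatisfied"`.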
Another element is the quality of the output from the STT transcription, as it significantly affects an NLP layer’s intent detection success rate. For example, the NLP component of a company’s voicebot clearly understands the entry "I want to speak with somebody competent". However, if the STT transcription misinterprets this sentence as "I want to sleep with somebody competent" (yes, such an error may occur), it suddenly becomes very challenging for the NLP layer to detect the right intent.
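One common way to quantify this kind of transcription error is the word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the transcript into what was actually said, divided by the number of words actually said. The article does not name a metric, so treat the sketch below as one possible way to measure STT quality, not as the method used by any particular product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


said = "I want to speak with somebody competent"
heard = "I want to sleep with somebody competent"
wer = word_error_rate(said, heard)  # one wrong word out of seven
```

Note that a low error rate does not guarantee correct intent detection: here only one word in seven is wrong, yet it is the word that carries the intent.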
Another crucial factor is the tuning of the entire voicebot solution. The solution is made up of multiple components (think of them as gears), which are typically supplied by several specialized companies. Effective debugging therefore requires seamless access to the information from all of those components (STT, NLP, TTS), preferably from one place. Because all the parts need to work together, it is often impossible to solve a problem (that is, improve the voicebot) without combining information from multiple sources (logs).
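As a rough sketch of what "one place" could look like, the snippet below merges log entries from the three components into a single timeline per call. The entry format (`call_id`, `ts`, `component`, `detail`) is a hypothetical assumption; real components will each have their own log formats that need to be mapped first.

```python
from collections import defaultdict
from typing import Any, Dict, List

LogEntry = Dict[str, Any]


def merge_logs(*component_logs: List[LogEntry]) -> Dict[str, List[LogEntry]]:
    """Group entries from all components by call and sort them by timestamp."""
    by_call: Dict[str, List[LogEntry]] = defaultdict(list)
    for log in component_logs:
        for entry in log:
            by_call[entry["call_id"]].append(entry)
    for entries in by_call.values():
        entries.sort(key=lambda entry: entry["ts"])
    return dict(by_call)


# Hypothetical entries for one call, as each component might record them.
stt_log = [{"call_id": "c1", "ts": 1.0, "component": "STT",
            "detail": "I want to sleep with somebody competent"}]
nlp_log = [{"call_id": "c1", "ts": 1.2, "component": "NLP",
            "detail": "intent=unknown"}]
tts_log = [{"call_id": "c1", "ts": 1.5, "component": "TTS",
            "detail": "Sorry, could you rephrase that?"}]

timeline = merge_logs(tts_log, stt_log, nlp_log)
```

Reading `timeline["c1"]` from top to bottom shows that the unknown intent was caused by the transcription, not by the NLP layer, which is hard to see when each log is inspected separately.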
Can a Voicebot Solve 100% of All Conversations?
You may now be asking yourself what the voicebot solutions currently on the market can realistically achieve.
Besides the dependence on the complexity of the use case, it is important to note that the more time you put into training and testing a voicebot, the more robust and satisfactory the results will be.
On the other hand, it is necessary to emphasize that no current voicebot solution in the world will solve 100% of all conversations. There are two primary reasons:
- Although STT, NLP, and TTS technologies have made a big leap in recent years, even these cutting-edge technologies can sometimes make a mistake. The number of errors can be vastly reduced, for example, by adjusting the way the voicebot conducts dialogues with clients, but errors can still occur.
- The second reason lies with the person speaking to the voicebot. Just as each of us sometimes fails to understand a person on the other side of the table (because the person speaks too quietly or in a different language), the voicebot sometimes fails to understand a speaker.
In principle, voicebots are very similar to humans: even a trained call center operator cannot solve all problems and, from time to time, has to consult a supervisor about them (in the case of a voicebot, an analyst who teaches it the right answers).
The Four Most Common Myths About Voicebots
And what are the voicebot myths we encounter the most? Currently, these four are leading the board:
- You will no longer need operators. The idea of a complete replacement of human labor in the field of customer care may seem appealing to some, but it is still many years away. Voicebots are suitable for repetitive conversations, the ones that operators perform routinely and daily. However, for more specific customer issues, human operators will still be needed.
- The system "learns on its own" from the errors made in previous conversations. Often, we have come across the expectation that if a voicebot makes a mistake in a conversation, it can learn from it automatically and handle it better next time. Yes, this option exists, but it requires human intervention. The voicebot cannot handle this on its own—one has to show the voicebot what was wrong and let it "learn the lesson". It is important to say that this process can be practically endless as voicebots can learn indefinitely, just as a person learns all their life.
- The voicebot is set up once, and then it runs entirely on its own. For simple use cases, this assumption may apply to some extent. If the script is thought out well, and the voicebot is trained to provide correct answers, it can routinely perform this activity without any further assistance. However, the reality is that customers and scenarios continuously evolve, and, therefore, it is necessary to regularly fine-tune voicebot solutions.
- Voicebots can handle 100% of the conversations they get into. No one knows the answers to all the questions in the universe. Therefore, if a customer asks a question outside of a voicebot’s domain, and the voicebot’s creator did not anticipate it, the voicebot will be unable to resolve that conversation.
Voicebots and other types of virtual assistants are here, and their number will only increase over time. Their unquestionable usefulness and ability to automate routine operations make them very attractive for companies that want to use them to improve customer experience and save costs. The future capabilities of voicebots will only continue to grow due to the implementation of other technologies such as language recognition (it will no longer be a problem that you speak in Czech and the voicebot in English—the voicebot will simply switch to Czech) or speaker recognition (the voicebot will recognize you based on your voice). Operators will be able to engage in more valuable activities and leave the routine work to voicebots.
It is crucial to have a basic understanding of underlying voicebot principles so that you can make the most of a voicebot solution and avoid any disappointment when introducing the voicebot to your company’s processes.
The vast majority of issues that might occur when using virtual assistants can be solved quite reliably, which is why Phonexia and its partners who deliver voicebot solutions are here 😊.