The Future of Voice User Interfaces (VUIs)
From birth we use language as our primary method of communication, and throughout life we hone this skill. We are adept at speech and language; however, we input data and requests into technology by tapping on a screen or typing on a keyboard.
It’s easy to see why so many of us are eager to communicate with computers in the way we are most comfortable with: speech.
While voice technology has come a long way in the last few years, we are still far from the perfect Voice User Interface. Inspired by science fiction, such as the ship’s computer in Star Trek or HAL in 2001: A Space Odyssey, we dream of an AI that can understand and converse with us on a human level.
Personal assistants today still lack the basic conversational foundations that create a smooth, personalised experience. We use wake words, commands and strange vocabulary that would sound alien in a real human conversation.
It’s obvious that talking to a Voice User Interface (VUI) is still not quite like talking to a human. Many may argue that this isn’t necessary, and that being able to bark fragmented command words at a disembodied voice gets the job done. For now, that may be all we need, but let’s consider the many improvements that could still be made to voice technology.
Here are a few of the most anticipated features that will take this technology to the next level.
Context is one of the most talked-about subjects in the VUI community. In the future we can expect devices to hold context over much longer periods than they can today.
Google can already keep context across a few commands, e.g. “Who is Britney Spears?”, “Who is her mother?”, “Where was she born?” This works well with the new Continued Conversation feature that Google has recently rolled out across the US (and soon the UK).
Continued Conversation allows the device to continue listening after it relays information, so that it can capture any follow-up questions from the user. This means users don’t need to say the wake word at the beginning of every sentence, much as in normal conversation. Since this feature relies on multi-turn conversations, hopefully we will also see Google’s ability to hold onto context over longer periods improve.
A device that remembers previous interactions could use them to understand the user’s future requests.
An example of this would be:
“How long will it take me to get to Gatwick airport?”
“In current traffic, it would take 55 minutes.”
A few hours later...
“Book me a taxi.”
“Sure. Would you like me to book you a taxi to Gatwick airport or somewhere else?”
This small use of context saves the user time by drawing on an earlier interaction held in the assistant’s contextual memory.
Although context is a highly anticipated feature, it needs to be used intuitively. If a user was asking about Gatwick airport for a friend, it won’t be relevant to them later on. VUIs need to understand when context helps the user and when it would be a barrier to interaction. This could be done by cross-referencing calendar events or by learning from the user’s previous interactions.
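The Gatwick example and the relevance check above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real assistant works: `ContextMemory`, `taxi_prompt`, the 12-hour expiry window and the idea of treating a calendar match as evidence of relevance are all hypothetical. A remembered destination is only offered if it is recent and also appears in the user’s calendar, so a place looked up for a friend is not suggested.

```python
from datetime import datetime, timedelta
from typing import Optional


class ContextMemory:
    """Hypothetical short-term memory of places the user has asked about."""

    def __init__(self, max_age: timedelta = timedelta(hours=12)):
        self.max_age = max_age  # assumed expiry window for remembered context
        self._mentions: list[tuple[datetime, str]] = []  # (time, place), oldest first

    def remember(self, place: str, when: datetime) -> None:
        self._mentions.append((when, place))

    def suggest_destination(self, now: datetime, calendar_places: set[str]) -> Optional[str]:
        # Walk back from the newest mention; entries are stored in time order.
        for when, place in reversed(self._mentions):
            if now - when > self.max_age:
                break  # this and every older mention is stale
            if place in calendar_places:
                return place  # recent and corroborated by the calendar
        return None


def taxi_prompt(memory: ContextMemory, now: datetime, calendar_places: set[str]) -> str:
    place = memory.suggest_destination(now, calendar_places)
    if place is None:
        return "Sure. Where would you like to go?"
    return f"Sure. Would you like me to book you a taxi to {place} or somewhere else?"
```

If the calendar has no matching entry, or the question was asked too long ago, the sketch falls back to an open question rather than guessing.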
Awareness works in conjunction with context. A device’s knowledge of its current situation, location and recent interactions creates the illusion of awareness. This helps us build trust in Personal Assistants, since we can expect them to respond appropriately in any situation.
If the device knows that the user is at home rather than at work, location searches can return more relevant results. If the device knows that the user has been looking at maps and directions to London, chances are that they want to go to London, and by cross-referencing this against the user’s calendar it can confirm the date of the visit.
In the near future your smart device could give you everything you need for the day based on your past requests and calendar information, e.g. when to leave the house, which train to catch, which restaurants to try and when to leave for home.
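At its core, the “when to leave the house” part of such a day plan is simple arithmetic: departure time equals event start minus journey time minus a safety margin. The sketch below is illustrative only; `CalendarEvent`, `daily_briefing`, the ten-minute buffer and the `travel_minutes` lookup (standing in for a live traffic estimate such as the 55-minute Gatwick figure above) are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CalendarEvent:
    title: str
    start: datetime
    location: str


def daily_briefing(
    events: list[CalendarEvent],
    travel_minutes: dict[str, int],
    buffer: timedelta = timedelta(minutes=10),  # assumed safety margin
) -> list[tuple[datetime, str]]:
    """Return (leave_at, title) pairs for the day, earliest departure first.

    travel_minutes is a hypothetical mapping from location to estimated
    journey time in minutes; events at unknown locations are skipped.
    """
    plan = []
    for event in events:
        minutes = travel_minutes.get(event.location)
        if minutes is None:
            continue  # no journey estimate, so no departure reminder
        leave_at = event.start - timedelta(minutes=minutes) - buffer
        plan.append((leave_at, event.title))
    plan.sort(key=lambda item: item[0])  # order by departure, not event start
    return plan
```

Sorting by departure time rather than event start matters when a later event involves a much longer journey.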
Alexa recently gained the ability to understand the room that it is in. You can now ask it to turn off the lights without specifying which lights you mean.
This is only the starting point for device awareness. In the future, a device could monitor noise levels and respond at an appropriate volume, or adjust lighting depending on the current light levels in the room.
Feeling that our Personal Assistants are in tune with us and our personalities is going to increase the trust we place in them. The way humans interact with each other is shaped in many ways: we adapt our conversations, our words and even our intonation to the person we are speaking to.
We talk to children with simpler words and shorter sentences so that they can understand us. We have already seen a move towards this with Google Home’s Pretty Please feature, which, when enabled, encourages children to be polite when talking to Google. The ability for a VUI to adapt like this automatically would break down the barriers of language even further, making sure that the user and the VUI are “speaking the same language”.
VUIs can sometimes interpret a momentary pause in speech as a cue to start answering a question. Not only is this frustrating for the user, but the assistant won’t be able to surface the correct information from an unfinished sentence. By learning speech patterns, devices will be able to tell whether a user has merely paused or has finished their request.
This could go a step further, with VUIs communicating differently from user to user based on personality, mood and age. The data needed for this could be gathered from today’s existing user base and used to customise responses for different types of user.
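One simple way to learn a user’s speech pattern is to adapt the length of silence that marks the end of a request (often called endpointing) to how long that user tends to hesitate mid-sentence. The sketch below is a hypothetical illustration: `PauseModel`, the 0.7-second default, the five-observation minimum and the mean-plus-two-standard-deviations rule are all assumed choices, not values taken from any shipping assistant.

```python
import statistics


class PauseModel:
    """Hypothetical per-user model of when a spoken request has ended."""

    DEFAULT_THRESHOLD = 0.7  # seconds of silence; assumed starting value
    MIN_SAMPLES = 5          # assumed number of observations before personalising

    def __init__(self) -> None:
        self.mid_request_pauses: list[float] = []

    def observe_mid_request_pause(self, seconds: float) -> None:
        """Record a pause that turned out NOT to be the end of a request."""
        self.mid_request_pauses.append(seconds)

    def threshold(self) -> float:
        if len(self.mid_request_pauses) < self.MIN_SAMPLES:
            return self.DEFAULT_THRESHOLD
        mean = statistics.mean(self.mid_request_pauses)
        spread = statistics.pstdev(self.mid_request_pauses)
        # Wait noticeably longer than this user's typical hesitation,
        # but never cut in faster than the default.
        return max(self.DEFAULT_THRESHOLD, mean + 2 * spread)

    def request_finished(self, silence_so_far: float) -> bool:
        return silence_so_far >= self.threshold()
```

A user who habitually pauses for a second mid-sentence ends up with a longer threshold, so the assistant waits instead of answering an unfinished request.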
Some VUIs can handle multiple commands at once. Earlier this year Google started rolling out support for multiple commands to Google Home, which can now handle up to three requests in a single sentence.
"Hey Google, give me the weather, the news and then play some music."
Google is currently able to understand basic commands strung together in one sentence. However, it struggles with complex commands and multi-clause sentences.
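The gap between stringing basic commands together and understanding complex sentences shows up clearly in a naive splitter. The hypothetical `split_commands` function below breaks an utterance on commas, “and” and “then”, and enforces the three-request limit described above (raising an error beyond it is an assumed behaviour). It turns the example sentence into “give me the weather”, “the news” and “play some music”, but it has no idea that “the news” borrows the verb “give me”, and it would wrongly split a phrase like “rock and roll”.

```python
import re

MAX_COMMANDS = 3  # the per-sentence limit described for Google Home

# Naive boundaries between commands: commas, "and", "then" (assumed list).
_BOUNDARY = re.compile(
    r",\s*(?:and\s+)?(?:then\s+)?|\s+and\s+then\s+|\s+and\s+|\s+then\s+",
    re.IGNORECASE,
)
_WAKE_WORD = re.compile(r"^hey google,?\s*", re.IGNORECASE)


def split_commands(utterance: str) -> list[str]:
    """Split one utterance into separate command strings (toy heuristic)."""
    text = _WAKE_WORD.sub("", utterance.strip()).rstrip(".!?")
    commands = [part.strip() for part in _BOUNDARY.split(text) if part.strip()]
    if len(commands) > MAX_COMMANDS:
        # Assumed behaviour: reject rather than silently drop requests.
        raise ValueError(f"{len(commands)} commands found; at most {MAX_COMMANDS} supported")
    return commands
```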
Even when VUIs can handle complex questions, those questions often need to be asked in a very specific way. If users cannot guess the correct way to ask, they will generally give up rather than battle with the interface.
Alexa can set weekday alarms; however, if I ask it to “Set alarms Monday to Friday at 7AM”, it responds that it can’t do that. If I rephrase this as “Set a recurring alarm for weekdays at 7am”, it processes my request. Users should not have to waste time thinking about how to phrase a question in order to be understood.
The danger is that our expectations of smart speakers are lowered every time we hit a wall. If a Personal Assistant is unable to process my request, there is little chance I’ll check a week later to see whether an update has fixed it. In this constant cycle of experimentation and disappointment, we begin to expect less from smart devices and stop exploring new commands, as we presume we will be met with a chipper “Sorry, I don’t understand”.
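Being understood regardless of phrasing amounts to mapping many wordings onto one intent. Real assistants are generally understood to rely on trained language-understanding models rather than hand-written rules, so the pattern list below is only a toy illustration; `parse_weekday_alarm` and the three phrasings it accepts are hypothetical. Both wordings from the Alexa example resolve to the same structured request.

```python
import re
from typing import Optional

# Shared time expression, e.g. "7am", "7 AM", "7:30pm".
_TIME = r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>[ap]m)"

# Hypothetical phrasings that should all map to the same intent.
_WEEKDAY_ALARM_PATTERNS = [
    re.compile(rf"set (?:a )?recurring alarm for weekdays at {_TIME}", re.IGNORECASE),
    re.compile(rf"set (?:an? )?alarms? (?:for )?monday to friday at {_TIME}", re.IGNORECASE),
    re.compile(rf"wake me (?:up )?at {_TIME} (?:every|on) weekdays?", re.IGNORECASE),
]


def parse_weekday_alarm(utterance: str) -> Optional[dict]:
    """Return a structured weekday-alarm request, or None if no phrasing matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern in _WEEKDAY_ALARM_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            hour = int(match.group("hour")) % 12  # 12am -> 0, 12pm handled below
            if match.group("ampm").lower() == "pm":
                hour += 12
            minute = int(match.group("minute") or 0)
            return {"intent": "recurring_alarm", "days": "weekdays", "time": f"{hour:02d}:{minute:02d}"}
    return None
```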
In the future we can expect that Personal Assistants will be able to understand long or complex questions phrased in a multitude of ways.
Throughout all of these points we have touched on the idea of Personal Assistants learning patterns in user behaviour. Once VUIs understand how we want our information served, they will be much better equipped to handle our requests appropriately.
There are millions of smart speakers in people’s homes, generating interaction data from which patterns could be drawn to inform VUIs. Using machine learning, smart speakers could start to make sense of this data, learning individual or nationwide patterns to enrich the user’s experience.
We are already on the back foot when it comes to voice design, as users’ expectations have already been lowered. Because of this, we may have to wait for future generations, who grow up using this technology natively, to reap the benefits. Since the next generation won’t have experienced as much confusion, as many failed commands or as many oddly structured sentences, voice interfaces might feel more natural and fluid to them.
We are clearly still in the discovery and learning phase of VUIs and Personal Assistants, and these are common teething problems for new technology. However, I believe we should remain cautiously optimistic about the future of voice. VUIs solve problems, make inputting data fast and have the potential to make our lives easier in many different ways.
As VUIs mature, we can expect many of these anticipated features to appear and conversations with VUIs to feel more natural. As UX designers, we have the power to influence this new technology in its infancy and to create a fully user-centric Voice User Interface.