How Voice User Interface is taking over the world, and why you should care

Daniel Requejo

6 April 2018


User interfaces, or UI, are what allow us to interact with machines. They encompass everything, from those things we tend to take for granted, like keyboards and the screens of our desktop computers, to technologies that are more complex, like the movement-based UI the Xbox Kinect is built upon. As new technologies are introduced, their adoption rate is entirely dependent on the development of efficient, human-centric UI design.

Different types of User Interface

Voice-user interface, or VUI, has exploded in popularity over recent years. VUI uses speech recognition technology to enable users to interact with technology using just their voices. Virtual assistants like Siri and Alexa have brought VUI into the mainstream, with corporate giants like Google and Sonos following their lead. Companies like Synqq and Nexmo have also taken advantage of VUI technologies in order to develop devices that allow for real-time translation and transcription. However, it’s virtual assistants which have really captured the corporate imagination.

VUI allows for hands-free, efficient interactions that are more ‘human’ in nature than any other form of user interface. “Speech is the fundamental means of human communication,” writes Clifford Nass, Stanford researcher and co-author of Wired for Speech, “…all cultures persuade, inform and build relationships primarily through speech.” In order to create VUI systems that work, developers need to fully understand the intricacies of human communication. Consumers expect a certain level of fluency in human idiosyncrasies, as well as a more conversational tone from the bots and virtual assistants they’re interacting with on a near-daily basis.

We’re not in Westworld just yet, but it’s clear that robotic assistants are here to stay. With that in mind, it’s important to understand all the potential pitfalls and positive opportunities that come along with this newly popular technology. So let’s explore the good, the bad, and the downright ugly side of VUI.


The Good

In order to create good VUI, brands need to understand their consumers, what they want from a virtual assistant and, more importantly, what aspects of interacting with Artificial Intelligence (AI) drive them to the absolute brink. There are a number of benefits to a VUI that other user interfaces cannot provide, namely:

  • Personality and tone – with voice-based virtual assistants there is more opportunity for brands to inject a little personality and humour. Ask Siri to beatbox for you and she’ll do just that; call her by the wrong name and she’ll return “Very funny. I mean, not funny ‘ha-ha,’ but funny.” Google Home is totally au fait with pop culture references, from Star Trek to Sir Mix-a-Lot, as is Amazon’s Alexa. A more personable tone helps users to forgive those moments when virtual assistants are unable to complete tasks or answer questions that an actual human would have no problem with. A personable VUI also helps to increase brand affinity – you’re more likely to use a particular device or service if it’s more entertaining and ultimately more ‘human’ in nature.
  • Efficiency and convenience – VUI requires nothing other than a vocal command to carry out tasks or answer questions. No longer will amateur chefs be forced to scrub up to set a timer lest they smudge the screens of their very expensive smartphones. Now they can just ask Alexa and she’ll set it for them. Users can quickly check the weather forecast on their way out of the house, add an item to their grocery list without scrounging around for a pen, or skip a song on Spotify without lifting a finger. VUIs are more likely to exist within devices that are online and connected all day long, devices which may one day prove integral to our daily lives.
  • A more ‘human’ experience – accurate and efficient speech recognition software allows for a more ‘human’ kind of conversation than can be had using any other device. We shouldn’t underestimate the value of human interaction; if you’ve ever had a long and tedious phone conversation with an automated customer care centre then you know it’s not always easy to get VUI right, or any kind of conversational user interface for that matter. However, with advancements in machine learning and natural language processing, interactions with brands and devices through a VUI are rapidly becoming more ‘human’ and less robotic. Implementation of a VUI-based technology demonstrates real commitment to a culture of human centricity.


The Bad

As discussed, the implementation of VUI is not without its roadblocks. Problems that arise during the conceptualisation and design process are often the result of an insufficient understanding of human psychology. In order to prevent issues around adoption and consumer frustration with VUI-based devices, we should consider the following:

  • Discovery and retention – while Amazon has made it very easy for third-party developers to come up with their own skills for the Amazon Echo, only 31% of those 7,000+ skills have more than one review, an indication of low usage. This issue is not unique to Amazon. In order to increase the rate of adoption, developers need to convey to users what they can and can’t do from the very start, while working all the time to ‘humanise’ the VUI systems that these virtual assistants are built on.
  • Understanding limitations – when a machine and a human are engaged in conversation, we need to adapt the way in which we communicate; humans aren’t used to following strict, unwavering linguistic law, especially when it comes to speech. If users understand from the very start the ways in which their device is limited, they’re less likely to feel disappointed when their assistant fails to complete a task or answer what might seem like a very simple question.
  • Natural Language Processing – we’re not yet capable of developing a VUI with an inbuilt, natural and complex understanding of human communication. Regional accents, slang, conversational nuance, sarcasm… some humans struggle with these aspects of communication, so at this point can we really expect much more from a machine?
  • Visual feedback – including an element of visual feedback helps to reduce the level of frustration and confusion in users who aren’t sure whether or not the device is listening to or understanding what they’re saying. Alexa’s blue light ring, for example, visually communicates the device’s current status: when the device is connecting to the Wi-Fi network, whether or not ‘do not disturb’ mode has been activated, when Alexa is getting ready to respond to a question, and so on.


The Ugly

In recent months, user privacy has become an even more contentious issue, following the Cambridge Analytica scandal, accusations that devices like Google Home and Amazon’s Alexa might be listening in on private conversations, and the run-up to the introduction of GDPR. Consumers are beginning to ask themselves: what’s being recorded, what’s being stored, and how is my private data being used by corporations? As privacy concerns continue to grow, trust in virtual assistants and in the IoT in general is lessening. In order to regain trust, developers and tech manufacturers must find ways to reassure their consumer base that their privacy is of the utmost priority, while at the same time trusting that in time consumers will become more comfortable with the technology.


The Future of VUI

The aim of systems based on VUI is to provide users with a fully immersive experience: nuanced, complex and more human in nature. We’re not there yet, but advancements in technology, which allow us to develop more complex algorithms and software more apt to simulate human behaviours, have opened up more opportunities for growth, both in the home and in the workplace.

  • Business – in late 2017 Amazon announced Alexa for Business, designed to help employees manage their schedules, keep track of their to-do list, dial into conference calls, and make voice calls on their behalf. Alexa for Business allows meeting attendees to control the equipment in their conference room using just their voice, notify IT about a broken printer, and recall the latest sales data or inventory levels. Within the working environment, the virtual assistant becomes the virtual secretary.
  • Smart Houses and the IoT – most virtual assistants fall into the category of smart home devices. The more we focus on compatibility between smart home devices, the closer we get to a completely interconnected household. With Google Home, users can control every Google device in their home via their Android smartphone. Google Assistant can control more than 1,000 smart home products including kettles, microwaves, robotic vacuums and thermostats. With Apple’s HomePod, users can even use a catchall phrase like “Good Morning” to turn on multiple smart home devices at once.


Find your voice

The main barrier preventing widespread implementation and acceptance of VUI systems is the fact that developers are forced to adapt the rules of communication in order to accommodate the limitations of the device. For users, these limitations can make interacting with these devices pretty tedious. Often there are so many possible answers to a query that virtual assistants find themselves in an uncomfortable loop, listing endless options for Italian restaurants or breeds of dog. Many developers have attempted to remedy this by limiting the initial number of options presented to three, and then asking the user if they want to hear more. To improve the quality of the user experience, we need to develop machines capable of comprehending context, tone of voice, and attitude, with a better understanding of user intention based on historical data and the observation of previous patterns of behaviour. We need to go beyond a pre-programmed script.
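
To make the "three options, then ask" pattern concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular assistant works: the function names (paginate_options, build_prompt, prompts_for), the spoken wording and the sample restaurant names are all hypothetical, and only the page size of three comes from the practice described above.

```python
from typing import Iterator, List

PAGE_SIZE = 3  # assumption taken from the article: present three options at a time


def paginate_options(options: List[str], page_size: int = PAGE_SIZE) -> Iterator[List[str]]:
    """Yield the options in consecutive chunks of at most page_size items."""
    for start in range(0, len(options), page_size):
        yield options[start:start + page_size]


def build_prompt(page: List[str], has_more: bool) -> str:
    """Turn one page of options into a sentence the assistant could speak."""
    if len(page) == 1:
        spoken = page[0]
    else:
        spoken = ", ".join(page[:-1]) + " and " + page[-1]
    prompt = f"I found {spoken}."
    if has_more:
        # Only offer more when there really is another page to read out.
        prompt += " Would you like to hear more?"
    return prompt


def prompts_for(options: List[str], page_size: int = PAGE_SIZE) -> List[str]:
    """Return the spoken prompts, one per page, in the order they would be offered."""
    pages = list(paginate_options(options, page_size))
    return [build_prompt(page, i < len(pages) - 1) for i, page in enumerate(pages)]


if __name__ == "__main__":
    # Hypothetical search results for "Italian restaurants".
    for line in prompts_for(["Luigi's", "Trattoria Roma", "Bella Napoli", "Il Forno"]):
        print(line)
```

In a real assistant the next page would only be spoken after the user answers "yes"; the sketch simply shows how a long result list can be broken into short, speakable turns instead of one endless recital.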

“Interfaces to digital systems of the future will no longer be machine driven. They will be human centric,” explains Werner Vogels, Amazon’s chief technology officer. “We can build human natural interfaces to digital systems and with that a whole environment will become active.”