Building Interfaces for People: Designing Voice UI



Accessibility has been a major driver of technology adoption. Computers and software keep getting easier to use over time. We’ve come a long way from MS-DOS to immersive, interactive apps, and our software has become more user-friendly with every generation.

Which raises the question: What’s next? What is the future of UX design? What is likely to make modern devices and software even more compelling? To many people in the UX space, the answer is voice. Interacting with a device using nothing but your voice can seem like science fiction, yet voice-based interfaces, or voice UIs, are much closer to reality than that. Assistants like Alexa and Siri give us a good idea of what is possible with voice, and this is just the beginning for voice UI.

Naturally, many businesses and UI/UX designers are drawn to voice UI. Betting that voice will be the next big thing, current players in the market are racing to lead it. However, building a voice UI differs fundamentally from building a traditional UI. Most software, websites, and apps are designed primarily for visual navigation, while voice UI relies on speech. This difference places the two in separate categories, and it gives voice UI its own advantages and challenges. Understanding how traditional UX/UI design and voice UI differ helps us decide when and where to use each.

Major differences between traditional and Voice UI

Presenting Relevant Information

The first and most obvious difference between a GUI and a VUI (Voice User Interface) is how the two systems inform their users. With GUIs, the process is fairly straightforward; everything the user needs is available on the screen. With voice UI, designers cannot afford such luxuries. Consequently, a VUI can come across as severely limited in the amount of information it can convey. A screen can show far more information at once than a device can say in its first few sentences. On the other hand, a VUI-driven device feels a lot more ‘open’ in terms of what it can understand and do.

Although the two UIs are distinctly different, the process of designing information flow for them can be pretty similar. The design process begins by assessing the ideal use case for a particular feature and figuring out the best way to tell the user how to access it. Visual UIs can achieve this by designing the page layout to direct attention towards a particular section or feature. VUIs achieve this through visual or vocal prompts, such as on-screen text that reads “Ask Google Assistant to set the alarm for 7:00 AM.”

Such cues are crucial, especially for voice-based UIs, since reading out an entire catalog of voice-supported features isn’t necessarily user-friendly. VUI-enabled devices need to inform and remind their users about what they can do since most of these functionalities may not be evident to the user from the start. 

Different use cases 

Speaking of what voice-based UIs can do, the use of VUI needs to be smart, strategic, and specific. For instance, VUIs aren’t going to take over smartphones or mobile app development entirely; instead, they can be bundled into existing apps and programs as extra features. The point here is that VUI use cases can be very specific, like when users need to access their phones hands-free or interact with smart but screen-less devices. VUIs therefore need to be designed accordingly, built not as general-purpose operating systems but as task- and scenario-specific applications.

Designers need to be conscious of and respect the difference between the use cases of GUIs and VUIs. Voice-based systems are primarily used when users want to access a device hands-free, eyes-free, and presumably with their attention elsewhere. This demands that most voice queries, at least the straightforward ones, be resolved in one or two steps. An application that fails to understand the user’s voice, or that drags the user through a drawn-out back-and-forth exchange, can be frustrating. Hence, VUI designers need to keep the interaction brief while fulfilling the user’s request.

User flow is significantly different for both UIs

Another key difference between traditional UIs and VUIs is the user flow. When prototyping products, UX designers imagine the path or ‘flow’ a user is likely to follow. This exercise helps designers identify and eliminate elements that could ruin the product’s UX. In the case of voice UIs, though, designing and refining user flows is significantly more complex.

Traditional GUI design is an extended chain of causes and effects. If the user hits the play button, the button animates, and the speakers play the music. It’s the classic if-then approach to software design, where actions and their consequences are clear and well defined, and hence more manageable. With VUIs, user flow can be extremely dynamic and even unpredictable, because there are many ways of saying the same thing.

Take, for instance, a user trying to set up an alarm. At this point, the user is free to choose from a vast vocabulary; anything from ‘wake me up at 7’ to ‘set an alarm for 7 hours from now’ should work. This type of variability is very hard to predict or fully account for beforehand. VUIs need to be capable of understanding not just simple natural language but also context, references, and hidden or double meanings. Designing the application user flow for VUIs is thus considerably harder than for traditional UIs. The latter are fast and familiar, while the former are convenient and humanized.
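To make the alarm example concrete, here is a minimal sketch of how two differently worded requests could be reduced to the same action. The patterns and the `parse_alarm` helper are hypothetical and purely illustrative; a real VUI would rely on a trained language-understanding model rather than hand-written rules, which is exactly why phrasing variability is so hard to cover.

```python
import re
from datetime import datetime, timedelta

# Hypothetical hand-written patterns, for illustration only.
RELATIVE = re.compile(
    r"\b(?:set an alarm for|wake me up in)\s+(\d+)\s+hours?(?:\s+from now)?\s*$"
)
ABSOLUTE = re.compile(
    r"\b(?:wake me up at|set an alarm for)\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\s*$"
)


def parse_alarm(utterance: str, now: datetime):
    """Return the alarm time for an utterance, or None if it is not understood."""
    text = utterance.strip().lower().rstrip(".!?")

    # "set an alarm for 7 hours from now" -> offset from the current time
    match = RELATIVE.search(text)
    if match:
        return now + timedelta(hours=int(match.group(1)))

    # "wake me up at 7" or "set an alarm for 7:30 pm" -> clock time
    match = ABSOLUTE.search(text)
    if match:
        hour = int(match.group(1))
        minute = int(match.group(2) or 0)
        meridiem = match.group(3)
        if meridiem == "pm" and hour < 12:
            hour += 12
        elif meridiem == "am" and hour == 12:
            hour = 0
        if hour > 23 or minute > 59:
            return None
        target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        # Assumption: a time that has already passed today means tomorrow.
        if target <= now:
            target += timedelta(days=1)
        return target

    return None
```

Even this toy version has to guess at ambiguity (is ‘7’ morning or evening?) and returns nothing for any phrasing it was not written for, which mirrors the design problem described above.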

User expectations differ vastly for different UIs

The biggest challenge for voice UI adoption by far is elevated user expectations. Though users are aware they are talking to a machine, they subconsciously expect the exchange to feel like a conversation with another human. With GUIs, by contrast, users generally have realistic expectations built over years of exposure. This is partly the appeal of voice-based devices; they provide an opportunity to go beyond the screen, promoting a deeper connection with our devices. The downside is that when users don’t find the conversation as life-like as they expect, the excitement quickly turns into frustration.

To manage user expectations better, designers need to incorporate ways of clearly expressing what the system can do (and understand). But despite these efforts, errors and misunderstood queries are bound to occur, and how the system handles these is equally important for upholding user expectations. 

Accuracy and Error Management

Most mistakes within a GUI-enabled system come down to misclicks or user-caused errors. Yes, apps can occasionally freeze or crash, especially when they aren’t well-optimized and well-tested; but a simple tap on the ‘back arrow’ or a reboot (in extreme cases) can easily fix the issue. With VUIs though, the story is different. 

First of all, voice recognition can be extremely tricky to pull off, even for modern, sophisticated machine learning algorithms. High accuracy is hard to achieve, especially when the system needs to account not just for the natural complexities of spoken language but also for factors like background noise, accents, slang, clarity, and volume. So when errors do occur, VUIs need to ensure the user remains patient and doesn’t abandon the application altogether. Google Assistant, for instance, when faced with a query it cannot understand, prompts the user to either simplify it or phrase it in a way that might be more understandable to the system. If that fails as well, the assistant prompts the user to ‘teach’ it the specific pronunciation for better accuracy next time.

This is critical for systems with low accuracy rates since it can significantly help alleviate frustration. Bringing users on board to help with the voice recognition itself lets them better appreciate how the system works and understand what it can do. Armed with this awareness, users are less likely to raise a difficult query next time, improving the system’s accuracy overall. Asking for feedback also helps keep the user engaged and allows the system to better adapt to the user’s voice, further improving accuracy.
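The escalating recovery described above (ask for a simpler phrasing, then ask the user to teach the system, then fall back gracefully) can be sketched as a small loop. This is only an illustration under stated assumptions: the prompt wording, the `resolve` function, the confidence threshold, and the `recognize` callback are all hypothetical and do not reflect how any particular assistant is implemented.

```python
# Hypothetical prompt wording, ordered from gentlest to most involved.
REPROMPTS = [
    "Sorry, I didn't catch that. Could you say it more simply?",
    "I'm still having trouble. Could you teach me how you say it?",
]
FALLBACK = "I can't help with that yet. You can ask me to set an alarm or play music."


def resolve(utterances, recognize, threshold=0.6):
    """Try each user utterance in turn, escalating the prompt after each failure.

    `recognize` is an assumed callback returning (intent or None, confidence).
    Returns (intent or None, list of prompts spoken to the user).
    """
    prompts = []
    for attempt, utterance in enumerate(utterances):
        intent, confidence = recognize(utterance)
        if intent is not None and confidence >= threshold:
            return intent, prompts
        if attempt >= len(REPROMPTS):
            break  # out of recovery strategies, stop asking
        prompts.append(REPROMPTS[attempt])
    # Either the strategies ran out or the user stopped responding.
    prompts.append(FALLBACK)
    return None, prompts
```

Capping the number of retries keeps the exchange short, and the final fallback doubles as a reminder of what the system can do, which ties error handling back to expectation management.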

Building interfaces for humans, not screens: General tips for building voice UIs

In today’s world, where many of us suffer from excessive screen time and screen fatigue, voice-enabled devices are a welcome change. They present a convenient, screen-free, much more humanized style of interacting with our devices, and are thus likely to grow in demand and adoption in the near future.

That said, here are a few tips designers could keep in mind while building VUI systems to ride the voice wave to success: 

GUI and VUI integration: 

So far, we have discussed major differences between the two, but there isn’t really any reason for them to be completely distinct and disconnected interfaces. In fact, modern smartphones already come loaded with both of them. The significant advantage of using both in conjunction is that they cover each other’s weaknesses very well. VUI, for instance, can utilize visual elements to improve the usability, accuracy, and speed of its system.  

System status updates: 

When building VUIs, keeping your users updated goes a long way toward user retention. Visual prompts, like moving waveforms that appear during speech recording, or vocal prompts, like ‘trying to find the best route to your destination’ played in the gap between input and output, can be a game changer when implemented well. These help users understand what the system is doing at all times, keeping them engaged and preventing them from causing input overlap or abandoning the app too early.

Have a fluid conversation and streamlined user flows: 

Most VUIs begin with a trigger, something like saying ‘Hey Alexa’ or long-pressing the Home button, which leads to listening, followed by processing, and finally output or execution. Perfecting this flow is crucial to your VUI’s success. At the same time, the flow needs to feel conversational. Train your system to understand dynamic conversations; focus on understanding context as much as you focus on understanding words.
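The trigger, listening, processing, and output stages can be modeled as a simple state machine. The sketch below is one possible way to frame it, not a description of any shipping assistant; the state names, event names, and the `step` and `run` helpers are all assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # waiting for the trigger
    LISTENING = auto()   # recording the user's speech
    PROCESSING = auto()  # interpreting the request
    RESPONDING = auto()  # speaking or executing the result


# Hypothetical events that move the interface from one stage to the next.
TRANSITIONS = {
    (State.IDLE, "wake_word"): State.LISTENING,
    (State.LISTENING, "speech_end"): State.PROCESSING,
    (State.LISTENING, "timeout"): State.IDLE,
    (State.PROCESSING, "result_ready"): State.RESPONDING,
    (State.RESPONDING, "done"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.IDLE):
    """Feed a sequence of events through the machine and record every state visited."""
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history
```

Because each state is explicit, it is easy to attach a status cue to it (a waveform while listening, a spoken ‘working on it’ while processing) and to ignore a stray trigger that arrives while the system is still busy.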


Voice UI is already here and taking over our digital lives. Though VUI interactions aren’t yet as life-like as we would want them to be, some of the world’s leading UX designers are working on just that. The goal is to build devices that sound (and act) in ways indistinguishable from humans. But until that happens, at least we have Siri, Alexa, and Google Assistant to keep us company.
