Author: Cathy Pearl
Publisher: O’Reilly
Pages: 278
ISBN: 978-1491955413
Print: 1491955414
Kindle: B01NALL1Q0
Audience: All voice app developers
Rating: 4
Reviewer: Lucy Black
Voice input is all the rage and you probably need to read something about it.
Amazon’s Alexa is probably the reason everyone is considering adding voice UIs to their products but if you don’t like this conclusion you can quote Cortana or Siri as alternatives. The point really is that for whatever reason voice seems to be becoming a mainstream UI. What do we need to know about it to make use of it?
At one level there are the fine details of using, say, the Alexa Skills API, but there is also the issue of how to design a voice interaction. This book doesn’t contain a line of code and it doesn’t tell you anything at all about particular voice input systems at any level that could be considered technical. At this point you might be expecting me to pronounce the book worthless waffle, but no, it isn’t.
The simple fact is that voice UI is new and we might think it’s simple. After all, we talk to each other all the time; surely there is nothing complicated here? What makes voice UI complicated is that the software side of the dialog is far from intelligent. We are a very long way from being able to interact with a computer the way they do in sci-fi. What this means is that you have to craft the interaction much more carefully than you might imagine if you don’t want to annoy your user.
This is what this book is all about.
It starts off with a history in Chapter 1 and then goes straight on to design principles in Chapter 2. If you only have time to read one chapter then this is the one to pick. It uses lots of examples to demonstrate how easy it is to get things wrong and how hard it is to get them right. The section on confirmation will make you think very hard about what you really need in your dialog. It made me think hard about why Alexa has to tell me that she is about to play a particular radio station and annoy me with the repetition, when the consequence of getting it wrong is not important.
It also discusses issues such as confirmation with different degrees of certainty, error handling, being positive to the user, disambiguation and so on. There is lots of good stuff in this chapter. None of it is rocket science once it has been pointed out to you, but voice UIs are so new that even if you have been thinking about how to do it you will find clarification.
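To make the idea of confirmation with different degrees of certainty concrete, here is a minimal sketch of how it might look in code. It is my own illustration, not something taken from the book, which contains no code. The `Recognition` type, the `choose_confirmation` function and both threshold values are hypothetical; a real system would take its confidence score from whatever speech recognizer it uses and tune the thresholds against real users.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    """Hypothetical recognizer output: what was heard and how sure we are."""
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


# Assumed thresholds for illustration only; real values need tuning.
HIGH_CONFIDENCE = 0.80
LOW_CONFIDENCE = 0.45


def choose_confirmation(result: Recognition, high_stakes: bool = False) -> tuple[str, str]:
    """Pick a confirmation strategy and a prompt for one recognized request."""
    if result.confidence < LOW_CONFIDENCE:
        # Too uncertain to guess: ask again rather than act on a bad hearing.
        return ("reject", "Sorry, I didn't catch that. What would you like to do?")
    if result.confidence < HIGH_CONFIDENCE or high_stakes:
        # Unsure, or a mistake would be costly: ask before doing anything.
        return ("explicit", f"Did you want to {result.text}?")
    # Confident and low risk: act, and mention what was understood in passing.
    return ("implicit", f"OK, I'll {result.text}.")


if __name__ == "__main__":
    print(choose_confirmation(Recognition("play Radio 4", 0.92)))
    print(choose_confirmation(Recognition("play Radio 4", 0.60)))
    print(choose_confirmation(Recognition("transfer 500 pounds", 0.92), high_stakes=True))
```

The `high_stakes` flag is the part that speaks to my radio station complaint: when getting it wrong costs nothing, a confident result needs no more than a passing acknowledgment, and the explicit question can be kept for requests where a mistake matters.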
Chapter 3 moves on to avatars – graphical components of a voice UI. The whole question of whether an avatar is a good idea or not is discussed. If you do decide to implement an avatar then you will find useful advice on what not to do, choosing a voice and so on. The section on what not to do includes a put-down of the famous Microsoft Office assistant “Clippy”. It is worth reading. I always knew I hated Clippy and now I know why. Oddly, it wouldn’t have taken much to make Clippy far more acceptable to its users.
Chapter 4 is on speech recognition technology and here the book is a little waffly because it doesn’t use a specific example. It talks around the technologies, covering what is desirable and what isn’t.
Chapter 5 is called advanced voice UI and here we get into chatbot territory with considerations of how to keep a conversation going using general responses. There is also a discussion of wake words, i.e. words that the user speaks to wake the device up. The final comment is:
“it is best not to choose a word that people might say commonly in conversations…”
Yes indeed.
The next chapter is on testing strategies and it is followed by one on measuring the performance of your voice user interface and releasing it to the outside world. The final chapter is on voice devices and voice UI in cars. It is a sort of survey of what you will find out there already.
This is not a deep technical book. If you are a programmer then you will find most of the things you read here obvious – but only after you have read them! A quick read through this book (and there is quite a lot you can skip because it’s too general or doesn’t apply to your situation) could save you from creating a voice disaster. I know that most of the Alexa skills I have tried would have been better if their creators had bothered to read this book even a little bit.