Every geek worth their nerdiness has wanted to be able to converse with their computer ever since we saw Star Trek’s Captain Kirk talk to the Enterprise’s computer back in the 1960s [my mistake regarding the correct character was pointed out with amazing swiftness by reader Mark Wing]. For most of the time since then, having a real conversation with a computer seemed really, really far away. Recently, it’s gotten a lot closer …
Just over a year ago I reviewed Amazon’s Echo, which I judged to be amazing, and I still think it’s amazing even though the technology is still in its early days. The problem is that the Echo isn’t really conversational; it’s limited to a basic request-response model (though its occasional weird non sequiturs are hysterical and TV ads from Amazon, of course, get hilarious responses). That said, the Echo, which uses the Alexa Voice Service, remains a compelling, useful product, and since I wrote about it, Alexa’s abilities have grown rapidly. Alexa now understands a much greater range of ways to make a request, can deliver information on a wider range of topics, and has an API that has matured and expanded impressively. Here’s how the Echo works: on the back end, there’s the Alexa Skills Kit (ASK), which is:
… a collection of self-service APIs, tools, documentation and code samples that make it fast and easy for you to add skills to Alexa. All of the code runs in the cloud — nothing is installed on any user device. With the Alexa Skills Kit, you can create compelling, hands-free voice experiences by adding your own skills to Alexa. Customers can access these new skills on any Alexa-enabled device simply by asking Alexa a question or making a command.
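In practice, a skill is a web service that receives JSON requests from Alexa and returns JSON responses telling Alexa what to say. As a rough sketch (the helper function and speech text are my own invention; only the envelope fields follow the ASK JSON interface), a minimal spoken response looks like this:

```python
import json

def build_speech_response(text, end_session=True):
    """Build a minimal Alexa Skills Kit response that speaks `text`.

    The envelope (version, response, outputSpeech, shouldEndSession)
    follows the ASK JSON interface; everything else is illustrative.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

# Serialize the way a skill endpoint would before replying to Alexa
print(json.dumps(build_speech_response("The lab lights are now on."), indent=2))
```

Alexa reads the `outputSpeech` text aloud on whatever device made the request, and `shouldEndSession` controls whether it keeps listening for a follow-up.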
The Alexa Skills Kit is how you add to the list of things Alexa can do for you and, to get Alexa to do them, there’s the Alexa Voice Service (AVS):
… an intelligent and scalable cloud service that adds voice-enabled experiences to any connected product – all you need is a microphone and speaker. Users can simply talk to their Alexa-enabled products to play music, answer questions, get news and local information, control smart home products and more. And with the free Amazon Alexa app, users can easily control and manage their products from anywhere.
So, say you want to control lighting in a laboratory without having to touch anything, or perhaps report on the status of your network. You can link the Alexa Voice Service via the Skills Kit to the functionality you need. But perhaps, for some reason, you don’t want to use an Amazon Echo or any of its sibling products, so you’re wondering what it takes to build a device that can use the Alexa Voice Service and Alexa Skills Kit; the answer is: not that much.
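For the lab scenario above, the skill side boils down to dispatching on the intent name Alexa extracts from your speech. Here’s a toy sketch of that dispatch; the intent names and the hardware/monitoring calls hinted at in the comments are entirely hypothetical placeholders:

```python
def handle_request(request):
    """Route an incoming ASK-style request to the right action.

    "LightsOnIntent" and "NetworkStatusIntent" are made-up intent
    names you would define in your skill's interaction model.
    """
    intent = request.get("request", {}).get("intent", {}).get("name")
    if intent == "LightsOnIntent":
        # e.g. toggle a GPIO pin or call a smart-plug API here
        return "Turning the lab lights on."
    if intent == "NetworkStatusIntent":
        # e.g. query your monitoring system here
        return "All links are up."
    return "Sorry, I don't know how to do that."

print(handle_request(
    {"request": {"type": "IntentRequest",
                 "intent": {"name": "LightsOnIntent"}}}))
```

The string returned here is what you would wrap in the spoken-response JSON and hand back to Alexa.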
In May 2015, the Alexa Voice Service team released the Raspberry Pi + Alexa Voice Service project which, as you might guess, implements AVS on a Raspberry Pi (an RPi 2 or better is suggested, although an RPi 1 can be used) using a Java client and a Node.js server. Here’s the hardware the project used:
Project: Raspberry Pi + Alexa Voice Service
That’s a Raspberry Pi 2 Model B with a Kinobo USB 2.0 mini microphone and an external speaker. The RPi is running Raspbian “Jessie” installed via NOOBS, and the project narrative goes into fine detail on how to get everything up and running; in fact, this is one of the most comprehensive how-tos I’ve seen for a Raspberry Pi project because it even includes how to install Raspbian.
Now, until recently, there was one big difference between the Amazon Echo family of products and do-it-yourself AVS implementations like this: Amazon’s terms and conditions didn’t allow for voice-activated triggering of AVS. If you’ve ever played wi-, er, tested the Echo you’ll know that all you have to do is say “Alexa” and whatever you say next will be sent to AVS where it will be interpreted (or misunderstood as a request for the spot price of pork bellies in Borneo) and a response returned. Without voice activation, you have to press a switch to trigger “listening,” which reduced the value of these home-grown systems immensely. That all changed in the middle of October this year, when Amazon changed its policy and allowed the use of “wake word” engines.
The two wake word engines currently available are TrulyHandsFree from Sensory and Snowboy from KITT.AI (the latter refers to wake words as “hotwords”). A new Amazon project on GitHub covers everything you need to do to build working prototype systems on Linux, Mac, Windows, and the Raspberry Pi. For the latter, the RPi 3 is now recommended and the RPi 2 is still supported.
In the new project, the wake word is still “Alexa” for TrulyHandsFree, but with Snowboy you can define any wake word you please; many people have already evilly suggested using “OK Google” or “Hey Siri” as a wake word.
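To see what a wake word engine buys you, here’s a toy sketch of the gating logic in pure Python, with no real audio: the local engine (Snowboy, TrulyHandsFree) listens continuously, and only the speech that follows a detection is forwarded to AVS. The string match here stands in for the actual acoustic detector, and the transcript list stands in for the microphone stream:

```python
WAKE_WORD = "alexa"  # with Snowboy this could be any hotword you train

def gate_utterances(utterances):
    """Yield only the utterances that immediately follow the wake word.

    Illustrates the always-listening pattern: nothing leaves the
    device until the wake word fires, then the next utterance is
    the one that would be sent to the cloud.
    """
    armed = False
    for utterance in utterances:
        if armed:
            yield utterance          # this is what would go to AVS
            armed = False
        elif utterance.strip().lower() == WAKE_WORD:
            armed = True             # "wake word" detected locally

stream = ["chatter", "alexa", "what's the weather", "more chatter"]
print(list(gate_utterances(stream)))  # → ["what's the weather"]
```

This is also why the wake word engine runs on the device itself: everything before “Alexa” stays local, and only the armed utterance involves the network round trip.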
Note that the new AVS project license is for prototyping only; if you plan to build a commercial product you’ll need to get commercial licenses from both Amazon and the owner of whichever wake word engine you select.
Voice-enabled services that recognize speech and respond intelligently were once a dream, but with wake word engines that run efficiently on devices as humble as the Raspberry Pi, backed by sophisticated, intelligent systems such as the Alexa Voice Service and the Alexa Skills Kit, we’re getting a lot closer to real, practically useful, voice-driven products.
It looks like the Enterprise’s computer voice interface may be just around the corner. If you’ve got an Echo or if you build an RPi AVS system, try asking “Alexa, beam me up.” One day, it might actually work …
Comments? Thoughts? Drop me a line or comment below then follow me on Twitter and Facebook.