Do you find yourself spending more and more time talking to a computer? Apple’s Siri sits on 63 million iPhones, and Android phones have their own voice assistants. In the home, after only two years on the market, 5 million people own an Amazon Echo or Echo Dot, which run Amazon’s Alexa voice system. Google has Google Home, and Microsoft is about to join the fray with Home Hub. These technologies let you speak to a computer to reach the internet or to control “smart objects” in your home, such as a thermostat or lights. That seamless interface is empowering. Building those voice-activated conversations will require many different skills.
Last month, Weber State University’s College of Engineering, Applied Science & Technology hosted two experts in voice recognition and natural language processing, Ahmed Bouzid, Ph.D., and Weiye Ma, Ph.D. By the end of their visit, the college had committed to a Conversational User (or Voice) Interface Initiative to explore this growing field.
Voice recognition has been around for almost as long as the modern computer, but for most of that time it has been limited. In 1952, a Bell Labs team designed “Audrey,” a machine capable of understanding spoken digits. It was another 10 years before an IBM machine could recognize even a small vocabulary of spoken words. In 1990, Dragon Dictate brought speech-to-text to consumers, but it had to be trained to recognize each individual speaker.
So what makes today’s voice interfaces so different? Over the last several years, voice recognition has improved thanks to faster hardware and advances in artificial intelligence and language processing. In addition, because every voice interaction adds to the data these systems learn from, they are continually improving themselves. We have reached a point where large amounts of data make seamless interaction with technology a real possibility.
Voice interfaces allow for hands-free interaction: better multitasking, better flow. Having voice interaction everywhere multiplies the opportunities. You can easily imagine the value of this while driving. Everything from changing radio stations to calling home benefits from a seamless voice interaction. With smart appliances, you could even ask whether you left the oven on (relax, you didn’t).
There are other uses as well. You could use your voice to page through a manual while fixing an engine, hear the next step in a recipe while you’re up to your elbows in flour, or look up facts while your hands stay on the keyboard and your eyes on your research paper. People with visual or mobility impairments, including many elderly people, can benefit. Dr. Bouzid noted that some elderly users have even turned to Alexa to combat loneliness. By removing the screen as an input component of the exchange, you empower the user.
With Amazon Alexa, you can ask for the weather, news or music. In addition, a growing number of companies want Alexa to deliver their own content through add-ons that Amazon calls “skills.” So, for example, you can ask the Motley Fool about stock prices, ask Starbucks to have your coffee ready, or find out who the guests are on this week’s “The Tonight Show Starring Jimmy Fallon.”
The start of an example conversation might go like this:
Customer: Hey, Alexa, ask Weber State Engineering for career advice.
Alexa: Hi there. Thanks for your interest in Weber State Engineering. What type of engineering interests you?
Customer: How about electrical?
Alexa: Great. Electrical engineers are some of our most sought-after graduates . . .
Creating these skills requires more than programming. Along with project management and software development, someone has to build the dialog itself. So who understands human interaction or knows how to write natural human dialog? People with backgrounds in linguistics, communication, playwriting, anthropology, sociology, philosophy, literature and more.
To meet the growing demand for people who can build these skills, the College of Engineering, Applied Science & Technology (EAST) has partnered with the Ubiquitous Voice Society to create an Ogden chapter, the first in the Mountain West. The chapter’s mission is to make voice a meaningful and life-enhancing interface. You can find out more at meetup.com/uvs-ogden. An academic certificate may also be offered in the future.
Conversational interaction with computers means career possibilities for many. Engineering and technology students: voice interfaces demonstrate the importance of classes in the humanities and social sciences. Students and graduates in other fields: opportunities should abound in designing voice interfaces.
Dr. David Ferro is dean of the College of Engineering, Applied Science & Technology at Weber State University. Twitter: @DavidFerro9.