This post originally appeared on Backchannel.
In the Game of Thrones-like artificial intelligence competition between Houses Amazon, Apple, Facebook, Google, and Microsoft, the company most reticent to speak about its technology has usually been the one that ships planeloads full of stuff to consumers, hosts thousands of companies in its data centers, greenlit Catastrophe, and has a breakaway hit product that answers questions, plays music, and 4,998 or so other things.
Yes, for some time, Amazon has been even more shrouded than the famously secret Apple, which opened up about its machine learning programs earlier this year.
Lately, however, Amazon’s head scientist and vice president of Alexa, Rohit Prasad, has been speaking up in public, making the case for his company’s prowess in voice recognition and natural language understanding.
Alexa, of course, is the conversational platform that supports the aforementioned hit product, Echo.
On Wednesday Prasad gave an Alexa “State of the Union” address at the Amazon Web Services conference in Las Vegas, announcing an improved version of the Alexa Skills Kit, which helps developers create the equivalent of apps for the platform; a beefed-up Alexa Voice Service, which will make it easier to transform third-party devices like refrigerators and cars into Alexa bots; a partnership with Intel; and the Alexa Accelerator that, with the startup incubator Techstars, will run a 13-week program to help newcomers build Alexa skills.
Prasad and Amazon haven’t revealed sales numbers, but industry experts have estimated that Amazon has sold over five million Echo devices so far.
Prasad, who joined Amazon in 2013, spent some time with Backchannel before his talk to illuminate the direction of Alexa and discuss how he’s recruiting for Jeff Bezos’s arsenal without drying up the AI pipeline. This interview has been edited for length and clarity.
Steven Levy: You’re a VP of Alexa. Tell me where things stand with it in 2016.
Rohit Prasad: We’re excited about where things are. We did several device launches and also expanded internationally. And we’ve made huge progress in teaching Alexa to better understand you, both in terms of the surface area Alexa covers and in terms of accuracy within the material it searches. For instance, think about music as a domain: we have new capabilities [that let] you search for or play a song based on its lyrics. And lastly is the speed at which third-party skills are being built. Earlier this year we had only a few hundred, and now we are in the 5,000 range.
What are the conversational aspirations for the Alexa platform? Are our Echos something we should be talking to, or talking with?
Alexa is already providing a large set of utilities and experiences, where a few one-shot intents work with very high accuracy. From a conversational aspect, I think there are a lot of trade-offs in doing it right. Alexa shouldn’t come back and ask you [needless] questions. That would be really frustrating. But Alexa should always ask a question when needed, and the ability to have a conversation is super important as well. Are you aware of the Alexa Prize competition?
This is the $2.5 million challenge to computer science students that you announced in September?
Yes. In academia it’s hard to do research in conversation areas because they don’t have a system like Alexa to work with. So we are making it easy to build new conversational capabilities with a modified version of the Alexa Skills Kit. This grand challenge is to create a social bot that can carry on a meaningful, coherent, and engaging conversation for 20 minutes.
Would that be a Turing-level kind of conversation, do you think?
No, the Turing test comes down to human gullibility — can you fool an outsider into thinking it’s a human? If you think about certain tasks, Alexa is already better than a human. It’s super hard for a human to play a particular song out of millions of catalog entries within a second, right? If you ask Alexa to compute the factorial of 60, that’s hard for a human. So we definitely did not want it to be like a Turing test. It’s more about coherence and engagement.
What are people going to be talking about in these 20-minute conversations with Alexa?
We are giving topics. Like, “Can you talk on the trending topics in today’s newspaper?” We expect the social bot to be able to chat with you on topics like scientific inventions, or the financial crisis.
Have you had a lot of responses to the challenge?
We got an overwhelming number of applications, hundreds and hundreds. We are providing funding to university students — these are grad students who are either taking time off from their research or, hopefully, working on something very aligned with it — so we wanted to make sure that they have sponsorship for this compelling application of speech. We got so many that we couldn’t decide on the original ten we had planned, and we ended up funding twelve teams.
Because of the huge demand in corporations for the best students in AI and machine learning, there’s worry that academia might lose its core talent.
It is a concern. This is one of the reasons that I was motivated to start the Alexa Prize. We want to build the next generation of machine learning and AI scientists, and academia plays an important role in that. I think it would be very myopic and very scary if every professor moved to companies like us.
On the other hand, you are obviously hiring AI talent, competing with Google, Facebook, Microsoft, Apple, and even traditional companies. What’s the pitch that you give potential recruits to come to Amazon?
I don’t think I should answer that, because those other companies will copy it.
Actually, if you answer it well, those people might read it here and apply to work at Amazon.
What’s unique about research in a company like Amazon is the combination of data, computing power, and the best minds in the world all coming together to solve a customer-facing problem. Working on a customer-facing problem doesn’t take away the innovation — it actually accelerates innovation. The problems we try to solve at Amazon are all super, super hard. When Alexa started, solving speech recognition and natural language understanding across many different domains was clearly a very, very hard problem.
You are announcing new tools that will help developers, right?
Yes. One of the key things we want to make simple for developers is what we call “built-in intent” and “slot types.”
Explain, please.
In most skills, people will want to say things like, “Alexa, stop.” Or “cancel.” You want those commands, or intents, to be exposed to the developer, rather than asking developers to build customized versions of things like the cancel/stop intent. Slot types are vocabulary items — things like city names. We had previously done a handful of them, things developers use quite often — around 10 intents and 15 slot types. So as part of third-party skills we’re announcing a larger set of hundreds of built-ins — slot types — across different domains, like books, video, or local businesses. And a large set of intents as well, which help answer queries that people ask Alexa.
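[For illustration, a rough sketch of a skill definition that leans on built-ins rather than redefining them. The AMAZON.-prefixed names follow the Alexa Skills Kit convention for built-in intents and slot types; the schema and the custom intent below are illustrative, not Amazon’s literal file format.]

```python
# Rough sketch of a skill interaction model that reuses built-in intents
# and slot types instead of redefining common behavior and vocabulary.
interaction_model = {
    "intents": [
        # Built-in intents: stop/cancel/help behavior comes "for free,"
        # with no custom sample utterances required from the developer.
        {"name": "AMAZON.StopIntent"},
        {"name": "AMAZON.CancelIntent"},
        {"name": "AMAZON.HelpIntent"},
        # A hypothetical custom intent that reuses built-in slot types,
        # so Alexa's own vocabulary (city names, dates) does the work.
        {
            "name": "FindLocalBusinessIntent",
            "slots": [
                {"name": "City", "type": "AMAZON.US_CITY"},
                {"name": "When", "type": "AMAZON.DATE"},
            ],
            "samples": [
                "find a coffee shop in {City}",
                "what is open in {City} on {When}",
            ],
        },
    ],
}
```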
So in other words, if I’m the developer, I can rely on your built-in vocabulary and your interpretation of synonyms to make my skill smarter right off the bat. And you’re doing more.
Exactly. It gives you a much better starting point for interaction with skills. We’re announcing this as a developer preview for two reasons. One is, we want to see how people use these in their intents, because we have a certain mindset of how these intents and types should be used. But the developer may have a slightly different mindset. And the customer may use it slightly differently as well. We want to make sure that we get some feedback from the developers and continually improve these, and we will keep adding more and more built-ins.
Right now, when users invoke a skill on Echo, the mind of Alexa, to some degree, gets turned over to that developer. So what you’re implementing today is a step toward a standard Alexa vocabulary and means of execution that developers will plug into?
You’re absolutely right, this creates a common vocabulary which works for sharing and for helping Alexa itself to become better and better. Developers can integrate this new functionality so that they don’t have to recreate the same things.
My issue with Alexa is I’m just overwhelmed by what is available. Generally, you have to know that a skill exists in order to invoke it. Now that you are at 5,000 skills and counting, how can a user keep up?
We definitely want Alexa to tell you how to accomplish your query through a third-party skill, even if you don’t have knowledge of the skill. We haven’t done it yet, but definitely that’s something on our roadmap. Having a common vocabulary helps get us to that connection.
Amazon is only one of several companies using AI to build a conversational interface. What’s unique about your approach?
The hands-free ability is key. That’s the killer application for speech. If you think about Alexa and Echo in particular, there was no cop-out in terms of solving the hard problem of interaction without a screen. So our thinking, from the get-go, was very different from other companies’ in terms of how a conversational interface should work. It wasn’t like on a phone; it was a completely dedicated device that didn’t have a screen. We had to solve the hard problem.
What about people who are concerned about having an open mic in the home? What can you tell people who are worried about, “Oh my god, Amazon is listening to me all the time!”
Privacy is important, and we are being very, very transparent about how we are approaching this. The cloud is not listening to you. On the device there is only a detector, not a recognizer transcribing every word — it is detecting whether “Alexa” was spoken versus something else. Only once it determines, with very high confidence, that “Alexa” was spoken to the device do we start streaming to the cloud.
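[For illustration, a minimal sketch of the gating Prasad describes: a small on-device detector scores audio for the wake word, and nothing streams to the cloud until it fires with high confidence. The function names and threshold below are hypothetical placeholders, not Amazon’s implementation.]

```python
import random

# "Very high confidence" required before any audio leaves the device.
WAKE_WORD_THRESHOLD = 0.95

def wake_word_confidence(frame: bytes) -> float:
    """Stand-in for the small on-device detector: it scores one short
    audio frame for the wake word and never transcribes full speech."""
    return random.random()  # placeholder score, not a real model

def stream_to_cloud(frame: bytes) -> None:
    """Stand-in for opening the audio stream to the cloud recognizer."""
    print(f"streaming {len(frame)} bytes to the cloud")

def on_audio_frame(frame: bytes) -> None:
    # The cloud never receives this frame unless the local detector
    # fires on the wake word with high confidence.
    if wake_word_confidence(frame) >= WAKE_WORD_THRESHOLD:
        stream_to_cloud(frame)
```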
Will Alexa become proactive like Google Now or Apple notifications? Maybe telling me, if it hears me knocking around the house, that I should leave because I’m late for an appointment?
We’ve definitely thought about it. Because there’s no screen on Echo, there are some new [challenges] to it. We want to do things right with anything in terms of the kind of notifications you’re talking about. But right now I can’t reveal our exact approach to that.
Right now, people pretty much have to choose one conversational interface for their assistant. Will we ever see some mashup where Alexa, Cortana, Google Home, or Samsung Viv, or whatever, all work together?
It’s very early days in these conversational settings. Having seen this for 20-plus years, I still feel that the [intellectual property] of Alexa and Echo is revolutionary material, specifically in terms of interface. I think it remains to be seen; every company has a different set of offerings, so you can imagine that there would be multiple AIs. But in terms of interoperation, it’s too early to tell.
Echo, and the Alexa technology, seemed to come as a surprise to a lot of us, and at first people thought it was intended as a quick way to buy products from Amazon. Now it’s one of Amazon’s most popular products and a significant platform. Has your mission shifted?
I wouldn’t say the mission has shifted. We are still very much doing things that we said three years back we should be doing. Right now there’s just a lot to do to make Alexa even more magical for our customers.
Disclosure: Jeff Bezos is an investor in Business Insider through his personal investment company Bezos Expeditions.