In the last six months, every major tech company has unveiled its vision for the future of computing. And funnily enough, they’re all saying the same thing: in the future, we’re going to talk to our computers — and they’re going to answer back.
Microsoft calls it “conversation as a platform.” Google says it wants computers to have an “ongoing two-way dialogue” with their users. This both is and isn’t a metaphor. Digital assistants that we talk to — Siri, Alexa, Cortana — are already commonplace, but creating computers that talk back will mean something extra: using machine learning to offer users prompts and suggestions. “Our goal with artificial intelligence is to build systems that are better than people at perception,” said Facebook’s Mark Zuckerberg. “[At] seeing, hearing, language and so on.”
Language is key though. Talking to computers has been a sci-fi trope for decades, but it’s only in the last few years that we’ve been able to take the prospect seriously. Advances in artificial intelligence — deep learning particularly — have massively improved natural language processing, while the combination of the cloud and ever-more powerful smartphones have provided an infrastructure for these speaking assistants. If you want a measure of how ubiquitous talking to computers has become, consider the fact that Domino’s has had its own chatty, pizza-ordering assistant named Dom for years. (“He’s fun,” said Domino’s head of marketing recently. “But very focused on the pizza ordering experience.”)
Dom’s pizza-focused existence also illustrates a key feature of the world of talking computers right now: it’s confusing as hell. All sorts of fundamentally complementary tools and technologies are being built by different companies, but as with the advent of any new computing paradigm — be that desktop, web, or mobile — there’s no shared game plan or grand strategy. There’s just machine learning and human imagination, creating exciting new pieces for an incomplete puzzle. How everything is going to fit together in 20 years’ time is anyone’s guess, but we can at least take a look at where things stand right now.
Say ‘hello’ to the new chattering classes
So. The most important players in this new world are the “digital assistants:” Siri, Alexa, Cortana, Facebook M, Google Assistant, and a handful of third-party players. These, say their makers, will be our computing familiars, the programs that we’ll spend most of our time talking to. They’ll be accessible on different platforms (phones, watches, cars, home hubs) but keep tabs on our personal data, schedules, and location, across an entire network. And thanks to machine learning, they’ll understand human speech better than any computers before, able to grok context and slang, and, eventually, emotion and intent.
Assistants like this will offer us ambient computing. We won’t necessarily access them through a screen or a console. Instead, they’ll be hanging in the air — just speak and get an answer. But before we get assistants everywhere, we’ll have to access them in particular places. Amazon has dug deep into the home, with the Echo, Dot, and Tap, while both Google and Apple are multi-platform: available on your phone, in your car, and on your wrist. Microsoft and Facebook’s strongholds are less well defined, but again, there’s a lot of overlap and lots of space to expand into.
And filling the gaps between assistants’ spheres of influence (and a couple of rungs below on the ladder of artificial intelligence) are the bots and the apps. These are simpler than Siri, and much more utilitarian. At their most basic, they’re simply a replacement for graphical user interfaces, where you type out your instructions rather than navigate a set of dialog boxes. Many are little more than gimmicks (do you really need to buy cinema tickets from a bot?), but the more complex models have a lot of potential (like a chatbot in Slack or Gmail that can find every document your boss sent you in the last month).
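To make that distinction concrete, here’s a minimal sketch in Python of what a typed-command bot boils down to: pattern-match an instruction, run a query, return an answer — no dialog boxes involved. The `handle` function, the command phrasing, and the toy inbox data are all invented for illustration, not taken from any real bot platform.

```python
import re
from datetime import datetime, timedelta

# Toy "inbox": each record is (sender, document name, date received).
DOCS = [
    ("boss", "q2-report.pdf", datetime.now() - timedelta(days=10)),
    ("boss", "budget.xlsx", datetime.now() - timedelta(days=45)),
    ("alice", "notes.txt", datetime.now() - timedelta(days=3)),
]

def handle(message):
    """Map a typed instruction onto a canned query."""
    m = re.match(r"find documents from (\w+) in the last (\d+) days", message)
    if not m:
        return "Sorry, I didn't understand that."
    sender, days = m.group(1), int(m.group(2))
    cutoff = datetime.now() - timedelta(days=days)
    hits = [name for who, name, when in DOCS if who == sender and when >= cutoff]
    return ", ".join(hits) if hits else "No matching documents."

print(handle("find documents from boss in the last 30 days"))  # q2-report.pdf
```

The gap between this and Siri is exactly the gap the article describes: a bot like this only understands one rigid phrasing, while a “smart” assistant is supposed to cope with context and slang.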
The Verge’s Casey Newton explained the rise of chatbots back in January, and even since then, they’ve made significant gains. In March, Microsoft launched a set of AI-powered tools to let anyone build their own, and in April, Facebook opened up Messenger as a bot platform. “You never have to call 1-800-FLOWERS again,” was Mark Zuckerberg’s pitch, while Microsoft’s Satya Nadella described chatbots as the “new applications.”
Nadella went on to say that in the future, personal assistants like Cortana and Alexa will act as bot bosses, interacting with one another on a user’s behalf. However, this is an oversimplified summary — things are going to be much messier than this. Will assistants talk to bots like they talk to APIs? How will these different programs exchange user data securely? What software is going to harvest the slang in your texts with friends and feed it into the program that writes your emails to your work colleagues? And how do you tell it to stop? It’s tricky territory, but here are the pieces we know about so far:
Amazon’s Alexa has become the benchmark for digital assistants in your house. While the functionality of the original Echo was as much about playing music as voice commands, Alexa’s skill set has quickly expanded. It now works with more than 1,000 services, and Amazon says its developer tools mean companies can integrate their software in just 60 minutes.
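Integration that fast is plausible because a third-party Alexa “skill” is, at its core, just a handler for a JSON payload: Amazon does the speech recognition and hands the developer a parsed intent. Here’s a rough Python sketch of that shape — the request/response field names (`intent`, `slots`, `outputSpeech`) follow Alexa’s documented JSON format, but the `OrderPizzaIntent`, its `Size` slot, and the handler logic are invented for illustration.

```python
def handle_alexa_request(event):
    """Handle one Alexa-style request and build the JSON reply.
    Field names follow the Alexa Skills Kit format; the pizza-ordering
    intent and slot are made up for this example."""
    intent = event["request"]["intent"]["name"]
    if intent == "OrderPizzaIntent":
        size = event["request"]["intent"]["slots"]["Size"]["value"]
        speech = f"Okay, ordering a {size} pizza."
    else:
        speech = "Sorry, I can't help with that."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }

request = {"request": {"intent": {"name": "OrderPizzaIntent",
                                  "slots": {"Size": {"value": "large"}}}}}
print(handle_alexa_request(request)["response"]["outputSpeech"]["text"])
```

The developer never touches audio: mapping a few intents to responses like this is most of the work, which is why a basic integration can be measured in minutes rather than weeks.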
The Echo has done so well because it got the basics right. Alexa responds quickly and understands your query, even from across the room. Amazon is now capitalizing on this, widening Alexa’s availability with the release of the hockey puck-sized Echo Dot (which works with any speaker), and allowing other companies to make their own Alexa-powered hardware.
But while it nailed the essentials, Alexa isn’t as “smart” as other assistants. It has limited access to personal information, and to use one of its connected services you have to know the right keywords. In that respect, it’s more command line than natural language. Amazon boss Jeff Bezos says the company is “deeply committed” to AI, though, and that Amazon has more than a thousand people working on Alexa. The company is even reportedly working on building emotional intelligence into the digital assistant next, so it’ll know when you’re irritated and temper its responses accordingly.
Despite being an early frontrunner in voice interfaces, Apple has fallen behind. Siri is well known, but its voice recognition abilities are spotty and its functionality limited. At WWDC this year, the company made some promising changes, porting Siri to macOS and allowing integration with third-party services (messaging apps, ride-hailing services, and fitness trackers were all used as examples), but the story until now was mainly one of squandered potential. The big question is whether Apple’s approach of infusing its own apps with AI smarts (which are being branded as “Siri” assistive features) will be more successful than creating generic “bots.” Apple has done very well by apps in the past — will it be able to do so again?
Thanks to CarPlay, the Apple Watch, and the iPhone, Apple’s personal assistant has the potential to be a constant companion, but it’s still working on integrating everything it knows about you (your location, your schedule) in a truly useful fashion. Similarly, although Apple has put Siri on Macs, it’s not given the assistant any new skills to take advantage of its new home. And while there have been rumors of an Echo-rival Siri Speaker (you can already use Siri to control automated tech in your house via HomeKit), that’s yet to materialize.
Oddly enough, Apple might be able to turn what has been seen as one of its major weaknesses in this area into a strength. The company’s development of AI has been hamstrung by its stance on privacy. It doesn’t collect as much user data as its rivals, so it has less information to feed its machine learning programs. However, at WWDC the company turned this around, announcing new methods of data collection that it claims will preserve users’ privacy, and new support for on-device AI. For users who want the benefit of smart assistants without the privacy sting, Apple’s approach could pay off in the long run. Assuming, of course, it works as well as Apple says it does.
Facebook’s presence in the talking computers game is perhaps the most limited of these major players. While it does have its own digital assistant, Facebook M, it’s not built solely on AI, and works through a combination of chatbot-style replies and good old-fashioned human labor. You can only type to talk to M, since there’s no voice interface, and answers often take several minutes to appear.
Facebook has its strengths, though. Its machine learning team has done some significant work in fields like image recognition, and it also has its Messenger app, which was opened up as a chatbot platform back in April. This has the potential to introduce hundreds of millions of users to conversational computing. But so far, Facebook-hosted bots have been slow and a bit clunky. No one wants to wait an hour to find out what the weather is like, especially when they can just Google it.
Speaking of Google, it’s the search engine giant that perhaps has the greatest potential for creating a useful talking computer. It’s got the machine learning chops and the Midas touch of data collection strategies: creating useful information every time it interacts with users. At Google I/O this May, the company unveiled Google Home (an Echo rival that will answer questions and manage your media), and showed how its smart assistant tech is being woven into the fabric of other products — offering conversational tips in its messaging app Allo, for example.
Compared to other companies, Google also has a particularly deep bench of machine learning expertise, and over the last year has set about creating an ecosystem for its AI tools. Last November it open-sourced its in-house machine learning software TensorFlow, and then began offering free online courses for the program. This year, it revealed its own custom processors built for machine learning, which were used to power AlphaGo’s victories and which the company claims offer the same leap forward in performance per watt as skipping three generations of Moore’s Law.
All this means that the company could easily go in any number of directions in terms of building assistive, talking computers. Just to consider one possible avenue, look at Google Springboard — a tool for enterprise customers the company announced this month. Its primary focus is search, but Google says it will also help users by offering “useful and actionable information and recommendations.” This is the sort of low-level AI functionality (think of it as a superpowered Clippy if you dare) that we can expect to see in more parts of our digital lives. And if we get used to accepting help from a computer at work, we’ll be more likely to welcome the same assistance elsewhere.
Microsoft, meanwhile, already has a superpowered version of Clippy called Cortana. Of all the digital assistants available, Cortana has been best integrated into desktop computers. That’s not surprising considering Microsoft’s weakness on mobile, even though the assistant originally launched on phones. On Windows 10, Cortana can manage meetings and reminders, launch programs, answer factual questions without opening a browser, and send email for you. Microsoft is also going to put the assistant in your TV via the Xbox One.
And of all the major tech companies, Microsoft is making the biggest push into chatbots. At its Build developer conference in March, the company unveiled its Bot Framework — a set of prepackaged, AI-powered tools that let anyone create their own chatbot. It sounds like Microsoft is just selling pickaxes during a gold rush here, but Satya Nadella also offered the most convincing vision for the future of talking computers, describing “conversation as a platform” as the next big user interface. “We think this can have as profound an impact as the previous platform shifts have had,” he said.
And alongside these major players, there is a cohort of smaller companies pushing the same vision. These include third-party personal assistants like Hound, Amy, and Viv (the latter made by ex-Siri developers who thought Apple’s vision for voice interfaces wasn’t ambitious enough). There are also more simplistic bots like Howdy (which lives in Slack and is meant to help your workplace by organizing meetings and ordering lunch). And there are a number of bot platforms and toolkits, like Chatfuel, Msg.ai, and Luka, to name a few.
The big challenge for these companies — especially those building their own assistants — is getting noticed. The major tech players are all invested in their own platforms, and third parties are going to need some compelling features to persuade users to sign up, especially when assistants like Apple’s and Google’s will be integrated so tightly into their respective platforms. There’s also a lot at stake. The assistant you shout for in the future — will it be Viv? Siri? Hound? — will occupy a position similar to the top search results in Google: it has the first shot at giving you the information you need.
With so many companies trying to push talking computers on us, there will be the sort of confusion you get with any platform war. And some of this tech just isn’t very good yet (chatbots particularly) and may never be very good, leading to wasted money and consumer distrust. There are also complex issues surrounding privacy: personal assistants need to know a lot about you to be useful, and lots of us are wary of tech companies’ reach into our lives. There are practical problems, too: who’s going to want to talk to their computer within earshot of strangers? You might be happy using a voice interface in your house and your car, but not in an open-plan office. And in scenarios where you can’t talk, digital assistants are going to have to be very smart to offer you the information you need without making you work for it — otherwise why bother with them at all?
It’s also worth pointing out that this is still, at heart, a trend more than a reality. Because tech companies deal with the stuff of hard sciences — maths, engineering, and the like — their products and predictions are often treated like the latest discovery: like something objectively better than what’s come before. But really, these are just commercial entities like any others looking for a new way to make money. They don’t necessarily know what’s best.
That being said, thinking about how consumer technology has changed over the past decades, talking to computers does seem like a natural endpoint. Computers used to be things you had to learn to use, but over the years they’ve become increasingly intuitive — partly because of new interfaces, and partly because we’re brought up using them. As smart assistants begin to pop up in more places, helping organize your work, look after your house, and entertain you, they’ll feel less like interfaces you need to wrangle with, and more like companions. At that point, we all might be a lot more comfortable about having a chat with our computers.