What's the big deal about Jarvis, Facebook CEO's very own AI assistant?

Facebook’s CEO Mark Zuckerberg loves coding. That much is clear from his own posts and from the recent reports surrounding his latest personal project – Jarvis. At the beginning of 2016, he wrote that his challenge for the year was to build a simple artificial intelligence to run basic operations at home. On December 19, he published a note on the 100 hours of coding he spent through the year building Jarvis. The naming is no coincidence: Zuckerberg’s intention is somewhat similar to that of Tony Stark – to build an assistant that is, in the truest sense, an intelligent, highly personal virtual helper.

Naturally, most of us have been quite excited about the recent development. After all, this is the very first time that someone so influential has picked up connected home gadgets, built an algorithm to connect them all, and taught it notions of adaptation, personalisation and familiarity. But what is all this excitement about, and what does it signify for the advancement and maturity of Artificial Intelligence and the Internet of Things? At a demonstration of IBM’s Watson AI in India, Sriram Raghavan of IBM India’s Research stated, “The Internet of Things is an equally pivotal factor, and the rise in awareness and enthusiasm surrounding IoT will be crucial in taking cognitive platforms to everyday households.” How does Jarvis contribute to this, and will it really take AI and IoT forward?

Who’s Jarvis?

Built by Facebook CEO Mark Zuckerberg, Jarvis is reportedly an artificial intelligence algorithm that Zuckerberg himself has coded. He used the vast expanse of Facebook’s programming and language tools at his disposal, and reportedly spent a cumulative 100 hours making it. He set out to build a structured algorithm that can receive simple instructions by voice or text and use them to control simple operations, such as “switch on bedroom lights” or “pop up the toaster”.

The AI assistant can also learn personal music tastes to deliver customised playlists, recognise voices and even enable household appliances to execute certain chores. Pivotal to Jarvis’ “mind” is its server – the centerpiece of the information flow. Tasks were fed to Jarvis through three interfaces. The first was a text interface integrated into Facebook’s Messenger app through a custom bot, built by Zuckerberg himself. The second was voice input, via a custom iOS application coded for Jarvis. The third was direct input of photographs and video feeds connected to Jarvis, for facial recognition.

Jarvis: Not quite this, actually

To execute tasks provided by the input interfaces, Jarvis’ intelligence depended on three AI systems – a natural language processing engine to understand context and improvisation instead of robotic, preset commands, a speech recognition engine to recognise and respond to voices contextually, and a face recognition engine that reads information off a data store. All three systems essentially work in tandem. The natural language processing engine was custom-built by starting with basic keywords and then building on them. For instance, using the words “bedroom”, “lights” and “dim” gave a specific command to Jarvis – dim the lights of the bedroom. However, when two people in the same house use the same assistant from the same interface to give similar commands, things get more complicated: Jarvis would not know which bedroom is meant, and might do something it was not supposed to.
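The keyword approach described above can be sketched in a few lines of Python. This is only an illustration, not Zuckerberg's actual code: the vocabularies and the `parse_command` function are hypothetical, and only the words “bedroom”, “lights”, “dim” and “toaster” come from the examples in this article.

```python
import re

# Hypothetical vocabularies; the article only mentions a handful of these words.
ROOMS = {"bedroom", "kitchen", "study"}
DEVICES = {"lights", "toaster"}
ACTIONS = {"dim", "on", "off"}


def parse_command(text):
    """Return the first room, device and action keyword found in the text."""
    words = re.findall(r"[a-z]+", text.lower())

    def first_hit(vocabulary):
        # Scan the words in spoken order so the earliest keyword wins.
        return next((w for w in words if w in vocabulary), None)

    return {
        "room": first_hit(ROOMS),
        "device": first_hit(DEVICES),
        "action": first_hit(ACTIONS),
    }


command = parse_command("Dim the lights in the bedroom")
print(command)
```

Note that the parsed room is just “bedroom”. Nothing in the text says whose bedroom, which is exactly the ambiguity the paragraph above points out.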

This is where the voice engine and its recognition features kicked in. Zuckerberg, in his note, heavily stresses the importance of context when it comes to an AI, and rightly so. Teaching his AI context took more hours than any other aspect. As a result, and with more commands given to it, Jarvis gradually picked up “context” in a human way – understanding who is giving a particular instruction and responding to that very person’s own usage and preference history, rather than relying on generic commands.
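As a rough sketch of what “context” means here, assume the voice engine has already worked out who is speaking. A generic word like “bedroom” can then be resolved per person. The household mapping and the `resolve_room` helper below are made up for illustration and are not taken from Zuckerberg's note.

```python
# Hypothetical household data: which bedroom belongs to which recognised speaker.
BEDROOM_BY_SPEAKER = {
    "mark": "master bedroom",
    "guest": "guest bedroom",
}


def resolve_room(room, speaker):
    """Turn a generic room keyword into a specific room using the speaker's identity."""
    if room != "bedroom":
        # Already specific enough (or missing entirely), so pass it through.
        return room
    # Unknown speakers get None so the caller can ask a follow-up question.
    return BEDROOM_BY_SPEAKER.get(speaker)


print(resolve_room("bedroom", "mark"))
print(resolve_room("bedroom", "guest"))
```

The same command now lands in a different room depending on who gave it, which is the behaviour the generic keyword parser could not provide on its own.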


Music is understandably trickier for bots and AI

Zuckerberg illustrated this point with music. For instance, “play something light” certainly meant something different for him than the same command given by his wife. Music is also trickier to teach an AI assistant because, unlike household chores, creative topics will always have more subjective responses. For this, Jarvis was taught to look at Zuckerberg’s past listening history, most played tracks and more, and gauge the kind of music that he would like. The same goes for the other members of his household.
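To make the idea concrete, here is a minimal sketch of picking a track from a person's listening history. The histories, track names, mood labels and the `pick_track` function are all invented for this example; the article does not say how Jarvis actually scores music.

```python
from collections import Counter

# Hypothetical listening histories as (track, mood) pairs per household member.
HISTORY = {
    "mark": [
        ("Track A", "light"),
        ("Track A", "light"),
        ("Track B", "light"),
        ("Track C", "upbeat"),
    ],
    "partner": [
        ("Track D", "light"),
        ("Track D", "light"),
        ("Track A", "light"),
    ],
}


def pick_track(speaker, mood):
    """Return the speaker's most played track with the requested mood, or None."""
    plays = Counter(track for track, m in HISTORY.get(speaker, []) if m == mood)
    if not plays:
        return None
    return plays.most_common(1)[0][0]


print(pick_track("mark", "light"))
print(pick_track("partner", "light"))
```

The request is identical in both calls, yet the answer differs because each person's history differs. A real system would need far richer signals than play counts, but the principle is the same.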

Zuckerberg used facial recognition to recognise those who visit his house. He fitted multiple cameras at his doorway and connected them to his home network to give Jarvis a live feed. When anyone reached his place, the cameras would show various angles, making it easier to get a direct view of the person at the door. This further allowed Jarvis to better “recognise” him or her by tallying with photographs and facial patterns of Zuckerberg’s trusted contacts.
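One common way to “tally” a face against known contacts is to turn each face into a vector of numbers and compare vectors. The sketch below assumes that step has already happened and only shows the matching across several camera angles. The vectors, the 0.9 threshold and the `identify_visitor` function are assumptions for illustration, not a description of the engine Jarvis uses.

```python
import math


def cosine_similarity(a, b):
    """Similarity between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_visitor(frames, contacts, threshold=0.9):
    """Return the best-matching contact across all camera frames, or None.

    frames: face vectors, one per camera angle (hypothetical format).
    contacts: mapping of contact name to a reference face vector.
    """
    best_name, best_score = None, threshold
    for frame in frames:
        for name, reference in contacts.items():
            score = cosine_similarity(frame, reference)
            if score >= best_score:
                best_name, best_score = name, score
    return best_name


# Toy two-dimensional vectors; real face vectors have many more dimensions.
contacts = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
frames = [[0.5, 0.5], [0.99, 0.1]]  # a poor side angle, then a clear frontal view
print(identify_visitor(frames, contacts))
```

The first frame matches nobody well enough, but the second one does, which shows why several angles help: only one of them needs to give a clear view.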

While all of this sounds like a fairly straightforward network on paper, it really isn’t. The trick is to get all the relevant machines in a house to connect to the Internet. While the Internet of Things is on a definite rise, most of our household appliances are still not connected to the Internet. Even the ones that are connected have individual protocols and mechanisms, which means that each speaks a different language. Getting them to understand one another is a more difficult part of the task than getting the appliances to connect to the Internet.

As a result, Jarvis is an AI-powered assistant that became a common platform – a standard protocol of sorts for everything that is connected and takes instructions from the AI. This is a crucial bit of Zuckerberg’s experiment, as it not only exhibited the limitations of the present realm of IoT, but also stressed the need for a common platform for smart households and IoT to really flourish. We may have a multi-speaker connected audio unit across multiple rooms, but that may still not recognise the voice and AI interface of your smartphone, which in all likelihood will use a different interface. You would therefore still need different interfaces to access different appliances at home, completely defeating the point of a connected home.
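The “common platform” idea can be sketched as a hub that speaks one simple language, with a small adapter translating it for each device. Everything in this example is hypothetical, including the two vendor classes, their methods and the brightness levels. It shows the pattern only, not how Jarvis is actually wired.

```python
class VendorALight:
    """Hypothetical vendor API that expects a brightness percentage."""

    def __init__(self):
        self.brightness = 100

    def set_brightness(self, percent):
        self.brightness = percent


class VendorBSpeaker:
    """Hypothetical vendor API that expects a text message."""

    def __init__(self):
        self.log = []

    def send(self, message):
        self.log.append(message)


class LightAdapter:
    """Translates the hub's actions into brightness percentages."""

    LEVELS = {"on": 100, "dim": 30, "off": 0}

    def __init__(self, light):
        self.light = light

    def execute(self, action):
        self.light.set_brightness(self.LEVELS[action])


class SpeakerAdapter:
    """Translates the hub's actions into the speaker's text protocol."""

    def __init__(self, speaker):
        self.speaker = speaker

    def execute(self, action):
        self.speaker.send(f"CMD:{action.upper()}")


class Hub:
    """Single entry point: every device is driven through execute(action)."""

    def __init__(self):
        self.devices = {}

    def register(self, name, adapter):
        self.devices[name] = adapter

    def command(self, name, action):
        self.devices[name].execute(action)


light, speaker = VendorALight(), VendorBSpeaker()
hub = Hub()
hub.register("bedroom lights", LightAdapter(light))
hub.register("speaker", SpeakerAdapter(speaker))
hub.command("bedroom lights", "dim")
hub.command("speaker", "play")
print(light.brightness, speaker.log)
```

The hub never needs to know that one device wants a number and the other wants a string. Adding a new appliance means writing one more adapter, which is the kind of effort a shared industry standard would remove.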

However, it is crucial to note that this is not exactly new. The likes of Amazon Echo and Google Home were made to be a similar smart home hub for all things connected. While they do not (yet) have the power to control every single appliance that you would use at home, these are very similar foundations that have been laid down to work with multiple nodes and connected devices in future. Jarvis is not a new invention per se, but is more of a refinement that works on creating a single platform for everything, without the need for different apps to control each.

It is this very aspect that Mark Zuckerberg has addressed, and this is what Jarvis does for a living – connect the entire house, understand personal contexts, and somewhat recognise you to carry out personal tasks. Jarvis is, in a fascinatingly simplistic way, the forerunner to personal home AI assistants that our future will hold.

Oh, and he’s also Morgan Freeman.

Voice v. Text

Through all of this, Mark Zuckerberg has also addressed the need to consider the accuracy and purview of input methods. With the rise of personal assistants like Siri, Cortana and Google Assistant, we saw a definite rise in voice commands for basic (or even slightly complex) searches and instructions. With a home AI network like Jarvis, voice input is crucial to how it would work.

However, voice input may not always be a feasible option when you are, for instance, in a meeting, or trying to get your little child to fall asleep. To give a verbal instruction, you would preferably be in a more personal environment, one where you do not distract others. Voice commands also face the barrier of languages and accents, along with engineering requirements such as ample microphones, processors and (where needed) translation engines for the AI.

In this context, as Mark Zuckerberg very importantly highlighted, he found himself using text instructions more often than he expected. Using text input from a specific interface does several things: it eliminates the need for extensive audio receptors, removes vocal translation errors, uses existing translation interfaces to carry out the commands, and makes operations proportionately more convenient.

This is something of a bildungsroman in terms of the way that present generation gadgets have been progressing. Take for instance the likes of Google Home and Amazon Echo, which are essentially voice-powered assistants doubling up as music players. The entire base of these connected speakers is built around the power and convenience of voice searches, and while there is a clear, positive push in terms of natural language search, the barriers still prevail.

It is this very aspect that Zuckerberg has addressed with Jarvis – provide an equally proficient text interface for feeding commands to the AI, and see which seems more convenient. This also highlights how text is still more convenient – not just for the common user, but even for the chief of a company that connects a billion people in the world. Facebook, of course, is also attempting to push its AI bot integration via Messenger, so it also made strategic sense to use the in-house tool as the experiment’s front end.

The entire experiment, to sum up, also demonstrates that numerous limitations still prevail in the personal AI assistant space.

The present state of IoT and AI

From how we perceive the present-day world of technology, the realm of IoT works in tandem with AI, and vice-versa. The present state of the Internet of Things is such that most internet-enabled appliances work as standalone connected devices, or within their own ecosystem. Each appliance (or the ecosystem to which it belongs) has its own AI interface as well, and as a result, each speaks a language of its own, complete with individual protocols, guidelines and restrictions.

For instance, if you have Apple’s macOS and iOS-based gadgets at home, you would not be able to control the likes of Google’s Home hub, Xiaomi’s Mi Air Purifier and Bose’s SoundLink connected music system all at once, from one platform. Some IoT devices, like the Mi Air Purifier, do not even have a voice interface, instead being powered by preset buttons and options from an Android application for its own ecosystem, which in turn makes it restricted by nature.


What about security, Mark?

IoT devices, right now, are being built to leverage the basic voice and AI functionalities that are commercially available. While AI does play a key part, the implementation is still that of first generation products, and as with most first generation products in technology, they are essentially a show of how our future concept is shaping up. Most IoT products of today use (comparatively) simple Machine Learning algorithms, which lay down the foundation for more complicated operations.

Zuckerberg’s experiment, while not spelling out these boundaries in specific detail, is proof of it. However, Jarvis’ proficiency is also proof that our laboratories are already home to more advanced systems, still kept away from commercial implementation for want of a common platform, or a more seamless and secure way to implement IoT and AI for home automation.

Incidentally, his experiments have touched upon the progress of IoT and AI without taking security threats into consideration. The risk of your personal network being hacked and your security being compromised does remain, but that, possibly, is a deeper topic to be discussed once we have the basic framework of a more complex IoT and AI network in place.

The significance of it all

It is all of this combined that makes Zuckerberg’s Jarvis so significant. It shows how future networks can be proficient, addresses how voice may not entirely take over the world of smart technology, talks about the need for a common platform for IoT and AI devices to be built upon, and more. While there are multiple common platforms that can work within their own ecosystems and even cross-connect between operating systems, there is no single platform that takes charge of all your connected appliances from one app, removing the need to have separate AI mechanisms in each.

While Jarvis may not be a final blueprint for future home automation platforms, it is a very proficient platform to build upon. There’s predictably a long way to go before this technology makes its way to homes across the world, but until then, we get a glimpse into what can go right (or wrong) with smart technology taking over home appliances.

Meanwhile…

Morgan Freeman played the voice of Jarvis in Zuckerberg’s teasers of his own AI. Is it mere coincidence, or…
