The Verge recently reported on the Alexa Prize, a multimillion-dollar competition whose 2017 champions, the Sounding Board team, learned one big thing about AI chatbots: your bot will mess up. Because the team was developing its bot during the holiday season, the bot naturally had a lot to say about Santa Claus, but Elizabeth Clark, a member of last year’s Sounding Board team, says the conversation often took a dark turn, with the bot confidently claiming that Santa Claus “is the most elaborate lie ever told.”
“As you can imagine, a lot of people who want to talk about Santa Claus … are children,” said Clark, and if an AI’s response to a 3-year-old’s question “Do you know Santa Claus?” was “Santa Claus is a blatant lie,” well…that would make for an awkward family conversation over the roast.
The Alexa Prize competition promises $1.5 million to any team that can design a bot capable of holding a normal conversation with a human for 20 minutes (sans any Santa Claus-esque faux pas). But these kinds of social cues aren’t as easy to teach as simple sentence structure, so even though we’ve gotten AI to understand language per se, getting it to understand the creativity, personality, and context of conversation is a different story.
Right now, Amazon’s Alexa is one of the best, if not the best, conversational AI products on the commercial market, and it still struggles to keep up with us. The problem is that the people behind AI are some of the brightest minds in tech, but the AI devices themselves, when it comes to social intelligence and holding an actual conversation, are pretty stupid, explains Black. You usually have to stick to short, simple sentences that give the computer a clear goal or answer, like “Play ‘When the Levee Breaks’ by Led Zeppelin,” and even then you might end up listening to an academic dissertation on levee construction.
With this competition, Amazon selects eight university teams in the hopes of bringing together the best and brightest in speech recognition, electrical engineering, and computer science to solve the complex problems plaguing conversational AI. The teams build on resources provided by Amazon, so if a team makes a breakthrough, its work is directly applicable to the technology Amazon already has in place. The competition also gives Amazon a jump-start on recruiting the field’s best researchers for positions that could advance its AI-related departments.
Rohit Prasad, chief scientist for machine learning at Alexa, says that the main goal for the Alexa team right now is to make the AI as human-like as possible. For those of us who saw Her or literally any episode of Black Mirror, this sounds less than ideal, but Prasad says they want to create a bot that can talk to you about things that matter. Things you care about. Instead of asking Alexa what the No. 1 rock song of 2005 was, you could have a lengthy, complex conversation about the rise and fall of the White Stripes.
Among the eight teams competing for the prize, there are two main approaches to building a smarter AI. The first, machine learning, analyzes large amounts of data, usually from the internet, so that the bot can pick up patterns of speech, humor, and word association. Taking this approach, Sounding Board used Reddit posts as training data for its bot, hence the more cynical Santa analysis. Unfortunately, while a machine-learning bot absorbs the regularities of language from its data, it usually fails to grasp the complex, irrational nature of English speech patterns. “Everyone starts with machine learning, and eventually, everyone realizes it doesn’t really work,” said one competitor.
The second approach, known as “handcrafting” or “hardcoding,” involves writing strict rules and templates for a chatbot to follow. So if you ask a hardcoded bot what its favorite band is, it will always say “My favorite band is the Flaming Lips,” if that’s what the coders wanted it to say. This, of course, is a very manual, time-consuming process, and the bot functions less like a human and more like, well, a robot, able to handle only a limited number of topics. Conversations, though probably grammatically correct, would be contrived and finite.
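To make the handcrafting idea concrete, here is a minimal sketch of a rule-and-template bot. The triggers, replies, and fallback line are all invented for illustration, not any competitor’s actual rules:

```python
# A hardcoded chatbot: every recognized trigger phrase maps to one
# fixed, coder-chosen reply. All rules below are hypothetical examples.
RULES = {
    "favorite band": "My favorite band is the Flaming Lips.",
    "santa claus": "Santa Claus brings presents every December!",
}
FALLBACK = "I'm not sure what you mean. Can we talk about music?"

def respond(utterance: str) -> str:
    """Return the templated reply for the first trigger phrase found in
    the user's utterance; otherwise fall back to a safe default."""
    text = utterance.lower()
    for trigger, reply in RULES.items():
        if trigger in text:
            return reply
    return FALLBACK

print(respond("So, what's your favorite band?"))
# -> My favorite band is the Flaming Lips.
print(respond("Tell me about quantum physics"))
# -> I'm not sure what you mean. Can we talk about music?
```

The appeal is total control: a toddler asking about Santa gets exactly the answer the coders wrote, never a Reddit-flavored one. The cost is that every topic the bot can handle must be written out by hand.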
The most appealing option for most teams is to combine the two methods: use both data and templates to program the bot. The Fantom team from Sweden’s Royal Institute of Technology, wary of being unable to control the content its bot was being fed, leaned away from Reddit data and instead used Amazon’s Mechanical Turk, a marketplace for tasks like audio transcription, data entry, and identifying objects in photos and videos, which require human intelligence but no real training. The exact kind of jobs AI is typically used for.
The Fantom team used Mechanical Turk by sending every query the bot received to a human Turker, who composed a reply and sent it back. Humans provide controlled content, and the machine-learning element controls how the bot uses their responses to become conversational. Every time the bot doesn’t have a response to a particular question, it sends the question off to the Turkers, and so the bot’s pool of knowledge keeps growing. “Over time, we’ll develop more and more intelligent strategies to traverse the tree,” says Ulme Wennberg. “To be able to understand what we have been talking about, what you want to talk about, what we should talk about.”
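The loop described above is a classic human-in-the-loop fallback, which can be sketched as follows. The `ask_human` function here is a stand-in for routing a query to a real Turker, not the team’s actual pipeline:

```python
# Human-in-the-loop fallback, as described: when the bot has no stored
# answer, the query is escalated to a human, and the human's reply is
# cached so the bot's pool of known responses keeps growing.
known_responses = {}  # normalized query -> learned reply

def ask_human(query: str) -> str:
    """Hypothetical stand-in for sending the query to a human Turker."""
    return f"[human-written reply to: {query}]"

def respond(query: str) -> str:
    key = query.strip().lower()
    if key not in known_responses:
        # First time we've seen this question: escalate and remember.
        known_responses[key] = ask_human(query)
    return known_responses[key]

respond("What's a good winter holiday?")  # routed to a human the first time
respond("What's a good winter holiday?")  # answered from the growing cache
print(len(known_responses))
# -> 1
```

Each novel question costs one human answer, but every repeat of that question is free, which is why the knowledge pool only grows over time.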
The team at Brigham Young University in Utah turned to Wikipedia to educate its bot, Eve. The machine reads the entire body of Wikipedia, a neural network scanning the text in small windows, centering on one word at a time while also taking in the three or four words around it. From this, it learns to predict which words tend to appear alongside one another, and turns that data into what’s called a “vector representation.”
Nancy Fulda, BYU team member and vector-representation expert, explains that the vectors function as points in a vast space with hundreds of dimensions. A vector’s position alone carries no meaning; only its relationship to other vectors is useful information. “By looking at different words in this space, you can infer properties about them,” explains Fulda. “So words like ‘apple,’ ‘pear,’ and ‘orange’ are all going to be closely located to one another. While words like ‘disestablishmentarianism’ will be way off somewhere else.”
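A toy version of this windowed scanning can be built in a few lines: count which words appear within a small window of each word, treat each word’s counts as its vector, and measure closeness with cosine similarity. This is a simplified count-based sketch of the idea, not BYU’s neural network, and the tiny corpus is invented for the example:

```python
from collections import Counter
from math import sqrt

# Invented mini-corpus standing in for Wikipedia text.
corpus = [
    "i ate an apple and a pear",
    "she ate a pear and an orange",
    "an apple and an orange make juice",
    "the debate over disestablishmentarianism raged in parliament",
]
tokens = [sentence.split() for sentence in corpus]

# Scan each sentence in small windows: for every center word, count
# the words appearing up to `window` positions on either side.
window = 2
vectors = {}  # word -> Counter of neighboring words (its "vector")
for sent in tokens:
    for i, center in enumerate(sent):
        ctx = vectors.setdefault(center, Counter())
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                ctx[sent[j]] += 1

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two words' context-count vectors."""
    va, vb = vectors[a], vectors[b]
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm

# Fruit words share contexts, so they land closer to each other than
# to an unrelated word.
print(cosine("apple", "pear") > cosine("apple", "disestablishmentarianism"))
# -> True
```

Even at this toy scale, the geometry Fulda describes emerges: words that keep the same company end up near each other, and the oddball ends up far away.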
This essentially turns words into math, which, as you can imagine, is useful when trying to teach language to a computer. You can turn sentences into vector representations too, and though this technology is still in its early stages at BYU, the team is hopeful that it will lead to AI training that doesn’t require human intervention. “I feel like every researcher has some [method] that is special to them,” says Fulda. “That thing where they’re like, ‘Oh man, I see all the possibilities!’ For me, embedding is like that. I look at embeddings and I think, ‘Oh my heck, this is phenomenal! This can do so much!’”
But the internet only gives you a catalog of text and word associations. In order to do well in the competition, the BYU bot needed to be able to hold a conversation. So the BYU team set up a campus-wide contest, the Chit-Chat Challenge, asking students to submit conversation transcripts. The guidelines were pretty loose, but transcripts weren’t allowed to include personally identifying information or topics that wouldn’t appeal to people outside the university. Entries were scored on length and originality, and the top entrants won prizes like an iPad and a MacBook Pro. BYU is a religious university run by the Mormon Church, so even though entrants were told not to include anything involving religion, the students “share a worldview,” which creates a “more cohesive personality for the bot.”