Brian Christian on ‘The Most Human Human’


At Work

Photograph by Michael Langan.

Brian Christian, who studied computer science, philosophy, and poetry, has just published his first book, The Most Human Human: What Talking with Computers Teaches Us About What It Means to Be Alive. Recently, he answered my questions about the Turing test, online romance, and conversation fillers.

Your new book has an odd but intriguing title: The Most Human Human. Can you explain what it means?

The Most Human Human is an award given out each year at the Loebner Prize, the artificial intelligence (AI) community’s most controversial and anticipated annual competition. The event is what’s called a Turing test, in which a panel of judges conducts a series of five-minute chat conversations over a computer, some with real people and some with computer programs that pretend to be people by mimicking human responses. The catch, of course, is that the judges don’t know at the start who’s who, and it’s their job, in five minutes of conversation, to find out.

Each year, the program that does the best job of persuading the judges that it’s human wins the Most Human Computer award and a small research grant for its programmers. But there’s also an award, strangely enough, for the human who does the best job of swaying the judges: the Most Human Human award.

British mathematician Alan Turing famously predicted in 1950 that computers would be passing the Turing test—that is, consistently fooling judges at least 30 percent of the time and, as a result, being generally considered intelligent in the human sense—by the year 2000. This prediction didn’t come to pass, but I was riveted to read that, in 2008, the computers came up shy of that mark by just a single vote. I decided to call up the test’s organizers and get involved in the 2009 contest as one of the human “confederates”—which meant I was both a part of the human “defense,” trying to prevent the machines from passing the test, and also vying with my fellow confederates for that intriguing Most Human Human award. The book tells the story of my attempt to prepare, as well as I could, for that role: What exactly does it mean to “act as human as possible” in a Turing test? And what does it mean in life?

The organizers of the Turing test competition gave you the following advice as you set out to win the Most Human Human prize: just be yourself; you are, after all, human. Why wasn’t just being human good enough to win? How can you be more human than other humans?

Oxford philosopher John Lucas says that if the Turing test is passed, it will not be “because machines are so intelligent, but because humans, many of them at least, are so wooden.” That is part of what’s so fascinating about the Turing test: fundamentally, it’s a test of communication, and there’s a sense in which this contest, which we invented as a means for measuring the machines, actually turns out to be a means of measuring ourselves.

On the one hand, the ability to fluently use natural language is an incredible challenge for computer scientists, whereas it comes naturally to us, even as children. But on the other hand, we know from experience that conversations aren’t uniformly successful, and there’s a huge demand in our culture for public-speaking coaches, communication coaches, and dating advice. The intriguing paradox for me is that communication is perhaps our greatest cognitive achievement and the place with the greatest room for improvement.

As you point out in the book, Daniel Gilbert, the author of Stumbling on Happiness, has said that every psychologist must at some point fill in the following sentence: “The human being is the only animal that _____.” In preparing for the Turing test, did you come up with your own formulation?

It’s intriguing that Gilbert says the “only animal,” because I think that has actually been the biggest shift in the past fifty years. This question of what makes humans special and different and unique goes back all the way at least to Aristotle and Plato, and traditionally we’ve compared ourselves against animals as a means of answering it. Now, in the twenty-first century, we much more commonly compare ourselves against machines. I think one of the fascinating things about living at this particular moment in history is that the computer represents perhaps the greatest difference between our world and Aristotle’s. So computers are not only shedding light on that answer but are also forcing us to revise that answer at the same time.

As for my own sentence, I don’t necessarily know—and I’ll be interested to see how others revise their sentences in the years to come—but there’s something to be said for the fact that humans are rather uniquely poised in a place where they have access to both deliberate, logical thinking and instinctive, intuitive thinking. And of course, one can always turn the sentence on itself: humans appear to be the only things anxious about what makes them unique.

You point out that AI programs generally produce so-called stateless conversations, in which the chatbot is responding only to the previous question, showing no awareness of the larger context or form of the dialogue. What does this reveal about how computers think? And what insight did this give you into human conversation?

If you look, for instance, at Jeopardy! transcripts, you realize that you can scramble the order of the questions without affecting the answers. But if you scramble the order of a human dialogue, you likely destroy it. A conversation is more than the sum of its parts. It has an arc; it generates its own context.

One of the most telling things about human conversation is how context sensitive it is. I love walking down the street and trying to pick out snippets of conversation. They’re frequently completely unintelligible: “Yeah, and so that’s what she told him, but he thought that they’d never even been there, right?” “Right, no, yeah, no, I know!” Probably if we’d been listening in for several minutes, that would make sense. Or perhaps not. I imagine someone sitting down to lunch with a close friend, and the first question is “So have you finally told her how you feel yet?” Sometimes a pronoun can be locked into place for years. This type of compressed, contextual richness is part of what we value about longstanding relationships.
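The contrast Christian draws can be made concrete in a few lines of code. The sketch below is hypothetical (it is not any real chatbot’s implementation): a stateless responder sees only the latest message, so its answers survive any reordering of the questions, while a stateful responder keeps a transcript, so earlier turns can shape later replies.

```python
def stateless_reply(message: str) -> str:
    """Answers each message in isolation, like the stateless chatbots
    described above. Reordering the questions reorders the answers
    but never changes them."""
    canned = {
        "how are you?": "I'm fine, thanks!",
        "what's the weather?": "Gray and rainy.",
    }
    return canned.get(message.lower(), "Interesting! Tell me more.")


class StatefulReplier:
    """Keeps a running transcript, so the conversation can have an arc:
    later answers may refer back to earlier turns."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def reply(self, message: str) -> str:
        self.history.append(message)
        if len(self.history) > 1:
            # A real system would do far more with its context; this
            # merely shows an answer depending on an earlier turn.
            return f"Earlier you said {self.history[0]!r}; now: {message!r}?"
        return "Go on."
```

Scrambling the input order leaves the stateless bot’s answer set untouched, which is exactly the Jeopardy! property described above; the stateful replier, by contrast, gives different output for the same message depending on what came before it.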

In one of the most remarkable anecdotes in the book, you describe how one of the founders of the Turing test competition fell in love with a Russian woman on an online dating service—only to discover that he had been exchanging lengthy love letters for more than four months with a computer program! What does this say about the nature of love? Are falling in love online and offline different?

To convey our identity online we use content: information, a PIN, an SSN, a password, a code. In real life we use not content but form: I recognize you by the shape of your face, the timbre of your voice, your handwriting or signature. Less by what you say or write than by how.

To take just one recent example of how online and offline intimacy diverge: I was saddened when Facebook changed its Activities and Interests sections from write-in boxes to essentially drop-down menus. Their goal was to get the people who like “basketball” and the people who like “bball” to be linked to the same page—innocuous enough at first blush. But in doing so, the space goes from being a place where we define ourselves by how we express our interests to a mere catalog of what those interests are. What’s lost is tremendous.

I recall hearing the inventor of speed dating, Yaacov Deyo, say that in his first few sessions he had to go so far as to literally ban the questions “So what do you do?” and “So where are you from?” Their answers, overly informational, weren’t actually very productive. What we love about people isn’t their properties, it’s their manner. With that in mind, there are far better questions to ask, ones that evince not the way someone looks on paper but rather their idiosyncrasies, attitudes, personality, style.

Everyday conversation is filled with seemingly meaningless fillers such as uh and um. These nonwords make most of us sound pretty stupid, but how do they in fact capture what is most deeply human about us?

We often think about conversation as a series of discrete “turns,” but as linguists are discovering, the negotiation and construction of those turns, every bit as much as what’s said during them, is an incredibly complex, subtle, and vital part of communication. The turns of a dialogue, unlike an epistolary exchange or a walkie-talkie conversation where each segment is clearly and rigidly delineated, are fluid: speakers chime in, interrupt, finish, or revise each other’s sentences, offer yeahs and mm-hmms and really?s, yield the floor, hold the floor. There’s as much complexity and art to the form of a conversation as to the content.

Uh and um (and their foreign-language equivalents) surprisingly turn out to be a big part of this process of negotiating conversational structure.

Noam Chomsky’s theory of syntax famously excluded “such grammatically irrelevant conditions as memory limitations, distractions, [and] shifts of attention and interest,” but in practice, as language is performed “out in the wild,” each of these factors is hugely influential, and each holds its own store of subtlety and nuance. Linguists have established, for instance, that different-length pauses typically follow uh than follow um, which implies that speakers are making purposeful (if unconscious) decisions between the two.

The good news is that in the 2009 competition no AI program passed the Turing test, and you won the Most Human Human prize. What clinched the prize for you?

It’s easy to look back now and feel that my choices were right, but of course at the time I had no idea if I was being successful or not. Trying too hard risks its own sense of falseness, of course. (One year a Shakespeare expert famously discovered—after consciously trying to demonstrate as much knowledge as she possibly could—that two judges simply decided, “No one knows that much about Shakespeare” and voted her a computer.)

I don’t know if anything necessarily clinched it, but I tried to be as sensitive as I could to everything I’d learned in my preparation about what makes human conversation so subtle and complex, and conversely what sorts of corners chatbots tend to cut, so that I could emphasize precisely those things. I tried to create more of a linguistic collaboration, rather than a strict taking of turns—not only to answer what was asked of me but to at times finish the judge’s sentence, or respond in advance to what I was anticipating they’d ask next. I also strove to steer the conversation as quickly as possible out of the patterned small talk and etiquette that the programs know by rote, and to give not only appropriate and context-sensitive answers but to generate a larger arc to the conversation and to have the answers present more than the sum of their parts: a coherent and distinct sense of personality from one to the next.

If they had asked me what the morning’s weather was, for instance, I’d not only respond with the correct “gray and rainy” but would volunteer something rooting the answer into my life story: “Oh, gray and rainy, but I live in Seattle, so that’s pretty much par for the course.” Those little touches gave the judge a lot more options for follow-up questions and worked to break the conversation out of either simple pleasantries or a more interrogation-style Q & A and into something much more natural and free flowing.

Other parts of the strategy were slightly more complicated, but at the end of the day, part of what’s scary about something like the Turing test is that you can never know for certain that you and a judge will be each other’s “kind of people”—whether or not you’ll resonate with each other and hit it off. Study only gets you so far. At the same time, a look at how these programs attempt to model and mimic conversation, and simultaneously at the linguists and philosophers who study it, provides an incredible window into how to keep conversation firing on all cylinders, and at just how bewilderingly nuanced and complex it is when you do.

When a program does pass the test—and this will eventually happen—is it nightfall for humanity?

No, I don’t think so. There’s a certain mythos to the idea that as soon as the Turing test is passed, it’s passed forever; for instance, once the gold Loebner Prize award is handed out, the Loebner Prize will be discontinued and never held again. In fact, I think that the first time we lose a Turing test will be a fascinating—and redemptive—moment: a slap in the face, but also a bit of a wake-up call, a call to arms. Humans are by far the most adaptive species on the planet, and far from spelling our doom, a defeat might be a healthy spur toward raising the level of play, which, in the case of the Turing test, also happens to be the level at which we communicate with one another in our everyday lives.

As Stephen Baker tells it, the IBM team and the Jeopardy! producers clashed when IBM refused to let the show’s writers prepare brand-new questions specially for the match after having seen what their Watson machine could do. IBM was worried they’d unconsciously devise newer and cleverer types of questions to target the gaps in Watson’s abilities. For instance, one of the clues in the Watson contest was “In 2002 Eminem signed this rapper to a seven-figure deal, obviously worth a lot more than his name implies.” (Answer: 50 Cent.)  A Jeopardy! writer familiar with how Watson’s AI works couldn’t help but think that removing explicit reference to Eminem and 2002, leaving only the pun about what the rapper’s name implies, makes it only slightly harder for the human contestants, but totally pulls the rug out from under the machine. In some sense, I think the real essence of the Watson match isn’t the machine versus the contestants but the machine versus the writers: an arms race between better answers and better questions.

This to me suggests precisely the kind of human trait I would be exhilarated to see in the context of the Turing test. The increasing ubiquity of e-mail spam is already forcing us to humanize our e-mails, lest they be judged inauthentic. What would a Turing test–passing bot do to our conversations? My experience as a confederate—looking at how they attempt to simplify conversation to make it manageable and doing what I could to avoid and disrupt those simplifications—gave me a window into what that might look like, and it’s something I would be fascinated to see society as a whole grapple with. How do we be “more human” in our interactions with one another? The pressure that the Turing test is beginning to put on us is, I think, an incredibly healthy and productive one.

What do you think we’ve learned from Watson’s performance?

One of the riveting things about watching the history of AI unfold is that it’s come back to us with some very startling and counterintuitive verdicts about what’s “hard” and what’s “easy.” Who would have imagined we’d have computers landing planes before they could ride bikes, or translating UN documents before they could look at a picture of a horse and say, “That’s a horse”?  Compared to the order in which humans learn these things, and how difficult we believe them to be, these sorts of results are really quite astonishing.

That’s what you see when you look at the IBM contest. What impresses us about Jeopardy! champions like Ken Jennings and Brad Rutter is how much knowledge they’re able to retain. That they know what the questions are asking and that they stand at the podium before the round and banter with Trebek isn’t on our radar at all. But for the IBM team, encoding virtually all world knowledge wasn’t approached as being particularly challenging—or particularly interesting. The entire weight of the research team was thrown at deciphering the language of the clues: their puns, their wordplay, their ambiguity. Google represents a step ahead of, say, database queries, and Watson represents a step ahead of that; we’re posing clues rather than keywords and getting back phrases instead of URLs, but the result is still much closer to a deposition than a conversation.

I actually think that if you look at it in its broader historical context, you can view the contest as a kind of victory, or at least a validation. The challenge wasn’t the things about Rutter and Jennings that make them trivia experts or Jeopardy! champions; it was the things that make them people.

Indeed, some see the history of AI as a dehumanizing narrative; I see it as much the reverse. We build these things in our own image, leveraging all the understanding of ourselves we have, and then we get to see where they fall short. That gap always has something new to teach us about who we are.

We conducted this exchange via e-mail. So what do you think: am I machine or human, and how do you know? (Cue ominous music.)

There’s a great quote from Pablo Picasso: “Computers are useless. They can only give you answers.” To me there’s something deeply true about that: to ask a good question is at least as complex and subtle as to give a good answer—maybe more so. And increasingly I come to think of curiosity as the distinctively human emotion, because it’s a part of our hybrid animal/rational nature: desire and knowledge bumping up against one another. So you may be relieved to know that as computers edge their way into more and more sectors of the job market, the people with the highest job security will likely be the interviewers.