Living with the Turing Test
Researchers from the National Advisory Committee for Aeronautics (NACA) using an IBM type 704 electronic data processing machine in 1957. Photo: Wikimedia Commons
As of last week, the Turing test has—allegedly—been passed. In 1950, Alan Turing famously predicted that by about the year 2000, computer programs capable of sending and receiving text messages would be able to fool human judges, after five minutes of conversation, into mistaking them for humans 30 percent of the time, and that we would come to “speak of machines thinking without expecting to be contradicted.” Two weekends ago, at a Turing test competition held at the Royal Society in London, a piece of so-called “chatbot” software called “Eugene Goostman” crossed that mark, fooling ten of the thirty human judges who spoke with it.
The official press release described this as a “milestone in computing history”—a “historic event.” Was it? We should not, of course, take a press release’s word for it. (Said release describes the winning chatbot program as a “supercomputer,” a head-scratching conflation of hardware with software.)
The release says this is the first time a computer program has scored above 30 percent in an “unrestricted” Turing test. This appears to be plausibly true. We don’t have access to the transcripts of these conversations—the organizers declined my request—but we know that the persona adopted by the winning chatbot (“Eugene Goostman”) was that of a thirteen-year-old, non-native-speaking foreigner. The Turing tests of the 1990s were restricted by topic, the judges’ questions limited to a single domain. Here, the place of those constraints has been taken by restricted fluency, both linguistic and cultural. From correspondence with the contest organizers, I learned that the human judges were themselves chosen to include children and non-native speakers. So we might fairly argue about what, for a Turing test, truly counts as “unrestricted.” These questions are deeper than they seem.
An important methodological point, though, has been lost in most of the discussion of these results so far. The Turing test is a paired comparison. It’s not that a judge chats with Goostman, is then asked, “Do you think that was a person?” and says, “Sure, why not?” It’s that the judge chats with Goostman and with a real person, and is then asked which of the two was the human. In other words, we can’t say that Goostman merely seemed plausibly human to ten of the thirty judges. To those ten judges, Goostman seemed more plausibly human than an actual person did.
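The difference is not academic. Here is a toy simulation of the two protocols; every number in it—the scores, the “plausibly human” threshold, the gap between bot and confederate—is invented for illustration, not drawn from the contest.

```python
import random

random.seed(0)
JUDGES = 30


def humanness(mean):
    """A crude score for how human a given conversation felt to a judge."""
    return random.gauss(mean, 1.0)


absolute = 0  # judges who would call the bot human, judged in isolation
paired = 0    # judges who pick the bot over the real person beside it

for _ in range(JUDGES):
    bot = humanness(0.0)     # the bot reads as less human, on average,
    person = humanness(1.0)  # than the flesh-and-blood confederate
    absolute += bot > 0.5    # absolute verdict: "was that a human?"
    paired += bot > person   # paired verdict: "which one was the human?"

print(f"absolute passes: {absolute}/{JUDGES}; paired wins: {paired}/{JUDGES}")
```

The numbers are made up; the point is that the two counts answer different questions.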
I was, myself, one of those real people—known as “human confederates”—at the Loebner Prize Turing test competition several years ago, battling bots much like Goostman for the judges’ faith. I’d studied the test transcripts before I participated, and I have studied them since, and the simple truth is that bots have not come particularly far in their Turing test performance since these annual contests began in the early nineties.
The story is elsewhere. What they have done is completely saturate modern life.
Part of the drudgery of the aughts-and-tens Internet is the tedium of filling out CAPTCHAs—those tiresome forms designed to prevent spam by forcing us to prove that we’re human, the most common and recognizable being Google’s “reCAPTCHA.” These checkpoints of the Web are, in fact, Turing tests: computer-judged Turing tests. (To wit: Completely Automated Public Turing test to tell Computers and Humans Apart.)
Within the CAPTCHA is a quiet but perceptible arms race. First we had only to read text to attest our humanity; then it was distorted text; then photographs of the blurry or misprinted text that the best computer-vision algorithms couldn’t make sense of—much of it legitimately ambiguous. Confronted with an eighteenth-century text, is the correct answer “Loft” or “Lost” when I see “Loſt”? And how am I supposed to know, outside of all context, whether that’s a “1,” an “l,” or an “I” in some grainy print?
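For the mechanically curious, here is a minimal sketch of such a challenge in Python, using the Pillow imaging library; the dimensions, font, and distortions below are arbitrary choices for illustration, not any real CAPTCHA service’s recipe.

```python
import random
import string

from PIL import Image, ImageDraw, ImageFilter, ImageFont


def make_captcha(length=6):
    """Render a random string with rotation and blur; return (image, answer)."""
    answer = "".join(random.choices(string.ascii_uppercase + string.digits, k=length))
    img = Image.new("RGB", (220, 80), "white")
    ImageDraw.Draw(img).text((30, 30), answer, fill="black",
                             font=ImageFont.load_default())
    # The arms race lives in the next two lines: each distortion exists only
    # to defeat OCR, and each can be dialed up until humans start failing too.
    img = img.rotate(random.uniform(-15, 15), fillcolor="white")
    img = img.filter(ImageFilter.GaussianBlur(radius=1.2))
    return img, answer


challenge, expected = make_captcha()
challenge.save("challenge.png")  # a respondent passes by typing back `expected`
```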
ReCAPTCHA claims to be “Tough on bots. Easy on humans.” And yet Google announced this spring that its own computer vision algorithms were now better than humans at identifying text in images. Where does this project end? This is a Turing test in continuous operation, with cybersecurity at stake. As with all Turing tests, our very participation is used to hone the opponent. As necessary as ever, the CAPTCHA is discomfitingly dynamic—that is to say, unsustainable.
We have cared about what distinguishes the animate from its simulacra since at least as far back as Descartes, but we have never needed the answers as we do now. In the last year, for instance: The Internet’s “Robot Exclusion Standard” turned twenty years old, enforced only by etiquette. The Federal Trade Commission bankrolled a “Robocall Challenge” in the hopes of stemming the tide thereof, awarding twenty-five thousand dollars to a company called “Nomorobo.” The new company founded by Pixar’s former CTO released its first product: an iPad app for elementary schoolers whose characters are chatbots powered by the cloud. The makers of the most popular video game of all time won a seven-million-dollar lawsuit against a firm making bots that play said game. The big shockwave in experimental literature came when a seemingly automated Twitter account was revealed to be a person. And the Oscar for Best Original Screenplay went to the tale of a man who separates from his wife and falls—quite sympathetically—in love with his digital personal assistant.
A Google search for “plagued by bots” in 2014 turns up some twenty-five thousand results, largely from online gamers. Broadening the search, we confront story after story of the scourge of Twitter bots, Facebook bots, Tinder bots. Spike Jonze, speaking of the initial chatbot conversations that inspired Her, recalled the feeling of “trippy” verisimilitude, followed by the disappointment of seeing the program for what it was. That thrill and that disappointment have, for most of us, worn off at this point. In their place is a kind of harried vigilance.
The great irony of the recent history of the Turing test is that we’re blurring the line between man and machine from both sides: deliberately meeting the machines halfway. Paul Ekman, arguably the world’s top expert on deceit, notes straightforwardly that “the more words spoken the better the chance of distinguishing lies from truthfulness.” I might adjust that maxim only to generalize it: from wpm to bitrate. The progression of technology over the last several decades has led us, counterintuitively, toward lower-bandwidth forms of connection. Calling, for example, used to mean showing up at someone’s house; now we regard even its disembodied modern version as invasive. “David [Foster Wallace] may have been the last great letter writer in American literature,” writes his biographer D. T. Max—“with the advent of email his correspondence grows terser, less ambitious.” And “terser, less ambitious” is precisely the direction in which we have since moved beyond e-mail itself. We are, in effect, fighting with our hands tied behind our backs: rather, eight of our ten fingers tied, wearing a gag—and a mask.
Our best weapons are the oldest. There are forty-three muscles in the human face. There are hundreds of ways to read almost any sentence aloud. The human retina is estimated by UPenn Medical School researchers to transmit some ten million bits of information to the brain every second. A text message filled to the brim contains eleven hundred twenty. The Turing test may, for its purposes, push intelligence through a straw: the good news is we’re not obliged to drink from it.
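For the record, the arithmetic behind those last two figures, assuming the classic 160-character text message and its seven-bit alphabet:

\[
160 \ \text{characters} \times 7 \ \tfrac{\text{bits}}{\text{character}} = 1{,}120 \ \text{bits} \approx 10^{-4} \times \underbrace{10^{7} \ \tfrac{\text{bits}}{\text{second}}}_{\text{retina}}
\]

A brim-full text, in other words, carries roughly what the retina sends the brain in a ten-thousandth of a second.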
Brian Christian is the author of The Most Human Human.