Assigning emotional values to words.
Last month I wrote about Matthew Jockers’s research on the shapes of stories, which has since met with a welter of reactions within and without academe. His critics ask two questions, essentially: Is it really possible to assign every word a reliable emotional valence? And even if the answer is yes, can we really claim that all the plots in the history of literature take so few basic forms?
A rough primer: Jockers uses a tool called “sentiment analysis” to gauge “the relationship between sentiment and plot shape in fiction”; algorithms assign every word in a novel a positive or negative emotional value, and in compiling these values he’s able to graph the shifts in a story’s narrative. A lot of negative words mean something bad is happening; a lot of positive words mean something good is happening. Ultimately, he derived six archetypal plot shapes. (To his credit and my chagrin, he’s refrained from giving them catchy names.) Here’s an example:
Annie Swafford, a professor at SUNY New Paltz, found some problems with Jockers’s methods. For one, his program couldn’t always detect the beginnings and ends of sentences; more gravely, it sometimes bungled or oversimplified the emotions of the sentences it read. “I am extremely happy today” and “There is no happiness left in me,” for example, it read as equally positive. And, as she points out,
Longer sentences may be given greater positivity or negativity than their contents warrant, merely because they have a greater number of positive or negative words. For instance, “I am extremely happy!” would have a lower positivity ranking than “Well, I’m not really happy; today, I spilled my delicious, glorious coffee on my favorite shirt and it will never be clean again.”
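For the curious, both of Swafford's complaints can be reproduced with a few lines of Python. What follows is only a rough sketch of a word-counting scorer, not Jockers's actual tool: the tiny lexicon, its plus-one and minus-one values, and the names `LEXICON`, `tokenize`, and `score` are all invented here for illustration. It assumes that every word carries a fixed value and that a sentence's sentiment is simply the sum of its words.

```python
import re

# Hypothetical toy lexicon, invented for this illustration.
# Real sentiment tools rely on lists of thousands of scored words.
LEXICON = {
    "happy": 1,
    "happiness": 1,
    "delicious": 1,
    "glorious": 1,
    "favorite": 1,
    "clean": 1,
    "spilled": -1,
}


def tokenize(sentence):
    """Lowercase a sentence and pull out its words."""
    return re.findall(r"[a-z']+", sentence.lower())


def score(sentence):
    """Add up the lexicon value of each word; unknown words count as zero."""
    return sum(LEXICON.get(word, 0) for word in tokenize(sentence))


# Swafford's first pair: negation is invisible to a word counter.
PLAIN = "I am extremely happy today"
NEGATED = "There is no happiness left in me"

# Swafford's second pair: the longer, gloomier sentence holds more positive words.
SHORT = "I am extremely happy!"
LONG = (
    "Well, I'm not really happy; today, I spilled my delicious, glorious "
    "coffee on my favorite shirt and it will never be clean again."
)

for sentence in (PLAIN, NEGATED, SHORT, LONG):
    print(score(sentence), sentence)
```

Under these assumptions the first two sentences come out identical, because “no” has no entry in the lexicon and “happiness” counts the same as “happy.” The coffee sentence outscores the short one simply because it contains more words from the positive column, with “not” and “never” contributing nothing.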
“Frankly, I don’t think any of the current sentiment detection methods are especially reliable,” Jockers wrote in a response. “Sentiment is a subtle and nuanced thing … We would probably find a good deal of human agreement when it comes to the extremes of sentiment, but there are a lot of tricky cases, gray areas where I’m not sure we would all agree.”
And those gray areas, Swafford fears, comprise much of literature. No sentiment-analysis tool would know what to make of a sentence such as “Well, it’s like a potato,” which could, depending on context, have a positive or negative spin—I like potatoes, but there are few things in my life I’d favorably compare to one. “Our evaluations are pure guesswork,” Swafford writes: “we hope that phrases like ‘not good’ and ‘like a potato’ don’t happen too often; we hope that sarcasm and satire are infrequent enough.”
I’d written in my last post that “most words have a fairly straightforward sentimental value.” A commenter on Twitter said, “You can’t really believe that’s how language works.” But I do. For most words—lamppost, run, zoological—that value is zero or slightly variable. I think almost every English speaker would agree, though, that bad has a negative sentimental value. Yes, Michael Jackson called one of his albums Bad, and his use was predicated on the exact opposite value—but there’s no one using bad as slang for “really good” who’s ignorant of the word’s traditional valence. (Thankfully, hardly anyone is using bad as slang at all anymore.) Likewise, if I tell my friend, with a sardonic edge to my voice and tears streaming down my face, that I’m doing just great, his sense of great’s general positivity won’t be shaken to its very core. He’ll just know that I’m being glib.
Still, these instances of irony and tone raise major concerns about sentiment analysis. In a series of tweets, Jacob Eisenstein pursued an especially astute line of questioning:
True, sentiment analysis is not perfect, but “perfect” implies that scoring the sentiment of every part of a novel is a meaningful goal. Worrying about negation, and even irony and satire, implies that true sentiment is waiting to be discovered, if we could solve these problems. Does positive sentiment mean things are good for the protagonist? For the rest of the fictional world? For the reader?
For the would-be sentiment analyst, these look like tall hurdles, but they might have answers that we could readily agree on. “True sentiment” is, beneath the veneer of irony, “waiting to be discovered”; assuming we can develop an algorithm sophisticated enough to discern ordinary, stable badness from your Michael Jackson–type Badnesses, it wouldn’t be prohibitively difficult to account for writers’ irony.
And when we track “positive sentiment,” we do mean, I think, that things are good for the protagonist or the narrator. There are, of course, plenty of novels where the hero is so inured to contemporary life that almost nothing is positive or negative. There are novels where the hero is so self-loathing that he considers it good when bad things happen to him. And there are novels with so many changes in perspective, at such a remove from conventions in point of view, that it would be ridiculous to claim that they have protagonists at all. But they’re all affixed, however tenuously, to concepts of sentiment—we’re always rooting, somehow, for someone or something. And if none of these forms are outside the grasp of a good reader, why would they elude a good algorithm?
Jockers has yet to publish his official research paper, so this discussion is deeply preliminary*. Most academics seem to be watching with cautious optimism. But in the wings there are two factions forming. First are those who fear that the entire edifice of literature has crumbled before the shapely contours of data, who regard all work in the digital humanities as trivial—who maintain that the best fiction is of such consummate artfulness and sensitivity that any attempt to find patterns in it is crass and ignorant, a fool’s errand. And I sympathize. I don’t think emotions can be fully quantified. The experience of being alive does not reduce neatly to a formula. But if “art breaks the sea that’s frozen inside us,” as Kafka wrote, it does so by appealing to our feelings, and these, like it or not, have limits, albeit hazy ones. If every sentiment were unrecognizably unique, no one would have any intuition for what someone else was feeling; art would be useless. If we can agree that literature speaks to certain baselines of positive and negative—feeling good, feeling bad, feeling alive or dead—then there’s a firm basis for all kinds of data-driven research. To claim otherwise would be precious.
On the other hand, even if we were to induce—with 99 percent certainty and the most cunning algorithms in history—six fundamental shapes for literary plots … what then? Would writers tear their hair out trying to invent new ones? Would students arrive at a more refined understanding of how stories work, or why we’re so intent on telling them? Would Madame Bovary and Middlemarch unfurl before us with unseen glories? There’s a lot that’s instructive in projects like Jockers’s, but what a blinkered existence it would be to pursue them exclusively. You’d risk winding up with a rigid, holistic data set that knows everything about literature except why we want to read it.
Dan Piepenbring is the web editor of The Paris Review.
*My thanks to Eileen Clancy, who told me about the reactions to Jockers’s research and who published this timeline to clarify those reactions. I should also note that I’ve completely omitted many of the facets in these arguments—mostly a discussion of graph foundations involving such things as ringing artifacts and Gaussian filters—because they’re beyond my ken.