Turning novels’ plots into data points.
Motherboard has a new article about Matthew Jockers, a University of Nebraska English professor who’s been studying what he calls “the relationship between sentiment and plot shape in fiction.” Jockers has crunched hard data from thousands of novels in the hope of answering two key questions: Are there any archetypal plot shapes? And if so, how many?
The answers, his data suggest, are “yes” and “about six,” respectively.
Jockers, it should be clear, is pursuing a different meaning of plot than the one we conventionally reach for—he conceives of it as an emotional concern more than a narrative concern. His research was spurred by a concept called syuzhet, one of a pair of terms coined by the Russian formalist Vladimir Propp. As Jockers explains,
Syuzhet is concerned with the linear progression of narrative from beginning (first page) to the end (last page), whereas fabula is concerned with the specific events of a story, events which may or may not be related in chronological order … When we study the syuzhet, we are not so much concerned with the order of the fictional events but specifically interested in the manner in which the author presents those events to readers.
In other words, a book’s plot isn’t necessarily about conflict and resolution, but emotions, which “serve as proxies for the narrative movement,” as Jockers writes. This is an attractive approach to plot, in part because it allows us to ascertain—and to defend, if need be—the “plottiness” of certain books that tend to be regarded as plotless. It’s become conventional wisdom that plot, and the active enjoyment of it, are middlebrow pursuits, and that true literature is free from the shapely confines of narrative. This ignores that intricate, careful plotting is itself an art form, and that what makes so much “literary” fiction so ungodly boring is its inept or absent plotting. But by Jockers’s conception, even Waiting for Godot, which Vivian Mercier famously and favorably described as “a play in which nothing happens, twice,” is positively brimming with plot: chase sequences, surround-sound explosions, incestuous love triangles.
So how does Jockers get at the syuzhet, the emotional silhouette of a book’s body? He hasn’t revealed all of his methodology yet, but in essence, he’s using sentiment analysis, looking at the mood and tone of writers’ words—specifically “a controlled vocabulary of positive and negative sentiment markers … and a machine model that I trained to identify and score passages as positive or negative.”
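For the technically curious, the lexicon half of that approach can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the two word sets are hypothetical stand-ins rather than Jockers's controlled vocabulary, the function names are invented here, and a simple moving average takes the place of whatever trained model and smoothing he actually uses.

```python
import re

# Hypothetical mini-lexicon for illustration only; NOT Jockers's actual word lists.
POSITIVE = {"love", "joy", "hope", "bright", "happy"}
NEGATIVE = {"death", "fear", "grief", "dark", "lost"}


def score_passage(text):
    """Return (count of positive words) minus (count of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def plot_shape(passages, window=3):
    """Score passages in reading order, then smooth with a trailing moving average."""
    raw = [score_passage(p) for p in passages]
    smoothed = []
    for i in range(len(raw)):
        chunk = raw[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return raw, smoothed


# A toy four-passage "novel", presented in the order the reader meets it.
passages = [
    "She woke full of hope and joy.",
    "Then came the fear, and the dark road.",
    "Grief and death followed her home.",
    "At last, love made the house bright again.",
]
raw, smoothed = plot_shape(passages, window=2)
print(raw)       # [2, -2, -2, 2]
print(smoothed)  # [2.0, 0.0, -2.0, 0.0]
```

Plotted against position in the book, the smoothed scores trace the kind of emotional silhouette described above; here the toy text falls and then recovers.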
I found this objectionable at first—the idea that something as complex as plot could be reduced, or even partially reduced, to a list of words with emotional valences. But there are only so many words, really, and though it may seem to flatten the whole transfigured human experience, most words have a fairly straightforward sentimental value. Here’s a random segment of the list of negative words Jockers drew from, for instance:
And the positive ones:
Jockers was inspired, too, by Kurt Vonnegut, who said in his rejected master’s thesis in anthropology for the University of Chicago, “There’s no reason why the simple shapes of stories can’t be fed into computers.” There are probably still writers who find that statement provocative. I don’t. It should be obvious to all writers that parts of “the craft” are deeply schematic; if you feel threatened by a machine, there’s probably something suspect about your humanism. We should resist the precious notion that there’s something inimitable about the whole enterprise of storytelling. Vonnegut definitely did: he found that stories tended to have eight shapes, including a kind of U-shaped one he called Man in Hole in which “somebody gets into trouble, gets out of it again.” He also readily admitted that Hamlet’s shape was more or less a flat line—and that it was brilliant nonetheless.
Jockers’s research has led him to a smaller number, six, and though he hasn’t told us what those six are, you can see already that he’s onto something. Here’s a chart tracking the shifts in emotional valence of Joyce’s Portrait of the Artist:
And here’s The Da Vinci Code:
“Notice how much more regular the fluctuations are. This is the profile of a page turner,” he writes. “Dan Brown never lets the plot become too troubled or too much of a downer. He baits us and teases us with fluctuating emotion.”
I worked briefly for a film and theater producer who liked to speak of “pushing plot through” a script, as if plot were a waste product excreted through the screenplay’s GI tract. That makes the process, and the producer, sound crass and uncultured, but it was just the opposite: I can think of hardly anyone (certainly not most writers) who had such a sophisticated understanding of plot, who could see a story beat by beat. If yours wanted for drama, if its characters sometimes lacked motivation or some aspect of its narrative felt contrived, he could fix it, and not by cheapening or sensationalizing the plot, but by showing you, on your own terms, what the stakes really were. Tellingly, he eschewed the term plot arc, which summons a tidy rainbow of conflict and resolution, in favor of that uglier but truer “pushing plot through”: what a compelling story needs isn’t a clean trajectory but a healthy appetite and a good digestion.
As a reader and a writer, I’ve never found it easy to discuss plot; my alliances have never been clear. I have a deep admiration for writers like John le Carré and Richard Price, who work unashamedly in genre fiction but whose novels, in part because of their convincing, complicated plots, are elevated to literature; I also like writers whose fiction seems to verge on total plotlessness (Nicholson Baker, Renata Adler), and writers who self-consciously mock the very struts and joists of plot (Donald Barthelme, Mark Leyner). And these knee-jerk reactions are complicated by research like Jockers’s, which suggests that plot is at once more subtle and more obvious than we’d expect—less a product of preconceived conflict and more indebted to a writer’s style and characters. Never has the question “What happens in it?” been more vexed.
It’s easy to mock what have emerged as the digital humanities, because they seem to partake of a lust for data, of data for data’s sake, that feels at the moment like an unseemly trend. But anyone who asks what use these charts could possibly have, what “practical value,” has already fallen into the trap that students of the humanities are supposed to avoid. Jockers and others like Ben Schmidt (who has a fascinating analysis of the structures underlying thousands of TV and movie scripts) have already gotten us to ask questions about the foundations of plot, about what a plot is, even. I’ve come out of Jockers’s work feeling that I know even less than I knew before: the mechanisms of storytelling are more ambiguous than ever. That in itself gives his work value.
Dan Piepenbring is the web editor of The Paris Review.