The Beauty of Code

Dependency diagram (Image via TheDailyWTF)

This is what ugly code looks like. This is a dependency diagram: a graphic representation of interdependence or coupling (the black lines) between software components (the gray dots) within a program. A high degree of interdependence means that changing one component inside the program could lead to cascading changes in all the other connected components, and in turn to changes in their dependencies, and so on. Programs with this kind of structure are brittle, and hard to understand and fix. This dependency diagram was submitted anonymously to TheDailyWTF.com, where working programmers share “Curious Perversions in Information Technology” as they work. The exhibits at TheDailyWTF are often embodiments of stupidity, of miasmic dumbness perpetrated by the squadrons of sub-Mort programmers putting together the software that runs businesses across the globe. But, as often, high-flying “enterprise architects” and consultants put together systems that produce dependency diagrams that look like this renowned TheDailyWTF exhibit. A user commented, “I found something just like that blocking the drain once.”

If that knot of tangled hair provokes disgust, what kind of code garners admiration? In the anthology Beautiful Code, the contribution from the creator of the popular programming language Ruby, Yukihiro “Matz” Matsumoto, is an essay titled “Treating Code as an Essay.” Matz writes:

Judging the attributes of computer code is not simply a matter of aesthetics. Instead, computer programs are judged according to how well they execute their intended tasks. In other words, “beautiful code” is not an abstract virtue that exists independent of its programmers’ efforts. Rather, beautiful code is really meant to help the programmer be happy and productive. This is the metric I use to evaluate the beauty of a program.

He goes on to list the virtues of good code: brevity; reusability (“never write the same thing twice”); familiarity (Ruby is an “extremely conservative programming language” that does not use “innovative control structures” but “sticks to traditional control structures programmers are familiar with”); simplicity; flexibility (which Matz defines as “freedom from enforcement of tools,” so programmers aren’t forced to work in a certain way by the tools or languages they use); and, finally, balance: “No element by itself will ensure a beautiful program. When balanced together and kept in mind from the very beginning, each element will work harmoniously with the others to create beautiful code.”

So, beautiful code is lucid, it is easy to read and understand; its organization, its shape, its architecture reveals intent as much as its declarative syntax does. Each small part is coherent, singular in its purpose, and although all these small sections fit together like the pieces of a complex mosaic, they come apart easily when one element needs to be changed or replaced. All this leads to the happiness of the programmer, who must understand it, change it, extend it. This longing for architectural coherence leads to comparisons of code with music, which is often described as the most mathematical of the arts. There is, in fact, an anecdotal but fairly generalized belief among American programmers that there is a high correlation between coding and music-making, that many coders are musicians. A similar claim is made about mathematicians and music. These connections seem culturally encoded to me, specific to America—I’ve never heard of Indian programmers or mathematicians having a special affinity for music, apart from some being passionate listeners. Still, the code-and-music analogy is illuminating in that both practices prize harmonious pattern-making and abhor cacophony, a loss of clarity and structure. The snarl in the dependency diagram above may strike the civilian as a pretty picture, with its swirl of lines and punctuating sparks of gray; to the programmer, it is an abomination because it speaks of incoherence, incomprehensibility, unpredictability, sticky seams of connection that prevent swift diagnosis and make excision and replacement all but impossible.

With his emphasis on programmer happiness, Matz makes explicit his allegiance to Donald Knuth’s literate programming. He writes:

Programs share some attributes with essays. For essays, the most important question readers ask is, “What is it about?” For programs, the main question is, “What does it do?” In fact, the purpose should be sufficiently clear that neither question ever needs to be uttered … Both essays and lines of code are meant—before all else—to be read and understood by human beings.

The trouble of course is that as software programs grow bigger and more complex, the code they comprise tends to become unreadable and incomprehensible to human beings. Programmers like to point out that if each line of code, or even each logical statement (which may spread to more than one physical line), is understood to be a component, software systems are the most complicated things that humans have ever built: the Lucent 5ESS switch, used in telephone exchanges, derives its functionality from a hundred million lines of code; the 2008 Fedora 9 distribution of Linux comprises over two hundred million lines of code. No temple, no cathedral has ever contained as many moving parts. So if you’ve ever written code, you understand in your bones the truth of Donald Knuth’s assertion, “Software is hard. It’s harder than anything else I’ve ever had to do.” If you’ve ever written code, the fact that so much software works so much of the time can seem profoundly miraculous.

* * *

The International Obfuscated C Code Contest annually awards recognition to the writer of “the most Obscure/Obfuscated C program”— that is, to the person who can produce the most incomprehensible working code in the language C. The stated pedagogical aim of the contest is “to show the importance of programming style, in an ironic way.” But it has always seemed to me that confronting unfathomable code is the programming equivalent of staring at the abject, of slowing down to peer into the carnage of a car wreck. This is the reason that programmers expend time and effort in designing esoteric, purposely difficult computer languages like the infamous “brainfuck”—that really is its official name, with the lowercase b—which was created as an exercise in writing the smallest possible compiler (240 bytes) that could run on the Amiga operating system. “Hello, world!” in brainfuck is:

++++++++++[>+++++++>++++++++++>+++<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.

Brainfuck is a “Turing tarpit,” which is to say it is a very small language in which you can write any program that you could write in C or Java; but attempting to do so would, well, fuck your brain, hence the delectable frisson of terror the code above induces in discerning code cognoscenti. Brainfuck is venerable and famous, but my favorite esoteric language is Malbolge, which was designed solely to be the most outrageously difficult language to program in. It is named, appropriately, after the eighth circle of hell in Dante’s Inferno, Malebolge (“Evil ditches,” reserved for frauds). In the language Malbolge, the result of any instruction depends on where it is located in memory, and each instruction is altered after it executes, which effectively means that what a given piece of code does keeps changing as the program runs. Code and data are very hard to reuse, and the constructs to control program flow are almost nonexistent. Malbolge inverts the sacred commandments of literate programming, and is so impenetrable that it took two years after the language was first released for the first working program to appear, and that program was not written by a human, but generated by a computerized search program that examined the space of all possible Malbolge programs and winnowed out one possibility. “Hello, world!” in Malbolge is:

(=<`$9]7<5YXz7wT.3,+O/o'K%$H"'~D|#z@b=`{^Lx8%$Xmrkpohm-kNi;gsedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;:9876543s+O<oLm

And yet, this snippet doesn’t convey the true, titillating evil of Malbolge, which changes and quakes like quicksand. To contemplate Malbolge is to stare into the abyss in which machines speak their own tongues, indifferent to the human gaze; the programmer thereafter knows the pathos of her situation, and recognizes the costs of sacrilege. The coder’s quest is for functionality—“all computer programs are designed to accomplish some kind of task”—and the extension and maintenance of that functionality demands clarity and legibility. Illegibility, incomprehensibility—that way madness lies.
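For the curious, the brainfuck greeting quoted earlier in this section can actually be run. What follows is only a minimal sketch of an interpreter, not the original Amiga compiler. It assumes the conventions most implementations share: a tape of 30,000 cells, each holding a value from 0 to 255 that wraps around, and eight single-character commands. The names `run_brainfuck` and `HELLO_WORLD` are invented for this illustration.

```python
def run_brainfuck(program: str, input_data: str = "") -> str:
    """Interpret a brainfuck program and return everything it prints."""
    # Pair up every '[' with its matching ']' before running anything.
    jumps = {}
    stack = []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            start = stack.pop()
            jumps[start] = pos
            jumps[pos] = start
    if stack:
        raise ValueError("unmatched '['")

    tape = [0] * 30000      # assumed tape size (a common convention)
    ptr = 0                 # index of the current cell
    pc = 0                  # index of the current command
    output = []
    pending_input = iter(input_data)

    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256   # assumed 8-bit wrapping cells
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            output.append(chr(tape[ptr]))
        elif ch == ",":
            # Read one character; store 0 when the input is exhausted.
            tape[ptr] = ord(next(pending_input, "\0")) % 256
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]                      # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]                      # repeat the loop body
        # Any other character is ignored, as the language specifies.
        pc += 1
    return "".join(output)


# The greeting program quoted earlier, split only for readability.
HELLO_WORLD = (
    "++++++++++"
    "[>+++++++>++++++++++>+++<<<-]"
    ">++.>+.+++++++..+++.>++."
    "<<+++++++++++++++."
    ">.+++.------.--------.>+."
)

if __name__ == "__main__":
    print(run_brainfuck(HELLO_WORLD))
```

That the whole language fits in a loop of about twenty lines is the point of a Turing tarpit: the machine is trivial, and all the difficulty is pushed onto whoever has to write programs for it.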

* * *

Programmers work doggedly toward correctness, but the sheer size and complexity of software ensures that bugs lurk within. A bug is, of course, a flaw or fault in a program that produces unexpected results. In 1986, the award-winning researcher and academic Jon Bentley published a book that is now widely regarded as a classic, Programming Pearls. One of the algorithms he implemented was for binary search, a method of finding a value in a sorted array, which was first published in 1946. In 2006, decades after the publication of Bentley’s book—by which time his particular implementation had been copied and used many thousands of times—one of his erstwhile students, Joshua Bloch, discovered that under certain conditions this technique could manifest a bug. Bloch published his finding under a justly panic-raising headline, “Extra, Extra—Read All About It: Nearly All Binary Searches and Mergesorts are Broken.” He wrote:

The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.

Careful design is great. Testing is great. Formal methods are great. Code reviews are great. Static analysis is great. But none of these things alone are sufficient to eliminate bugs: They will always be with us. A bug can exist for half a century despite our best efforts to exterminate it.
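The defect Bloch described sits in a single line: the midpoint of the search range computed as (low + high) / 2. In a language with fixed-width integers, that sum can exceed the largest representable value once the array is large enough, and the result wraps around to a negative number. The sketch below is an illustration under stated assumptions, not Bentley's or Bloch's code. Python integers never overflow, so a hypothetical helper, `to_int32`, simulates 32-bit signed arithmetic; the function names are likewise invented here.

```python
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Wrap an integer the way 32-bit signed arithmetic would (simulation helper)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def midpoint_naive(low: int, high: int) -> int:
    """(low + high) / 2 as a 32-bit machine would compute it."""
    total = to_int32(low + high)    # wraps negative when low + high > INT32_MAX
    return int(total / 2)           # truncate toward zero, like C or Java division


def midpoint_safe(low: int, high: int) -> int:
    """Overflow-free midpoint: high - low always fits when 0 <= low <= high."""
    return low + (high - low) // 2


def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = midpoint_safe(low, high)
        if items[mid] < target:
            low = mid + 1
        elif items[mid] > target:
            high = mid - 1
        else:
            return mid
    return -1


if __name__ == "__main__":
    low, high = 1_500_000_000, 2_000_000_000   # indices only a huge array reaches
    print(midpoint_naive(low, high))            # a negative, useless index
    print(midpoint_safe(low, high))             # the intended midpoint
```

The bug only appears with arrays of more than about a billion elements, which is why it could sit unnoticed for so long: machines with enough memory to trigger it did not exist when the code was written.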

That software algorithms are now running our whole world means that software faults or errors can send us down the wrong highway, injure or kill people, and cause disasters. Every programmer is familiar with the most infamous bugs: the French Ariane 5 rocket that went off course and self-destructed forty seconds after liftoff because of an error in converting between representations of number values; the Therac-25 radiation therapy machine that reacted to a combination of operator input and a “counter overflow” by delivering doses of radiation a hundred times more intense than required, resulting in the agonizing deaths of five people and injuries to many others; the “Flash Crash” of 2010, when the Dow Jones suddenly plunged a thousand points and recovered just as suddenly, apparently as a result of automatic trading programs reacting to a single large trade.

These are the notorious bugs, but there are bugs in every piece of software that you use today. A professional “cyber warrior,” whose job it is to find and exploit bugs for the U.S. government, recently estimated that “most of the software written in the world has a bug every three to five lines of code.” These bugs may not kill you, but they cause your system to freeze, they corrupt your data, and they expose your computers to hackers. The next great hope for more stable, bug-free software is functional programming, which is actually the oldest paradigm in computing—it uses the algebraic techniques of function evaluation used by the creators of the first computers. In functional programming, all computation is expressed as the evaluation of expressions; the radical simplicity of thinking about programming as only giving input to functions that produce outputs promotes legibility and predictability. There is again the same fervent proselytizing about functional programming that I remember from the early days of object-oriented programming, the same conviction that this time we’ve discovered the magic key to the kingdom. Functional languages like Clojure conjure up the clean symmetries of mathematics, and hold forth the promise of escape from all the jugaadu work-arounds that turn so much code into a gunky, biological-seeming mess. In general, though, programmers are now skeptical of the notion that there’s any silver bullet for complexity. The programmer and popular blogger Steve Yegge, in his foreword to a book called The Joy of Clojure, describes the language as a “minor miracle” and “an astoundingly high-quality language … the best I’ve ever seen,” but he also notes that it is “fashionable,” and that

our industry, the global programming community, is fashion-driven to a degree that would embarrass haute couture designers from New York to Paris … Fashion dictates the programming languages people study in school, the languages employers hire for, the languages that get to be in books on shelves. A naive outsider might wonder if the quality of a language matters a little, just a teeny bit at least, but in the real world fashion trumps all.

In respect to programming languages and techniques, the programming industry has now been through many cycles of faith and disillusionment, and many of its members have acquired a sharp, necessary cynicism. “Hype Cycle”—a phrase coined by the analysts at Gartner, Inc.—adroitly captures the up-and-down fortunes of many a tech fad.

Gartner, Inc.’s Hype Cycle (Jeremy Kemp, via Wikimedia Commons)

The tools and processes used to manage all this complexity engender another layer of complexity. All but the simplest programs must be written by teams of programmers, each working on a small portion of the system. Of course these people must be managed, housed, provided with equipment, but also their product—the code itself—must be distributed, shared, saved from overwriting or deletion, integrated, and tested.

Entire industries have grown around these necessities. Software tools for building software—particularly Integrated Development Environments, applications used to write applications—are some of the most complex programs being built today. They make the programmer’s job easier, but the programmer must learn how to use them, must educate herself in their idiosyncrasies and the work-arounds for their faults. This is not a trivial task. For example, every programmer needs to use a revision control system to track changes and easily branch and merge versions of code. The best-regarded revision control system today is Git, created by Linus Torvalds (and named, incidentally, after his famous cantankerousness). Git’s interface is command-line driven and famously UNIX-y and complex, and for the newbie its inner workings are mysterious. In response to a blog post titled “Git Is Simpler Than You Think,” an irritated Reddit commenter remarked, “Yes, also a nuclear submarine is simpler than you think … once you learn how it works.” I made three separate attempts to learn how Git worked myself, gave up, was frustrated enough by other revision control systems to return, and finally had to read a 265-page book to acquire enough competence to use the thing. Git is immensely powerful and nimble, and I enjoy using it, but maneuvering it felt—at least initially—like a life achievement of sorts.

You may have to use a dozen tools and websites to handle the various logistical aspects of software development, and soon the triumph starts to wear a little thin. Add another dozen software libraries and frameworks that you may use internally in your programs—again, each one comes bristling with its own eccentricities, bugs, and books—and weariness sets in. Each tool and preconstructed library solves a problem that you must otherwise solve yourself, but each solution is a separate body of knowledge you must maintain. A user named jdietrich wrote in a discussion on Hacker News:

My biggest gripe with modern programming is the sheer volume of arbitrary stuff I need to know. My current project has so far required me to know about Python, Django, Google App Engine and its datastore, XHTML, CSS, JQuery, Javascript, JSON, and a clutch of XML schema, APIs and the like …

Back in ye olden days, most programming tasks I performed felt quite natural and painless, just a quiet little chat between me and the compiler. Sometimes longwinded, sometimes repetitive, but I just sat and thought and typed and software happened. The work I do these days feels more like being a dogsbody at the tower of babel. I just don’t seem to feel fluent in anything much any more.

And every year, the new technologies arrive in a cloud of acronyms and cute names: MongoDB, HTML5, PaaS, CoffeeScript, TPL, Rx. One must keep up. On programmers.stackexchange.com, one hapless coder wrote:

I was humbled at a job interview yesterday almost to the point of a beat-down and realized that although I know what I know, my skills are pretty old and I’m getting to where I don’t know what I don’t know, which for a tech guy is a bad thing.

I don’t know if I can keep current just doing my day to day job, so I need to make sure I at least know what’s out there.

Are there well known blogs I should be keeping up with for software development?

The best—or at least the most ambitious—programmers read blogs and books, attend conferences to keep up with the state of the art, learn a new language every year or two. When you begin programming, one of the attractions is the certainty that you will never run out of things to learn. But after a few years of working in a corporate cubicle under exploitive managers, after one deadline too many, after family and age and a tiring body, learning the ins and outs of the latest library can feel like another desperate sprint on a nonstop treadmill. There is a reigning cult of overwork in the industry—the legend of the rock-star programmer usually has him coding sixteen hours a day, while simultaneously contributing to open-source projects, blogging, conferencing, and somehow managing to run a start-up—and this ideal has led many an aspirant to burn out, complete with techie thousand-yard-stare, clinical depression, outbursts of anger, and total disinterest in programming. This trough of disillusionment is so deep that for many, the only way to emerge from it is to leave the industry altogether, which rewards a few with fame and dazzling amounts of money, but treats the many as disposable cogs in its software production machine. The endless cycle of idea and action, endless invention, endless experiment, all this knowledge of motion takes its toll, leaves behind a trail of casualties.

Vikram Chandra is the author of three highly acclaimed works of fiction, most recently Sacred Games, which won the Hutch Crossword Award for Fiction in 2006. Chandra lives in Oakland and teaches at the University of California, Berkeley.
