On The Need For Understanding

I saw this Mastodon post from Andy Wingo recently:

in these days of coding agents and what-not, i often think of gerald sussman's comment that one no longer constructs systems from known parts, that instead one does basic science on the functionality of foreign libraries; he was right then and i hate it as much as i did 16 years ago

I started drafting a reply, but it quickly began to spiral into a full-blown essay. It turns out that I have a lot of related thoughts that I've been meaning to get down for a while.

From the linked blog post:

... Costanza asked Sussman why MIT had switched away from Scheme for their introductory programming course, 6.001. This was a gem. He said that the reason that happened was because engineering in 1980 was not what it was in the mid-90s or in 2000. In 1980, good programmers spent a lot of time thinking, and then produced spare code that they thought should work. Code ran close to the metal, even Scheme — it was understandable all the way down. Like a resistor, where you could read the bands and know the power rating and the tolerance and the resistance and V=IR and that's all there was to know. 6.001 had been conceived to teach engineers how to take small parts that they understood entirely and use simple techniques to compose them into larger things that do what you want.

But programming now isn't so much like that, said Sussman. Nowadays you muck around with incomprehensible or nonexistent man pages for software you don't know who wrote. You have to do basic science on your libraries to see how they work, trying out different inputs and seeing how the code reacts. This is a fundamentally different job, and it needed a different course.

(For a primary source, here's a video of Sussman answering the same question at a different event, starting at 59:35.)

I have really weird feelings about this assertion. It is obviously true that the software stack that most of the world runs on is a towering mess of leaky abstractions, a sprawling mass of haphazardly-interconnected components that nobody could claim to fully understand. It's also often the case that documentation is wildly inadequate, and experiment can sometimes be the simplest route to clearer understanding.

But the implication is almost that we can't understand the components we're building on top of, that this way of working died in the 90s and now the job of programming is overwhelmingly just poking at stuff until it seems to work, and that's just the complete opposite of my personal experience.

I was born in 1982, and started programming pretty much as soon as I could read. For many years starting out I really didn't understand how anything worked, and things got more complicated much faster than I could make sense of them. I started out programming BASIC on 8-bit computers like the VIC-20 and Apple II; programming meant sequencing commands that made Things Happen, and if BASIC didn't have a command for it, I couldn't do it. I knew assembly language existed, but that was a different universe; when I would occasionally find myself dropped into the Apple II machine code monitor, the only thing I could figure out to do with it was frantically mash Ctrl+Reset to attempt to get back to a BASIC prompt again.

Later I got a 286, and I found myself straining against the limitations of QBASIC. It had more commands, which meant I could do more things, but it wasn't enough for me. I managed to get my hands on a copy of Microsoft QuickC, assuming that my experience with QBASIC would transfer over to this more powerful language that surely had far more commands that would let me do more things, but was dismayed to discover that the help files somehow didn't reveal any graphics commands at all!

Eventually I got access to BBSes, and found some programming tutorials that explained how VGA worked, and even got my hands on a shareware BASIC compiler that let me try some of these ideas out. I still didn't feel like I really understood what was going on, but I had names for concepts at least, and I was able to get pixels on the screen. I even wrote a terrible paint program to create those pixels with, because parsing real graphics formats was beyond me, but I could manage “width, height, pixels”.

Then I got a 486 and the internet and suddenly if I wanted to use any newer tools, none of the pieces I had struggled so hard to understand worked anymore. The old stuff was still there, but it was buried under a whole new layer of complexity. I spent so much time totally lost, my accumulated knowledge repeatedly crumbling to dust, the Next Important Thing I was supposed to learn hopelessly out of reach. I would get books on assembly language or win32 and they would fill me with despair. I would read people posting in newsgroups about DPMI, and how you couldn't make BIOS calls without switching between real mode and protected mode, and wonder how the fuck anyone was supposed to get a pixel on the screen.

This was the mid-90s, about the time when Sussman would have stopped teaching 6.001. The blog post's paraphrase is less clear on this point, but in the video Sussman is explicit that this is exactly the timeframe in which he saw engineering changing in fundamental ways. This was not easy to deal with as a teenager who was trying to learn this stuff!

All I wanted in the world was to mess around with stuff until it sort of worked. Understanding was for chumps. I didn't want to think about the problem space, or the messy realities of my platform of choice. I wanted the computer to Do Thing. I wanted libraries and languages with a simple face, that would solve problems for me without me having to think them through, because the amount of shit that didn't make sense to me was so overwhelming. I just wanted it to be easy.

I would like to share with you a personal accomplishment from this era. I was proud of it then. I am not proud now, but remembering it helps me remember what things were really like for me, when I didn't have the skills and mindset I have now.

DJGPP and Allegro had gotten me back into the comfortable world of using Commands that Did Things to make games. I was using Scream Tracker to write music, and I wanted to use the popular Mikmod library to put that music into my games. There was a companion library called “Mikalleg” which took care of connecting the sound output routines to Allegro's audio engine, and did some preprocessor trickery to work around the fact that both Mikmod and Allegro defined a struct called SAMPLE.

I found it all kind of complicated to use. Mikmod / Mikalleg tended to provide flexible APIs that had a lot of confusing parameters. I didn't want to have to understand what all that stuff was; I couldn't understand why they didn't just give you a function that started playing the music you told it to play. So I figured out some values that worked, and wrote wrapper functions that removed all that flexibility and just did exactly the thing that I assumed everyone would want.

This wasn't my only innovation. Mikalleg didn't interoperate with Allegro's facility for compressed data files. I wanted it to. (Real Games didn't just leave music files lying around that you could easily listen to or modify!) Mikmod had a generic I/O interface, so that you could override the file loading routine by providing “read” and “write” calls. It looked complicated, so I ignored it. Instead I wrote a function that would take the data in the Allegro file – which would have been decompressed and loaded into RAM at this point – write it to a file on whatever disk you happened to be using, tell Mikmod to load the file, and then delete it. I knew this probably wasn't an ideal solution, but gosh, it worked great! And it was so easy to do! Why would I ever want to spend so much effort to do things “right”?

I proudly packaged this all up into a library I called “Easymik”. I made a webpage for it and everything. The Mikalleg author even prominently linked to it from the Mikalleg home page! People were desperate for a solution to this datafile problem, and I built one!

I wanted, so badly, to be a Real Programmer. I knew what tools Real Programmers used; they used C and assembly. I knew what Real Programs looked like; there was an .exe file, and DOS4GW for some reason, and special data files in special formats, never .S3M files that I could listen to, never .PCX files that I could open in my paint programs, never text files that I could look at and edit. I didn't know why things were this way, just that they were, so the closer I could get to that, the more Real my programs became, the more likely it was that I would be Good At It, that I could make the computer do anything and everything I could imagine.

If you had given me a magic box that I could ask to write programs for me, that generated code that I didn't understand, that sort of worked but might have weird problems, that I could pester with questions about esoteric technical subjects until it gave me reasonable-sounding-but-maybe-wrong answers that were on my level, I would have been delirious with joy. I would have shaken the devil's hand, weeping with gratitude, and leapt face-first into vibe coding with a ferocity you could scarcely imagine. Sure, it's a bit shit, but all of the resources I had access to were shit.

I could have gotten stuck that way. I could have flailed around, not understanding, not wanting to understand, for many more years than I did. But I think there were a few events that sent me on a better trajectory.

The first event was that I improbably landed a programming job working with QNX while I was still in high school. By that time I'd learned enough C to be dangerous; I'd mostly figured out how to use pointers, and could use malloc and free even though I couldn't have told you the difference between the stack and the heap. (I have a vivid sense memory of the panic and terror I felt when a coworker casually dropped one of those terms while talking to me about something.)

QNX was everything MS-DOS and Windows were not: clear, elegant, understandable, reliable. I read Rob Krten's book and said to myself, “of course this is how an operating system should work.” I never managed to write a win32 program, but I could wrap my head around Photon. It proved to me that things didn't need to be so awful, that you could make a great many things wildly easier to understand simply by finding a good design.

The only problem was that I didn't know how to find a good design. I wanted to know, so badly, but the advice I was reading didn't make any sense to me: lots of people just asserting that you should build things a certain way, follow some arbitrary rules, often contradicting each other, usually in vague terms I didn't really understand. I got very interested in alternative programming languages and tools – DSLs, Lisp, Forth – stuff that had a tangible form, but was spoken about as though it were magic. Surely they'd cracked the problem.

At my next job, I felt like I was ready to try out some of the concepts I was reading about. I had just finished the unenviable task of patching dozens of bespoke tools to support a new peripheral. These tools had all been haphazardly built by cloning the last one and swizzling its code to support the new product; at their core there were really only two use cases. I proposed a single, unified codebase to replace them all. I spent months building a generic tool-building toolkit, infinitely configurable via XML. Solving 80% of the problem was reasonably straightforward, but the last 20% had me doing absurd contortions like defining a bidirectional algebra for specifying the structure of a serial number string. I was undaunted; I powered through and ended up with something that worked pretty well for the first use case, even if writing the XML was a little bit tricky sometimes.

When I extended the system to handle the second use case, though, it became apparent that I had begun to run afoul of Greenspun's tenth rule. I had defined a dynamic Value class that could contain Anything, so that I could define the data schemas for binary log files dynamically, in XML rather than C++. The result was that a few hundred kilobytes of logs could run my beefy development PC completely out of RAM just trying to parse them. It was a fiasco.
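
To make the shape of that mistake concrete, here's a minimal sketch of the anti-pattern, translated to Java for brevity – the real thing was C++, and every name here is invented for illustration:

```java
// Hypothetical sketch of the anti-pattern -- the original was C++,
// and all of these names are invented. A dynamic Value that can
// contain Anything:
public final class Value {
    enum Type { INT, FLOAT, STRING, LIST, RECORD }

    final Type type;
    final Object payload; // boxed number, String, List<Value>, or Map<String, Value>

    Value(Type type, Object payload) {
        this.type = type;
        this.payload = payload;
    }

    // Every field access pays for a runtime type check...
    long asInt() {
        if (type != Type.INT) throw new IllegalStateException("not an int");
        return (Long) payload;
    }

    public static void main(String[] args) {
        // ...and a single 4-byte field from a binary log becomes a
        // Value object, plus a boxed Long, plus a map entry keyed by
        // a String: several allocations and dozens of bytes for data
        // that a hardcoded struct would hold in 4 bytes.
        Value field = new Value(Type.INT, 42L);
        System.out.println(field.asInt());
    }
}
```

Multiply that overhead by every field of every record, and it's easy to see how a schema interpreted at runtime can eat memory that a compiled-in schema never would.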

I eventually managed to optimize it enough that it could ship; IIRC, I ended up giving up and hardcoding the schema. The way I'd constructed everything, this meant that everyone touching the one unified generic codebase would have to add product-specific code anyway, exactly the thing I had worked for months to avoid. It was obviously inferior to the tools it had replaced. I'd worked so hard to do the elegant thing, to build a reusable system out of simple, beautiful pieces, and I had produced a huge blob of pointless complexity which had only made life harder for myself and whoever would have to work with it next.

Everything still felt so hard, but I was more convinced than ever that the secret to making everything easier was out there, that better tools were the answer. Propelled by a conviction that domain-specific languages were going to be the answer to everything, I moved thousands of kilometers away, temporarily immigrated to the United States, and landed an even more unlikely job where a billionaire paid me to fail to fix a tree-merging algorithm for five years.

It turns out that if you are not thinking about things the right way, it is possible to spin your wheels struggling with a single problem for a very, very, very long time.

It was at that job that, for the first time, I finally confronted the fear that had been silently driving me throughout my career – the fear of truly digging into the problem. The fear of admitting when I don't really understand something. The fear that it will all be too complicated and overwhelming to deal with. I realized, finally, that what I had secretly been yearning for all along, the dream I had spent years of my life trying to realize, that I uprooted my life to pursue, was some magical tool, some impossible technique, that would free me from the need to learn things that felt too hard and scary.

But not learning? Trying to build something without understanding it? It turns out that's so much harder.

[Nancy comic strip. Panel 1: Nancy at her desk, scowling, thinking: "How can I get out of having to think hard?" Panel 2: Nancy walking home, brows furrowed, thinking: "How" Panel 3: Nancy lying in bed, on top of the covers, staring at the ceiling: "How"]

You must understand a problem before you can solve it. That lesson is core to everything I have done since. If a component isn't working the way you expect, you have to dig in and figure out how it actually works, what it's actually doing, or you have just given yourself a problem that you refuse to understand. If the problem is of any importance, then that is a terrible mistake. I don't want to waste years of my life spinning my wheels like that ever again.

Everything gets easier once you commit to understanding how things work. The more you do it, the easier it gets. The more you learn, the more you understand, the more you can accomplish. This is the magical tool I spent half my career searching for.

I printed out the checklist from Polya's How To Solve It and started referring to it whenever I sat down to work. I never read the rest of the book. It's just a handful of questions to ask yourself, some suggestions to help make sure you have a handle on things, but it's powerful stuff. I had been slowly making progress with the tree-merging algorithm, finding bugs with an ad-hoc randomized testing system (basically QuickCheck without a shrinking pass, so every failed test case was a monster that took days to untangle). Eventually I got things to a point where I wasn't hitting data loss or crash bugs, but the output was still pretty far from ideal. Then I confronted some core assumptions about the tree-merging algorithm that the codebase had started with, and which I had never felt comfortable enough to question, and realized that they had been pointlessly making my job wildly more difficult. I tore the whole thing apart, the fragile thing I had spent years patching, the thing I had finally managed to get sort-of working, and rebuilt it from scratch.

It worked flawlessly.

The main thing I now bring to the table, as a software professional, is the ability to understand what's really going on. Sometimes, yes, this involves doing basic science to answer a question, but I usually find it hard to be satisfied with an answer until I have gone down to the source code of the confusing component to really understand it. This is an unreasonably effective skill that has only become more relevant over the years, not less.

Sussman correctly identifies the 90s as a time when complexity exploded in engineering, both electrical and software. But I'm not willing to grant that we have been entirely unsuccessful at taming that complexity in the intervening decades. You will never convince me that the state of affairs now, from a programmer's perspective, is worse than writing Windows 95 apps in C. The huge thing that separates the complexity of the 90s from the complexity we have now is that, in the 90s, the inner workings of everything were secret. You didn't get the source code to Windows 95; you programmed to the docs, and when it had bugs, or didn't behave the way the docs said it did, or you just didn't understand how it was supposed to work, there was virtually nothing else you could do but poke at your code until it stopped triggering the problem. Breaking out the kernel debugger and reverse engineering your operating system is not a viable approach for most people!

But this is not the same situation software developers are in today. Yes, we have huge libraries, more now than ever! But most are open source; when they don't behave as we expect, it is orders of magnitude easier to figure out why than it used to be. Most are more reliable than they were then, either by being written in languages that make writing reliable software easier, or through simply having been around for decades, with most major bugs slowly killed off through sheer force of attrition. Components worth using in 2025 usually have a coherent way in which they are designed to work, they will usually work in that way, and it is usually possible to learn what that is. There is still plenty of garbage out there, of course – you should choose your dependencies carefully! – but you are absolutely spoiled for choice in comparison to the 90s.

I did some Android UI programming about a decade ago now. I would write custom View subclasses from time to time, and every so often I would struggle with bugs where the UI layout would need to be recomputed, but it just wouldn't happen. The documentation was useless. I would try to sprinkle around even more calls to requestLayout or forceLayout, but nothing seemed to help.

Finally I got fed up and dug into the source. Turns out that under the hood, requestLayout and forceLayout just set a few flags on the view object; they don't “schedule” anything, contrary to the docs. requestLayout recursively calls itself on the parent, which is what signals Android, the next time it goes to draw the screen, that some stuff needs laying out again. forceLayout sets the same flags, but only on itself.
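
In other words, the mechanism looks roughly like this – a simplified sketch from memory, not the actual AOSP code, and the real flag and field names differ:

```java
// Simplified sketch, from memory, of what the View source does --
// not the actual AOSP code; the real flag and field names differ.
class SketchView {
    private static final int FLAG_FORCE_LAYOUT = 0x1;

    private int flags;
    SketchView parent;

    // Flag this view as needing layout, and walk up the tree so the
    // root finds out before the next drawing pass.
    void requestLayout() {
        flags |= FLAG_FORCE_LAYOUT;
        if (parent != null && !parent.isLayoutRequested()) {
            parent.requestLayout();
        }
    }

    // Set the same flag, but only on this view. Nothing above it is
    // told -- and the isLayoutRequested() check above means that a
    // descendant's requestLayout() stops propagating the moment it
    // reaches a view that is already flagged.
    void forceLayout() {
        flags |= FLAG_FORCE_LAYOUT;
    }

    boolean isLayoutRequested() {
        return (flags & FLAG_FORCE_LAYOUT) != 0;
    }
}
```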

I don't know what actual use forceLayout is meant to have, because in practice, what it actually means is “if one of my descendants calls requestLayout, stop propagating that flag upwards once you reach me”. It's a method whose sole purpose seems to be to create layout bugs that are impossible to track down. I also remember there being a data race, where if you called one of those functions while layout was happening, the flags of the various views in the tree could get into funky, inconsistent states, and everything would get screwed up.

After I read the code, I finally understood how Android's layout algorithm actually worked – exactly when measurement passes ran, what was triggering them, what I needed to avoid doing in them. I could rework my custom views so they wouldn't trip this problem anymore. I understood weird behaviour I had experienced in other situations. Running experiments, poking things, doing science – that got me brittle, buggy code that took forever to get working and that I was afraid to mess with. Opening up the black box gave me confidence.

It is still possible to build a complex piece of software that works out of simpler pieces of software that work. You can choose dependencies that are understandable. Programming in 2025 does not have to be about fumbling in the dark. Everything gets easier if you're willing to turn on the light.