Data is code
I've been seriously writing Forth, in my own homebrew dialect, off and on for about a year now, and I've noticed something interesting about how things end up structured.
Forth and Lisp are often spoken of as though they are similar in some deep way. In Lisp circles, you often hear “code is data.” This is generally held to mean “Lisp has macros”, more or less – a Lisp program's source code is a syntax tree made of Lisp lists, which your program can introspect, transform into new syntax trees, and execute. Your program is literally a data structure.
My Forth code has very few things I would refer to as “data structures”. There is no significant language for defining them – I write one-off words that do pointer arithmetic. I only have a handful, so I haven't felt the need to generalize. It does zero transformation of them – they have been carefully chosen to be directly useful for everything the program needs them for, in-place, as-is.
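To make that concrete, here's roughly what those one-off words look like, written in a generic ANS-style Forth rather than my dialect, with names invented for the sketch:

    \ A two-field "point" record, one cell per field: no struct syntax, just offsets.
    : point-x ( addr -- addr )        ;    \ x lives at offset 0, so this is a no-op
    : point-y ( addr -- addr )  cell+ ;    \ y lives one cell further in

    create origin  0 , 0 ,                 \ allocate one point, both fields zeroed
    7 origin point-x !                     \ store 7 in origin's x field
    origin point-y @ .                     \ fetch and print its y field: 0

That's the entire "data structure facility": a couple of words that turn an address into another address.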
Instead, the common pattern is that everything is code, which, thanks to Forth's flexible non-syntax, can be made to look a lot like data. Often data is compiled directly into the code that uses it – instead of naming a constant that's passed into a function to do a particular thing, you name a function that takes no arguments and just does the thing. (There are lots of flexible ways to make this sort of thing easy and inexpensive in Forth.) Forth is hyper-imperative to a degree that, as modern programmers, we've largely forgotten is even possible. Even, say, the number 4 is arguably a word executed for its side effects (push the value 4 onto the current stack). Of course, this is how CPUs work, too – you don't have a concept of “4” on its own in assembly, you have the concept of moving “4” into a register, or into memory. The only thing you can tell a CPU is to do things. Forth is the same.
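A minimal sketch of both moves, again in a generic Forth with hypothetical names:

    \ A value can simply be a word that takes no arguments and pushes it:
    : tile-size ( -- n )  16 ;
    : tile>px   ( tiles -- px )  tile-size * ;   \ the call site reads the same as "16 *"

    \ And instead of passing a constant into a general word, name a word that
    \ just does the specific thing (grey and draw-tile are stubs for the sketch):
    0 constant grey
    : draw-tile ( color tile -- )  2drop ;
    : draw-wall ( tile -- )  grey swap draw-tile ;

Whether tile-size is a literal folded into a definition, a CONSTANT, or a word that computes something, the caller reads exactly the same.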
One consequence is that a Forth word that represents a constant is invoked in exactly the same way as a word that makes decisions. What this means is that it is virtually impossible to write yourself into a corner by “hard-coding” something. You can start with the most direct implementation, and expand it into something more flexible as you need to. I often find myself turning a word that was very static into something dynamic, and not having to change any of the code that depends on it. And my Forth has developed lots of facilities for sophisticated decision-making and dispatch. It turns out that most sophisticated decision-making is largely just indirection, and is easy to accomplish even in extremely resource-constrained environments. Many things I used to think of as modern, expensive conveniences – anonymous functions! polymorphism! green threads! – are actually extremely cheap and simple to build, they just... don't exist in C.
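Here's the kind of evolution I mean, as an ANS-ish sketch with invented names. The first definition is as hard-coded as it gets; the second makes a decision; callers get recompiled without their source being edited. And the "expensive conveniences" mostly reduce to holding an execution token and calling it, which is just indirection:

    \ Day one: as hard-coded as it gets.
    : player-speed ( -- n )  2 ;

    \ Later: the same word now makes a decision; every caller that says
    \ player-speed is reloaded as-is, with none of its source edited.
    variable sprinting
    : player-speed ( -- n )  sprinting @ if 4 else 2 then ;

    : step ( pos -- pos' )  player-speed + ;   \ a caller: identical either way

    \ And most "sophisticated dispatch" is indirection over execution tokens:
    defer on-collision                         \ a rebindable hook
    : bounce ( -- )  ." boing " ;
    ' bounce is on-collision                   \ swap the behavior at run time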
In “Programming a Problem-Oriented Language”, Chuck Moore defines “input” as “...information that controls a program.” Forth and Lisp share the idea that, most of the time, it's more powerful and flexible to use the language's parser to read a program's input. Before JSON, there was the s-expression, the universal data structure, and in Lisp you are usually either using macros to turn that data into code directly, or writing an interpreter for that data. You can often think of a Lisp program as a collection of small, domain-specific virtual machines.
However, Forth doesn't really have a parser; it has a tokenizer, a symbol table, an interpreter, and a virtual machine. Parsing Forth and executing Forth are synonymous; hell, compiling Forth and executing Forth are synonymous. Forth says you don't need a domain-specific virtual machine; you already have a perfectly good machine right here! Why not just solve your problem directly, right now?
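Concretely, "using the interpreter as your parser" can be as simple as defining a handful of words and feeding the "data file" to the ordinary outer interpreter. A sketch, with invented names:

    variable grid-width    variable grid-height

    \ A few words that make the input read like a description:
    : width  ( n -- )  grid-width ! ;
    : height ( n -- )  grid-height ! ;

    \ The "config file" is then just more Forth, fed to the same outer interpreter:
    \     40 width   24 height
    \ There is no separate parser to write, and the format can contain arbitrary
    \ code whenever the description needs it.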
You may need sophisticated abstractions to succinctly describe the logic of how your problem is solved, and writing good Forth code is all about investing in those. But Forth makes an argument that most of the data that your program deals with is actually about controlling what your program should do, and making decisions about what your program should do is the job of code.
There are drawbacks to this approach, of course: plenty of things are inconvenient to express as text, and plenty of times I've wished I had a “live” data structure I could update on the fly and persist while my program is running, rather than having to exit my program and update my code. But if you can work within the constraints, there is enormous flexibility in it. I'm writing a puzzle game, and while I have a terse vocabulary for defining levels, it's also trivial for me to add little custom setpieces to a given level, to throw in dialogue in response to weird events, to add weird constraints that only apply in that space, because at every step I have the full power of the language at my disposal. If I'd taken a data-driven approach, I would have needed to plan everything in advance, to design my little problem-oriented VM and hope I thought of everything I needed. But with a code-first approach, I can be much more exploratory – try to build things, and if they work well, factor them out to be used more generally. Architecture arises naturally from need, as I build.
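For flavor, here's roughly the shape that takes. None of this vocabulary exists outside the sketch (the scaffolding is invented so it runs in a standard Forth), but it shows the point: any line of the "data" can drop down to plain code.

    \ Invented scaffolding so the sketch below actually runs:
    variable level-w   variable level-h   variable enter-hook
    : level:   ( "name" -- )  create ;
    : size     ( w h -- )  level-h ! level-w ! ;
    : door     ( x y -- )  2drop ;                \ stub
    : on-enter ( xt -- )  enter-hook ! ;
    : level;   ( -- ) ;

    \ The level itself: terse vocabulary, but any line can drop to plain Forth.
    level: crypt
      10 12 size
      3 4 door   7 9 door
      :noname  ." The torches gutter out." cr ;  on-enter
    level;

The escape hatch is always the whole language: that :noname block is ordinary code living comfortably inside the "data".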