Reimplementing TeX's Algorithms: Looking Back at Thirty Years of Programming
Glenn Vanderburg, director of engineering at LivingSocial, gave an interesting recounting of his effort to implement TeX’s algorithms in Clojure at the last ClojureConj conference. In the process, he discovered how much programming has changed in the last thirty years.
Short history of TeX
First, some history may help explain TeX’s importance. Donald Knuth released TeX 1.0 in 1982, and 32 years later it still represents the state of the art for computerized typesetting, says Glenn. Furthermore, TeX has long been one of the few examples of a large program available in source code form to learn from.
TeX is a masterwork: it is fast, portable, produces excellent results, and it is still in wide use thirty years later, with astonishingly few bugs having been found.
Anecdotally, Glenn remarks, Knuth decided to start building TeX when he received the first page proofs of his magnum opus The Art of Computer Programming, which he found to be “disappointingly ugly”. So, Knuth started writing a program to make his work look acceptable to him. When TeX became available, Tony Hoare, of quicksort fame, encouraged Knuth to publish its source code so it could be used by students since, at that time, in 1982, the internet was not yet much of a thing and there were few examples of source code to learn from. This objective gave Knuth the impulse to start working on literate programming, and TeX’s source code was finally published in 1986. It remained the “most widely read large program in the world” until the Linux kernel came along, says Glenn.
TeX’s architecture is a pipeline that processes text, splitting it into several kinds of objects, such as pages, paragraphs, lines, words, etc., and finally produces a DVI file. It is now over thirty years since TeX came out, and looking back it is amazing to see how “primitive” things still were, says Glenn.
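To make the pipeline idea concrete, here is a minimal sketch in Python (not TeX’s actual code, and the talk’s implementation was in Clojure): text is progressively split into paragraphs, words, and lines, with each stage a small function consuming the previous stage’s output. The greedy line breaker here is a deliberate simplification; TeX’s real paragraph-breaking algorithm is far more sophisticated.

```python
def split_paragraphs(text):
    # Blank lines separate paragraphs, echoing TeX's input conventions.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def split_words(paragraph):
    return paragraph.split()

def break_lines(words, width=20):
    # Greedy first-fit line breaking -- a stand-in for TeX's optimizer.
    lines, current = [], ""
    for word in words:
        candidate = (current + " " + word).strip()
        if len(candidate) <= width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

def typeset(text, width=20):
    # The pipeline: text -> paragraphs -> words -> lines.
    return [break_lines(split_words(p), width) for p in split_paragraphs(text)]

print(typeset("The quick brown fox jumps over the lazy dog.", 20))
```

Structuring the stages as independent functions over plain data is what later makes it easy to recompose or parallelize them, which is exactly the property Glenn exploits below.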
TeX’s source code is full of examples of what would not be considered good programming style today, such as:
- global variables;
- one-character variables;
- goto statements;
- procedures spanning hundreds of lines;
- lots of macros;
- duplicate code;
- local variable reuse;
- the single-threaded assumption is “everywhere”;
- mutability is just “pervasive”.
Reading the code is like visiting another era […] By the time the book was published in 1986, it represented a sterling example of a way of programming that was already in many respects obsolete.
All of this was mostly a consequence of the limitations of the hardware available at the time, with limited computing power and memory. According to Glenn, Knuth put a lot of effort into minimizing function calls, which would have been too expensive. This made TeX a highly integrated code base, so “tangled up [that] it’s impossible to extract any part of it to use in isolation.”
TeX is aggressively hand-optimized using techniques that we would really sneer at today, but the ability to turn our noses up at those techniques is a gift from thirty years of Moore’s Law, and not just Moore’s Law but language implementation techniques and things like that.
Reimplementing TeX in Clojure: Cló
Thus, TeX may not be the best example to point a new programmer to today. Still, in Glenn’s view, trying to reimplement it can show how much programming has changed and also provide real examples of how algorithms change when they are translated from procedural to functional style.
According to Glenn, it is very hard to figure out what TeX’s code is doing, mostly due to its terseness and extreme optimization, as outlined above. Initially, he tried to keep his own design as close as possible to TeX’s. As mentioned, TeX is strictly single-threaded, and one goal worth setting in today’s computing world is making the best use of the multi-core hardware that is readily available. A few Clojure abstractions allowed him to implement the basics of TeX’s pipeline as a sequence of functions; he could then switch from a sequential to a parallel execution model just by replacing the threading macro. “That allowed me to start doing like apples to apples comparison.” The Cló implementation was of course far slower than TeX, but switching to parallel execution provided a “substantial benefit.”
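The key property, swapping sequential for parallel execution without touching the pipeline stages themselves, can be sketched in Python as a rough analogy to the Clojure threading-macro swap. The names below (`shape_paragraph`, `run_pipeline`) are illustrative, not taken from Cló: because the stage is a pure function, the only thing that changes is which mapping function drives it.

```python
from concurrent.futures import ThreadPoolExecutor

def shape_paragraph(paragraph):
    # Stand-in for a pure per-paragraph stage (e.g. line breaking).
    return paragraph.upper()

def run_pipeline(paragraphs, mapper=map):
    # mapper=map runs sequentially; pass an executor's .map to parallelize.
    return list(mapper(shape_paragraph, paragraphs))

paras = ["alpha beta", "gamma delta"]

sequential = run_pipeline(paras)                     # single-threaded
with ThreadPoolExecutor() as pool:
    parallel = run_pipeline(paras, mapper=pool.map)  # worker threads

assert sequential == parallel  # pure stages give the same result either way
```

This is the payoff of avoiding TeX’s pervasive mutability: with no shared mutable state between stages, the execution strategy becomes a one-line choice rather than a rewrite.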
Another interesting point Glenn raised was that he somehow felt compelled to implement the same kinds of optimizations found in TeX. He soon realized, though, that this was preventing the use of the nice abstractions that arose naturally in the functional paradigm, making things more complex than they ought to be. This also led him to wonder how much of TeX’s API was influenced by the paradigm of the language it was written in, specifically its pervasive mutability and single-threaded assumptions.
The most important reflection, for Glenn, comes from realizing how much programming has evolved. If we look back at what programming was like in 1982, we would see that:
- computers were extremely slow and memory was tiny;
- most programmers had never seen a multiprocessor, and CPUs had different word sizes and byte sizes;
- there was no IEEE standard for floating-point arithmetic;
- portability meant supporting almost 40 different OSes, each with a different file system structure, path syntax, I/O and allocation APIs, and character set;
- it was not possible to dynamically load code;
- optimizing compilers were still research projects;
- there was no open source or free software, and you had to implement incredibly basic, common data structures and routines from scratch;
- source control was rudimentary, when available at all;
- automated testing was unheard of;
- today’s tooling is a luxury made possible by decades of small advances.
So, for whatever is left to improve upon in programming, Glenn’s invitation is to appreciate the ground that has been covered since then.