Anders Hejlsberg Explains Modern Compiler Construction

The main reference in compiler construction, Compiler: Principles, Techniques, and Tools, also know as the Dragon Book, was first published in 1986. Anders Hejlsberg, known for his work on Turbo Pascal, Delphi, C# and TypeScript, explains in a Channel 9 interview how compiler construction today is different from how it was done 30 years ago.

The main characteristic of a classic compiler is the sequential processing of its input. The phases can be seen as a pipeline of its main components:

Lexer -> Parser -> Type Checker -> Code Generator -> Emitter

Over the last decade, expectations have increased progressively towards IDEs and tooling to provide features such as auto-completion, refactoring, code navigation, static analysis and so forth. User studies at Microsoft have shown these features must respond to input under a delay of 100 ms, otherwise they are deemed too slow. This contrasts with the compilation time of a whole project, which can take over a minute for a mid-sized solution.

To provide quick feedback in an IDE, a compiler must limit as much as possible the processing it does in real time. This means compiling the whole program at each keystroke is not an option. Instead, the compiler builds just enough information to provide an answer to the user.

Fast response time are achieved by limiting processing but also reusing old data structures as much as possible. Each time the user types in a character, every data structure in memory is considered erased. However, to increase responsiveness, everything that wasn’t modified is reused. Abstract syntax trees (AST), for example, can be reused if the source file they represent wasn’t modified.

Reuse is also possible even if a data structure is modified. Persistent data structures. which are immutable, handle modifications by creating and returning new instances while keeping the parts underneath that weren’t modified. For an AST, that would mean modifying the current node and its ancestor nodes up to the root. The rest of the tree, however, stays the same and is reused in the construction of the new instance.

Back several years ago, the need for IDE features provided in real-time led to a duplicated code base between the C# compiler and the IDE feature implementation. This was one of the main reason behind the creation of Roslyn. Roslyn was designed from the start to be usable from an IDE as well as a command line.

Anders and Seth discusses at the end the resources available to learn more about modern compiler construction. The Roslyn and TypeScript projects are given as examples, both being open source and publicly available on GitHub.

InfoQ Software Architects' Newsletter

Follow us on

Rate this Article

This content is in the .NET topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter