Microsoft Unveils its Compiler as a Service
Early reports suggested that the Roslyn project would just be a better runtime-accessible compiler and REPL-style interpreter, but it turns out to be much more ambitious. By opening up the entire compiler pipeline, Microsoft hopes that developers will create a wide variety of tools at many levels.
There are four “API Layers” in the Roslyn project. Microsoft summarizes them as:
The compiler layer contains the object models that correspond with information exposed at each phase of the compiler pipeline, both syntactic and semantic. The compiler layer also contains a representation of a single invocation of a compiler, including assembly references, compiler options, and source code files. There are two distinct, but similar in shape, APIs that represent the C# language and the Visual Basic language.
The scripting layer represents a runtime execution context for C# or Visual Basic snippets of code. It contains a scripting engine that allows evaluation of expressions and statements as top-level constructs in programs.
The workspace layer is the starting point for doing code analysis and refactoring over entire solutions. It assists you in organizing all the information about the projects in a solution into a single object model, offering you direct access to the compiler layer object models without needing to parse files, configure options, or manage project-to-project dependencies.
The services layer contains all the Visual Studio IDE features, such as IntelliSense, refactorings, and code formatting features. It also contains the Services APIs, which allow a user to easily extend Visual Studio.
Of the four layers, only the Services APIs have a hard dependency on Visual Studio components. The rest can be used in any sort of application, though the Workspace APIs work better when hosted.
Most developers working with code, either for analysis or for rewriting, will start with the Workspace level. A workspace can either be provided by a host (such as an IDE) or created manually by loading a solution file. If provided by a host, events will be triggered to inform the developer when items within the solution change.
Starting with ISolution, everything below the Workspace level is represented as an immutable snapshot. This allows thread-safe access to every project, document, syntax, and symbol tree contained by the solution. Changes are made by making copies of the syntax tree, replacing portions as one goes along. As they are immutable, unchanged branches can be safely reused.
At the very bottom of the tree is the text representing the source code itself. The first pass of the compiler turns this into a syntax tree. A syntax tree can be created from an entire file or just a loose statement or expression. An interesting feature of syntax trees in Roslyn is that they have full fidelity with the original source code, including every comment and bit of whitespace. This means any syntax tree can be turned back into source code, an important feature for code generators and refactoring tools.
The syntax tree consists of syntax nodes, tokens, and trivia. A syntax node always contains a combination of other nodes, tokens, and trivia. Examples include NamespaceDeclarationSyntax, ForStatementSyntax, and BinaryExpressionSyntax. Tokens are individual keywords, symbols, and identifiers. Trivia includes whitespace and comments, the bits of information not needed by the compiler but important for recreating the original source code representation.
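The token-plus-trivia idea can be illustrated with a small sketch. The following is not the Roslyn API; it is a toy lexer in Python, with hypothetical names (Token, lex, to_source), that assumes a simplified model in which all whitespace and // comments are stored as leading trivia on the next token. It only shows why keeping trivia makes an exact round trip back to source text possible.

```python
import re
from dataclasses import dataclass

# Trivia: runs of whitespace and // line comments (a simplifying assumption).
TRIVIA = re.compile(r"(?:\s+|//[^\n]*)*")
# Tokens: identifiers/keywords, integer literals, or any single other character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|.", re.S)


@dataclass(frozen=True)
class Token:
    leading: str  # trivia that appeared before the token
    text: str     # the token text itself


def lex(source):
    """Split source into tokens, keeping every character as token text or trivia."""
    tokens = []
    pos = 0
    while True:
        leading = TRIVIA.match(source, pos)
        pos = leading.end()
        if pos == len(source):
            # Whatever trivia remains at end of file is returned separately.
            return tokens, leading.group()
        tok = TOKEN.match(source, pos)
        tokens.append(Token(leading.group(), tok.group()))
        pos = tok.end()


def to_source(tokens, end_trivia):
    """Rebuild the original text from tokens and their trivia."""
    return "".join(t.leading + t.text for t in tokens) + end_trivia


source = "int x = 1; // answer\n  return x ;\n"
tokens, end_trivia = lex(source)
round_tripped = to_source(tokens, end_trivia)
```

Because nothing is discarded during lexing, round_tripped is character-for-character identical to source, odd spacing and comment included. A compiler would ignore the leading field; a refactoring tool would not.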
Modifications are made to the syntax tree using a combination of constructors and the ReplaceNode method. This method removes the need to manually copy the unchanged portions of the syntax tree.
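As a rough illustration of how replacing a node in an immutable tree can work, here is a minimal sketch in Python. The Node class and replace_node function are hypothetical stand-ins, not Roslyn types; the sketch assumes nodes are matched by identity and that only the path from the root to the replaced node needs to be rebuilt.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Node:
    kind: str
    children: Tuple["Node", ...] = ()
    text: str = ""


def replace_node(root, old, new):
    """Return a tree in which `old` (matched by identity) is swapped for `new`."""
    if root is old:
        return new
    new_children = tuple(replace_node(c, old, new) for c in root.children)
    if all(a is b for a, b in zip(new_children, root.children)):
        return root  # nothing below changed, so this branch is reused as-is
    # Only nodes on the path to the change are copied.
    return replace(root, children=new_children)


left = Node("Literal", text="1")
right = Node("Literal", text="2")
expr = Node("BinaryExpression", (left, right), "+")
stmt = Node("ExpressionStatement", (expr,))
other = Node("ReturnStatement")
block = Node("Block", (stmt, other))

new_block = replace_node(block, right, Node("Literal", text="3"))
```

After the call, block still describes 1 + 2, while new_block describes 1 + 3. The untouched ReturnStatement and the left-hand literal are the very same objects in both trees, which is what makes sharing snapshots across threads cheap.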
Syntax trees only represent the lexical and syntactic structure of the source code. To see its semantic meaning one has to create a compilation. A compilation is created from one or more syntax trees, a collection of references, and any compiler flags. The compilation’s primary role is to hold the list of symbols such as namespaces, types, methods, fields, events, local variables, and labels.
Developers will generally work with semantic models. A semantic model is created by feeding a syntax tree back into the compilation so that it will be annotated with symbol data. The semantic model can then be queried for information such as:
- The symbols referenced at a specific location in source.
- The resultant type of any expression.
- All diagnostics, which are errors and warnings.
- How variables flow in and out of regions of source.
- The answers to more speculative questions.
On top of the semantic model are the “Control and Data Flow Analysis APIs”. These allow one to quickly get information such as which variables are assigned in a region, which are referenced, and whether the region contains any jump or return statements.
The Roslyn CTP is available for download. It requires Visual Studio 2010 SP1.
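To make the kind of question such an analysis answers more concrete, here is a small sketch that uses Python's own ast module as a stand-in. It is not Roslyn and it is purely syntactic, whereas Roslyn's flow analysis is built on the semantic model; the function name analyze_region is hypothetical. It reports which names are assigned in a region, which are read, and whether the region contains a return, break, or continue.

```python
import ast


def analyze_region(source):
    """Return (assigned names, referenced names, contains jump/return)."""
    tree = ast.parse(source)
    assigned, referenced = set(), set()
    has_jump = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)      # name is written to
            elif isinstance(node.ctx, ast.Load):
                referenced.add(node.id)    # name is read
        elif isinstance(node, (ast.Return, ast.Break, ast.Continue)):
            has_jump = True
    return assigned, referenced, has_jump


region = """
def f(items):
    total = 0
    for item in items:
        if item < 0:
            break
        total = total + item
    return total
"""

assigned, referenced, has_jump = analyze_region(region)
```

A refactoring such as "extract method" needs exactly this information: variables read in the region become parameters, variables assigned become outputs, and jumps out of the region may make the extraction invalid.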
Keith Adams Dec 06, 2013