LLVM Has Documented the PDB Format, Complete with PDB to YAML Conversion

by Jonathan Allen on Aug 21, 2017. Estimated reading time: 2 minutes

To take advantage of the rich tooling available on the Windows platform, compiler projects such as LLVM need to be able to generate PDB files. PDB, or Program Database, is a literal database describing compiled code on the Windows platform. It contains various types of records that allow tools such as debuggers to map between compiled code and source code.

In order to improve performance, this data is heavily indexed. And that’s part of the problem. Zach Turner of LLVM’s Windows team writes:

CodeView is a debug information format invented by Microsoft in the mid 1980s. For various reasons, other debuggers developed an independent format called DWARF, which eventually became standardized and is now widely supported by many compilers and programming languages.  CodeView, like DWARF, defines a set of records that describe mappings between source lines and code addresses, as well as types and symbols that your program uses.  The debugger then uses this information to let you set breakpoints by function name, display the value of a variable, etc.  But CodeView is only somewhat documented, with the most recent official documentation being at least 20 years old.  While some records still have the format documented above, others have evolved, and entirely new records have been introduced that are not documented anywhere.

[…]

[PDB] contains CodeView but it also contains many other things that allow indexing of the CodeView records in various ways.  This allows for fast lookups of types and symbols by name or address, the philosophical equivalent of “tables” for individual input files, and various other things that are mostly invisible to you as a user but largely responsible for making the debugging experience on Windows so great.  But there’s a problem: While CodeView is at least kind-of documented, PDB is completely undocumented.  And it’s highly non-trivial.

Microsoft provides tooling and SDKs for consuming PDB files, but nothing for generating them. Even the consuming side requires proprietary libraries, because the PDB source code that Microsoft has published is incomplete and doesn’t compile.

Based on that partial code upload from Microsoft, the LLVM team was able to build their own PDB generator. While still considered “alpha quality”, it allows applications compiled using Clang and the LLVM backend to start working with Windows tooling. Turner continues:

We’d love for you to try it out and report issues on our bug tracker.  To get you started, download the latest snapshot of clang for Windows

As part of their exploration into supporting PDB, LLVM has documented the PDB format. While not complete, the documentation offers an important look into a complicated format that was previously undocumented.
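To give a feel for what that documentation covers, here is a minimal Python sketch that reads the header LLVM's documentation describes at the start of a PDB file: a 32-byte magic string followed by six little-endian 32-bit fields. The field names follow that description as best it can be summarized here, and the helper name `parse_superblock` is hypothetical, so treat this as an illustration under those assumptions rather than a reference parser.

```python
import struct

# 32-byte magic that opens an MSF 7.00 container (the file format PDB sits on),
# as described in LLVM's documentation of the format.
MSF_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"

# Six unsigned little-endian 32-bit integers follow the magic.
_FIELDS = struct.Struct("<6I")


def parse_superblock(data: bytes) -> dict:
    """Parse the leading header of a PDB file (hypothetical helper)."""
    if len(data) < len(MSF_MAGIC) + _FIELDS.size:
        raise ValueError("input is too short to hold an MSF header")
    if data[: len(MSF_MAGIC)] != MSF_MAGIC:
        raise ValueError("magic mismatch: not an MSF 7.00 (PDB) file")
    (
        block_size,
        free_block_map_block,
        num_blocks,
        num_directory_bytes,
        unknown,
        block_map_addr,
    ) = _FIELDS.unpack_from(data, len(MSF_MAGIC))
    return {
        "block_size": block_size,
        "free_block_map_block": free_block_map_block,
        "num_blocks": num_blocks,
        "num_directory_bytes": num_directory_bytes,
        "unknown": unknown,
        "block_map_addr": block_map_addr,
        # The file on disk is expected to span num_blocks * block_size bytes.
        "expected_file_size": num_blocks * block_size,
    }


if __name__ == "__main__":
    import sys

    # Usage: python superblock.py some.pdb
    with open(sys.argv[1], "rb") as f:
        print(parse_superblock(f.read(len(MSF_MAGIC) + _FIELDS.size)))
```

Everything past this header (the stream directory, the CodeView records, the indexes) is where the real complexity lives, which is why the article's point about documentation matters.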

To supplement this, they have also built a tool called llvm-pdbutil. Among other things, this allows for two-way conversion between YAML and PDB. (For those of you who don’t know, YAML is a human-readable format that uses whitespace instead of brackets. It is probably best known as the format used for the API documentation language RAML.)
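As a rough sketch of that round trip, the Python below builds command lines for the tool's `pdb2yaml` and `yaml2pdb` subcommands. The helper names are hypothetical, and the `-all` and `-pdb=` options are assumptions based on how the tool is commonly invoked; check `llvm-pdbutil pdb2yaml --help` for your LLVM version before relying on them.

```python
from typing import List, Sequence


def pdb_to_yaml_cmd(
    pdb_path: str,
    flags: Sequence[str] = ("-all",),  # assumed flag: dump everything the tool can
    tool: str = "llvm-pdbutil",
) -> List[str]:
    """Build the argv for dumping a PDB as YAML (YAML is written to stdout)."""
    return [tool, "pdb2yaml", *flags, pdb_path]


def yaml_to_pdb_cmd(
    yaml_path: str,
    out_pdb: str,
    tool: str = "llvm-pdbutil",
) -> List[str]:
    """Build the argv for regenerating a PDB from a YAML description."""
    return [tool, "yaml2pdb", f"-pdb={out_pdb}", yaml_path]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # even on a machine without LLVM installed.
    print(" ".join(pdb_to_yaml_cmd("app.pdb")))
    print(" ".join(yaml_to_pdb_cmd("app.yaml", "roundtrip.pdb")))
```

To actually run the conversion, pass the returned list to `subprocess.run` and redirect the standard output of the first command to a `.yaml` file.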

It should be noted that there are actually two PDB formats. In addition to the full version discussed above, there is also a Portable PDB format intended just for .NET Core applications. Portable PDB is documented, and an open source library is available for reading it.
