
Microsoft Open-Sources GCToolkit to Tap into JVM GC Logs


Microsoft’s footprint in the Java ecosystem has been growing. What started with Java on Azure and support for Minecraft has expanded into a range of tools that aim to make the Java Virtual Machine (JVM) more accessible and to improve the development experience for Java developers. The newest addition to this collection is GCToolkit, which Microsoft open-sourced at the beginning of August 2021. As the name suggests, it is a set of libraries for analyzing Java garbage collection (GC) log files and parsing them into discrete events. It exposes an API for interacting with the toolkit and aggregating data, which lets users build arbitrarily complex analyses of the state of the JVM's managed memory. The API is the user’s entry point into GCToolkit and hides the details of the inner modules behind a few method calls.

Besides the API, there are two other modules: the parser and the Vert.x module. The parser module is built on a collection of regular expressions and code that has been refined over many years into what is considered the most robust GC log parser available. The Vert.x-based messaging backend uses two message buses. The first streams data from a data source; the current implementation streams log lines from a GC log file. The consumers of this bus are the parsers, which convert the data from the data source into events that represent either a GC cycle or a safepoint. These events are published on the second message bus, the event bus, where subscribers are notified of, and process, the events that interest them.
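To make the two-bus design more concrete, here is a minimal, self-contained sketch of the same flow in plain Java. It is not GCToolkit's code and does not use Vert.x: the synchronous `Bus` class, the `PauseEvent` record, and the regular expression are hypothetical stand-ins, and the sample log lines are only loosely modeled on JDK unified logging (`-Xlog:gc`) output for G1. It assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for the two message buses described above.
// None of these names come from GCToolkit; Vert.x is replaced by a synchronous bus.
public class GcPipeline {

    /** A minimal synchronous publish/subscribe bus. */
    static final class Bus<T> {
        private final List<Consumer<T>> subscribers = new ArrayList<>();

        void subscribe(Consumer<T> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(T message) {
            for (Consumer<T> subscriber : subscribers) {
                subscriber.accept(message);
            }
        }
    }

    /** A discrete event produced by the parser: one GC pause (hypothetical shape). */
    record PauseEvent(double uptimeSeconds, String pauseType, long heapBeforeMb,
                      long heapAfterMb, long heapSizeMb, double pauseMs) { }

    // Matches pause lines loosely modeled on unified logging, e.g.
    // [0.105s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms
    private static final Pattern PAUSE = Pattern.compile(
            "^\\[(\\d+\\.\\d+)s\\].*?GC\\(\\d+\\) (Pause .*?) (\\d+)M->(\\d+)M\\((\\d+)M\\) (\\d+\\.\\d+)ms$");

    /** Streams log lines through a data-source bus, a parser, and an event bus. */
    static List<PauseEvent> run(List<String> logLines) {
        Bus<String> dataSourceBus = new Bus<>();
        Bus<PauseEvent> eventBus = new Bus<>();

        // Parser: consumes raw lines and publishes an event only for lines it recognizes.
        dataSourceBus.subscribe(line -> {
            Matcher m = PAUSE.matcher(line);
            if (m.matches()) {
                eventBus.publish(new PauseEvent(
                        Double.parseDouble(m.group(1)), m.group(2),
                        Long.parseLong(m.group(3)), Long.parseLong(m.group(4)),
                        Long.parseLong(m.group(5)), Double.parseDouble(m.group(6))));
            }
        });

        // Event-bus subscriber: here it simply collects every event it is notified of.
        List<PauseEvent> received = new ArrayList<>();
        eventBus.subscribe(received::add);

        // The "data source": stream the log lines one at a time.
        logLines.forEach(dataSourceBus::publish);
        return received;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "[0.012s][info][gc] Using G1",
                "[0.105s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms",
                "[0.310s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 40M->6M(256M) 2.100ms");
        run(log).forEach(System.out::println);
    }
}
```

A real parser has to cope with many collectors, log formats, and JDK versions, so a single regular expression like this one is illustrative only.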

The parser emits discrete JVM events (GC cycle events or safepoint events), which makes it possible to write code that captures and analyzes the data in those events. To simplify capturing and analyzing GC log data, GCToolkit provides a simple aggregation framework. Which data users capture, and which analyses they perform, is up to them. For instance, to analyze heap occupancy, an aggregator captures pause events, extracts the relevant data, and passes that data to an aggregation. The aggregation collates the data into a meaningful analysis, for instance, total heap occupancy after garbage collection. The resulting data could be rendered as a graph, a table, or some other human-friendly format.
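The aggregator/aggregation split can be sketched in the same hedged way. In this plain-Java illustration, the hypothetical `HeapOccupancyAggregator` plays the aggregator role (capture an event and extract the relevant fields) and the hypothetical `HeapOccupancyAggregation` plays the aggregation role (collate the data points and render them as a simple comma-separated table). Neither name is taken from GCToolkit's API, and the event shape is simplified. Java 16 or later is assumed for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the aggregator/aggregation pattern; not GCToolkit's classes.
public class AggregationSketch {

    /** Simplified pause event, as a parser might emit it. */
    record PauseEvent(double uptimeSeconds, long heapAfterMb) { }

    /** One collated data point: heap occupancy after a collection. */
    record OccupancyPoint(double uptimeSeconds, long heapAfterMb) { }

    /** The aggregation: collates data points and summarizes or renders them. */
    static final class HeapOccupancyAggregation {
        private final List<OccupancyPoint> points = new ArrayList<>();

        void addDataPoint(double uptimeSeconds, long heapAfterMb) {
            points.add(new OccupancyPoint(uptimeSeconds, heapAfterMb));
        }

        int count() {
            return points.size();
        }

        long maxOccupancyMb() {
            return points.stream().mapToLong(OccupancyPoint::heapAfterMb).max().orElse(0L);
        }

        double averageOccupancyMb() {
            return points.stream().mapToLong(OccupancyPoint::heapAfterMb).average().orElse(0.0);
        }

        /** Renders the collated data as a simple comma-separated table. */
        String toTable() {
            StringBuilder sb = new StringBuilder("uptime_s,heap_after_mb\n");
            for (OccupancyPoint p : points) {
                sb.append(p.uptimeSeconds()).append(',').append(p.heapAfterMb()).append('\n');
            }
            return sb.toString();
        }
    }

    /** The aggregator: captures events and passes the relevant data to the aggregation. */
    static final class HeapOccupancyAggregator {
        private final HeapOccupancyAggregation aggregation;

        HeapOccupancyAggregator(HeapOccupancyAggregation aggregation) {
            this.aggregation = aggregation;
        }

        void onPause(PauseEvent event) {
            // Extract only the fields this analysis needs.
            aggregation.addDataPoint(event.uptimeSeconds(), event.heapAfterMb());
        }
    }

    public static void main(String[] args) {
        HeapOccupancyAggregation aggregation = new HeapOccupancyAggregation();
        HeapOccupancyAggregator aggregator = new HeapOccupancyAggregator(aggregation);
        List.of(new PauseEvent(0.105, 4), new PauseEvent(0.310, 6), new PauseEvent(0.720, 5))
                .forEach(aggregator::onPause);
        System.out.print(aggregation.toTable());
        System.out.println("max=" + aggregation.maxOccupancyMb()
                + "MB avg=" + aggregation.averageOccupancyMb() + "MB");
    }
}
```

Keeping extraction separate from collation means the same events could, in principle, feed several aggregations, each producing a different view of the data.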

More importantly, it has long been known that a suboptimally configured collector will cause an application to require more CPU and memory while also degrading the end user’s experience. In other words, a poorly tuned collector often means a more expensive runtime and unhappy users. The challenge is that tuning the GC optimally requires striking a delicate balance among several concerns, none of which are easy to see without the assistance of tooling. GCToolkit has helped make this easier.

With Microsoft’s widening interest in the Java space, its growing focus on open source is also benefiting the Java community. After making major contributions to the macOS M1 and Windows on Arm ports of OpenJDK, Microsoft reemphasized its commitment to OpenJDK when it introduced its own OpenJDK build and joined the Eclipse Adoptium Working Group (previously known as AdoptOpenJDK).

By open-sourcing GCToolkit, Microsoft aims to provide better visibility into how the JVM handles GC and memory allocation. Better visibility enables better tuning, which benefits both the application’s end users and the technical staff responsible for running it. The simple API and human-friendly output promise to ease the chore of reading GC logs by providing various ways to analyze, extract, and visualize data.