Apple has open-sourced GCGC, a tool for visualizing Java garbage collector (GC) logs, built on Python 3 and pandas.
GCGC ships as a Jupyter notebook that analyzes GC log files and generates plots and tables from the collected GC information.
When the first cell of the notebook is executed, the log files are read line by line and parsed according to a regular expression defined in a Python file. A pandas DataFrame is then created to store the extracted information.
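The parsing step described above can be sketched as follows. This is only an illustration: the regular expression, the column names and the `parse_gc_log` helper are assumptions made for this example and are not taken from the GCGC source, which defines its own expressions. The sample lines follow the default unified logging layout used by G1.

```python
import re
import pandas as pd

# Hypothetical pattern for pause lines; GCGC uses its own expressions.
PAUSE_RE = re.compile(
    r"\[(?P<uptime_s>\d+\.\d+)s\]"      # JVM uptime decoration
    r"\[\w+\s*\]"                        # log level, e.g. [info]
    r"\[gc[^\]]*\]\s+"                   # tag set, e.g. [gc]
    r"GC\((?P<gc_id>\d+)\)\s+"           # collection id
    r"(?P<event>Pause .*?)\s+"           # event name
    r"(?P<before_mb>\d+)M->(?P<after_mb>\d+)M\((?P<heap_mb>\d+)M\)\s+"
    r"(?P<duration_ms>\d+\.\d+)ms"       # pause duration
)

COLUMNS = ["uptime_s", "gc_id", "event", "before_mb", "after_mb", "heap_mb", "duration_ms"]
NUMERIC = [name for name in COLUMNS if name != "event"]


def parse_gc_log(lines):
    """Parse log lines one by one and collect the matches in a DataFrame."""
    rows = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:  # lines that do not describe a pause are skipped
            rows.append(match.groupdict())
    frame = pd.DataFrame(rows, columns=COLUMNS)
    frame[NUMERIC] = frame[NUMERIC].apply(pd.to_numeric)
    return frame


sample = [
    "[0.015s][info][gc] Using G1",
    "[0.123s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms",
    "[1.234s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 50M->8M(256M) 5.000ms",
]
pauses = parse_gc_log(sample)
print(pauses)
```

Keeping each captured field in a named group means the DataFrame columns follow directly from the expression, so supporting another log format would mostly be a matter of writing another pattern.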
The analysis groups all matching values of a column together and then filters rows from the DataFrame based on conditions. The remaining data is processed and plotted. Because the analysis runs in a notebook, users can also filter, group and otherwise manipulate the data by hand.
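A minimal sketch of this kind of filtering and grouping with pandas is shown below. The column names and the values are made up for illustration and do not come from GCGC or from a real log.

```python
import pandas as pd

# Illustrative data only; in practice the rows come from parsed log files.
events = pd.DataFrame({
    "event": ["Pause Young", "Pause Young", "Concurrent Mark", "Pause Full", "Pause Young"],
    "uptime_s": [0.5, 1.2, 1.8, 2.5, 3.1],
    "duration_ms": [3.0, 5.0, 40.0, 120.0, 4.0],
})

# Filter rows on a condition: keep only the stop-the-world pauses.
pauses = events[events["event"].str.startswith("Pause")]

# Group matching values of one column and aggregate the durations per event.
summary = pauses.groupby("event")["duration_ms"].agg(["count", "mean", "sum", "max"])

# Percentiles over all pauses, similar in spirit to a pause percentile table.
percentiles = pauses["duration_ms"].quantile([0.5, 0.99])

print(summary)
print(percentiles)
```

The same `pauses` frame could be passed to a plotting call, or narrowed further by hand, for example to a time window on `uptime_s`.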
GCGC provides 17 different functions and generated plots, including stop-the-world pauses during program runtime, a latency heatmap, heap before and after GC, heap allocation rate, a pauses summary, pause percentiles, and the mean and sum of event durations.
A few steps are required to start the analysis. First, GC logging must be enabled in the JVM by adding the option shown below to the command line of the Java application:
-Xlog:gc*:./filename.log
After cloning the GCGC project and starting Jupyter, open the GCGC.ipynb notebook in the Jupyter web page.
The first cell must be configured with the file paths of the GC log files and the labels used to describe them. Wildcards can be used to read several files at once.
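How a wildcard path expands into several log files can be sketched with Python's standard `glob` module. The `expand_log_pattern` helper and the file names are hypothetical, and this is not necessarily how GCGC resolves its paths internally.

```python
import glob
import os
import tempfile


def expand_log_pattern(pattern):
    """Return the sorted list of files matching a wildcard pattern."""
    return sorted(glob.glob(pattern))


# Create a throwaway directory with two log files and one unrelated file.
with tempfile.TemporaryDirectory() as log_dir:
    for name in ("run1.log", "run2.log", "notes.txt"):
        with open(os.path.join(log_dir, name), "w") as handle:
            handle.write("")

    pattern = os.path.join(log_dir, "*.log")
    matched = [os.path.basename(path) for path in expand_log_pattern(pattern)]

print(matched)
```

Sorting the matches keeps the order of the files stable between runs, which helps when each file is paired with a label.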
Selecting "Run All" from the Cell menu executes the GCGC notebook analysis.
These are some examples of generated diagrams:
It is also possible to download the notebook in different formats such as PDF, HTML, Markdown and AsciiDoc.
Currently, GCGC supports the garbage collectors from JDK 11 and JDK 16, with some limitations for the Shenandoah and ZGC collectors. Shenandoah uses two phases per garbage collection, resulting in two plotted diagrams per GC phase. ZGC uses the concept of safepoints, which have metrics comparable to pause times; because ZGC does not report them in the same fashion, they have to be enabled manually.