
Graal: How to Use the New JVM JIT Compiler in Real Life


Summary

Chris Thalinger discusses how to use Graal with JDK 10, how to compile an upstream Graal version, and what to look out for when using it for benchmarking or even in production.

Bio

Chris Thalinger is a software engineer who has been working on Java Virtual Machines for more than 14 years. His main expertise is in compiler technology, with JIT compilation in particular. Initially involved with the CACAO and GNU Classpath projects, his focus shifted to OpenJDK as soon as Sun Microsystems made the JDK open source. Ever since, Chris has worked on the HotSpot JVM at Sun, Oracle, and now at Twitter.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

[Note: please be advised that this transcript contains strong language]

Transcript

Thalinger: I work for this tiny little company you might know. This talk is not really about Twitter, it's more about Graal, the compiler. The most important question of the whole conference is, "Who has a Twitter account?" That's pretty good, it's almost everybody. If you don't have an account yet, create one right now because I want you to tweet about my talk and all the other talks that you find interesting.

If you're going to tweet about my talk, please add the hashtag "twittervmteam" because, you might not believe it, but Twitter actually has a VM team, and I'm on the VM team. Two of my colleagues are also speaking at QCon Sao Paulo on Wednesday, I think; you can hear from them a little bit about machine learning. We do some machine learning things, and then Flavio is talking about all the Scala optimizations we are doing in Graal.

What Is Graal?

This talk is not about GraalVM. GraalVM is a very unfortunate marketing term, in my opinion, and I bitch about it on Twitter quite a bit because it's very confusing. GraalVM is an umbrella term that basically consists of three different technologies. One is called Graal, the JIT compiler. The other one is called Truffle; it's a framework with which you can build language runtimes. Then there is something called Substrate VM, which you might know as Native Image. That's what everyone is super excited about right now and freaking out over. I'm only talking about Graal, the compiler, today.

Some background on what a JIT compiler is: you compile, let's say, Java, Scala, or Kotlin to Java bytecode, to class files. The JVM takes that and interprets it, which is pretty slow. Then there is something called a just-in-time compiler, which compiles the Java bytecode into native code while you are running your application. That's why Java actually is as fast as it is. Just-in-time compilers are what I've been working on for a couple of years now; I'm going to talk about that a little bit later.

Graal is a just-in-time compiler for HotSpot. It's actively developed by Oracle Labs. There's an OpenJDK project called Graal, but most of the work is actually done on GitHub; if you're interested, I'm going to show you how to build Graal from GitHub and use it with the latest JDK 11. Graal uses something called JVMCI. It's a compiler interface we introduced with JEP 243 in JDK 9, so you can actually plug in an external compiler. You'll see that later as well. Graal is written in Java. If there's one thing you should take away from this talk, it's this: it's written in Java, and that's very important, at least today. In the future, this will change.

HotSpot has two JIT compilers: one is called C1, or the client compiler; the other one is called C2, or the server compiler. C1 is a fast, high-throughput compiler: it does not do as many optimizations as, say, C2 and Graal. The purpose of C1 is to produce native code quickly, so we get away from interpreting code and run native code instead. C2 is a highly optimizing compiler: it takes profiling information and does all the fancy optimizations you can think of - a bunch of inlining, escape analysis, loop optimizations, and so on. Graal is supposed to be a replacement for C2. It's a highly optimizing compiler, and it's written in Java.

C1 and C2 are written in C++, while Graal is written in Java. There are two major differences between something that's written in C++ and something that's written in Java. One is how memory allocation works: malloc memory versus allocating something on the Java heap. The other is that, at least today, we have to do something called a bootstrap, because our compiler, which is part of the JVM, is written in a language the JVM executes, so it compiles itself while it starts up. You'll see that later as well.

Where Do I Get It

Where do you get it? Where do you get Graal? We actually used Graal in JEP 295 for something called ahead-of-time compilation. This is not Native Image, this is something else. We did this in 9, and it's basically a small command-line utility that takes Java class files or JAR files, sends all the methods off to Graal to compile, and spits out a shared library at the other end. HotSpot can then pick up that shared library, so you can skip the interpretation of bytecode. If you have a very big application, this might help startup. I say might because it's very difficult to get it right. The difference between this one and Native Image is that this one is actually Java, while Native Image is a subset of Java; it's not really Java. I'm not going there because it would take too long.

Then Oracle, with JEP 317, added it as an experimental JIT compiler. Graal was already in there, but they basically announced it and said, "It's an external JIT compiler, you can use it." That happened in 10. So you need to be running on JDK 10 or later - 9 works as well, because when we introduced the ahead-of-time feature, I made it so that it works. No one knew about it, but I did.

Get It Demo

I used to do this demo differently. I used to do this talk in a cloud container, where I started up an empty cloud container so I could show you that I'm not cheating. Everything I do today, you can just do yourself; I haven't prepared anything, so that also means a lot can go wrong, which sometimes it does. We'll see how that goes today. There's a lot of typing going on today as well - I don't know if you've ever typed in front of a lot of people, so typos are certainly possible.

I'm using the DaCapo benchmarks to show you some benchmark numbers, and then some other things where I need a little bit more to actually execute the code. Do I need anything else? No, I don't think so. JDK 11 - I'm using 11 because it's LTS. I could use 12 as well, but every one of you is probably using JDK 11 in production right now; that's why I'm using it.

We set this guy, and that should be enough for this setup. It's 11.0.2, the one we just extracted. We do the same thing over here, because we are going to compare C2 and Graal later in benchmark runs. That's why I'm doing this.

The other thing I'm doing is setting an environment variable called JAVA_TOOL_OPTIONS; all the launchers the JDK has will pick up this environment variable automatically. I'll set a bunch of things so I don't have to set them all the time. First, as you might know, since JDK 9 the default GC is G1. The logging output of G1 is a little hard to read, and we're going to look at GC output a little bit later; parallel GC is just so much easier to read, so we use that one. We set a maximum heap size, pretty small, only 512 megs. I do this because I want you to see when GCs are happening. If I set the heap size too big, then we don't see a GC, and then I cannot show you what I want to show you.

We also set the starting size of the heap to 512 megs. The reason I'm doing this is that if we are running something with C2 and with Graal, because Graal uses Java heap memory to do compilations, the heap expansion would be different, and then we wouldn't really compare apples to apples. I'm trying to make it apples to apples as best I can.
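For reference, the environment setup up to this point boils down to something like this (a sketch; the JDK paths are whatever you extracted):

    # Every JDK launcher picks this variable up automatically
    export JAVA_TOOL_OPTIONS="-XX:+UseParallelGC -Xms512m -Xmx512m"
    java -version   # first prints "Picked up JAVA_TOOL_OPTIONS: ..."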

You also know that since JDK 9, we have modules. As you can see, it picked up this environment variable and it's using it. I think there are 75-ish modules in the JDK, and we are looking for the ones called jdk.internal.vm. As you can see, there are three. One is jdk.internal.vm.ci - that's JVMCI, the compiler interface we introduced in 9. It's basically an API in a module; there's obviously some part on the native code side as well, because we have to talk to the VM, but that's the Java interface. Then there's one called jdk.internal.vm.compiler. That's just the Graal source code from GitHub in a Java module; that's really all it is. Then there's a little bit of management here - ignore that for now.
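If you want to check the modules yourself, the lookup is something like this (the versions shown are an assumption matching the 11.0.2 build used here):

    $ java --list-modules | grep jdk.internal.vm
    jdk.internal.vm.ci@11.0.2
    jdk.internal.vm.compiler@11.0.2
    jdk.internal.vm.compiler.management@11.0.2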

As I said, I was doing this talk with a cloud container, and it always took a while for the cloud container to come up; in the meantime, I talked about myself. We'll go through this quickly: I've been working on JVMs for a very long time, 14 years, basically on JIT compilers. That's all I do. I used to work at Sun Microsystems and Oracle on the HotSpot compiler team, mostly working with C2. It's a major pain in the ass - don't do it, use Graal. These are the three biggest projects I've done at Sun and Oracle. I worked on JSR 292, which you might know as invokedynamic and method handles. If you use Java 8 lambdas, you're actually using invokedynamic under the hood without knowing it. I wrote a lot of that code. There is a package called java.lang.invoke; I wrote a lot of that Java code. If it doesn't work, you could technically blame me, but other people touched the code after me, so the code I wrote was perfectly fine and they broke it.

JEP 243 is the interface we already talked about; it's basically the interface that Graal was using. We just extracted it and made it a somewhat stable API. It's not an official API, because it's not in an officially supported namespace, but it's stable-ish. JEP 295 we already talked about. And now I work for a very great company called Twitter. It's the best company on the planet.

Why This Talk

Why am I doing this talk and all the other talks that I do? I want you to try Graal. There are a bunch of reasons why. Number one, I'm a very nice person; I want you to save some money. I think I have a slide for this where I basically explain how we are saving money by using Graal: we reduce CPU utilization for the stuff we do, we use fewer machines for everything we do, and that's a lot of money.

Then, I would like to fix existing bugs in Graal. We found a bunch - there's one talk where I actually talk about the bugs we found and explain them. We haven't found all the bugs in two years, though. What we need is to throw different code at Graal. What I would like is for you to use your shitty production code and run it on Graal; that would be really nice. Since you all agreed earlier that you run on 11, that's not a problem.

Then, I want to improve Graal; I want to make it a better compiler. We can only improve the compiler if we see issues. We cannot just optimize into the blue; we need to know, "This code doesn't work as it should," or "It doesn't work as well as on C2." Then we look into it and can actually improve it. That's why I'm doing this.

Then when I do my presentations, people come up to me and ask me, "Is it safe to use, because it's an experimental compiler? Does your data center burn down?" No, it does not. We have our own data centers, they are still up and running. You could tweet right now and you would see that it works.

How do I use Graal, and where do I get it? These two questions are exactly why I made this talk. When people actually try it, they usually send me an email, or tweet at me, or send me a DM. Most are complaining about benchmark numbers and that Graal sucks. The reason is that they don't understand the difference between a compiler that's written in C++ and a compiler that's written in Java; they look at the numbers they get in the wrong way. I'm explaining it to you today so you don't make the same mistake.

That's the money-saving thing. It's called "Twitter's Quest for a Wholly Graal Runtime." It's basically the story of my first year working at Twitter: how we started running services on Graal, and how much money we save. No, I'm not telling you how much it is, because I'm not allowed to, but it's a lot. It's way more than they pay me, which I find unfair, but they don't agree - I don't understand. Watch it if you're interested.

Use It Demo

Back to the demo. We've already done this, let's move up. How do we use it? You get a JDK with Graal. If you have that module, jdk.internal.vm.compiler, then the only thing you have to do is to turn it on. Let's do a demo - how to turn it on.

We go to OpenJDK and then JEP 243. That's the JVMCI JEP I was talking about - and we hope the Wi-Fi works faster. These are the problems I was talking about; I'm glad I'm not doing the demo in the cloud anymore, because that wouldn't work. I wanted to show you that the text of the JEP actually tells you how to turn on Graal, or rather a JVMCI compiler. It could be any compiler: if there is a compiler out there that implements the JVMCI API, you could run it.

There's one flag called "UnlockExperimentalVMOptions," because it's still an experimental VM feature. Then there is one called "EnableJVMCI." That one only turns on access to the API; it's not automatically turning on the JIT compiler. Sometimes, if you run Truffle - I think Oracle Labs does this sometimes - you actually run on C1 and C2 but use Graal for Truffle; that's why you'd turn on only JVMCI but not the compiler. Then the last one is UseJVMCICompiler, and that's really all you need. We copy this and stick it into the JAVA_TOOL_OPTIONS thing, and then if we do a java -version, we'll see it picks it all up, and it prints this.
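Put together, turning Graal on is just these three flags from JEP 317, appended to the JAVA_TOOL_OPTIONS from earlier (a minimal sketch):

    export JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS \
      -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler"
    java -version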

There is a flag called PrintFlagsFinal; it prints all the flags that the JVM has. We are looking for the ones that have JVMCI in the name, and there are 10 or 12 of them or something. As you can see, we have EnableJVMCI here, and it's true because we turned it on. UseJVMCICompiler is true because we turned it on. Then the one I'm looking for is this one, JVMCIPrintProperties. We're going to run this, and it prints a very long list of properties. Most of them are Graal-related: things you can tune in Graal, things you can change. At the very top there's a handful of JVMCI-related properties.
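The two lookups from this part of the demo look roughly like this (the grep just filters the very long flag list):

    java -XX:+PrintFlagsFinal -version | grep JVMCI
    java -XX:+JVMCIPrintProperties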

The one I'm looking for here is called InitTimer. Since Graal and JVMCI are both written in Java, we decided to pass options to them as Java properties - it makes sense. We do a -D, then InitTimer, then equals true, obviously, because that's how we have to do it. What it does is print some initialization output when JVMCI initializes itself. Let's do this. Nothing happened: there is no additional log output except the version. Did we do anything wrong? No, we didn't. The way it works is that JVMCI is lazily initialized.
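As a sketch, passing the property looks like this (assuming the jvmci. prefix shown by JVMCIPrintProperties; nothing extra is printed yet, for the reason explained next):

    java -Djvmci.InitTimer=true -version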

What is tiered compilation? We start by interpreting Java bytecode, then we recompile with C1. There are actually four tier levels in HotSpot, and the first three are all tiers that C1 compiles for; the number depends on how much profiling information is gathered. We usually compile at tier three, where we get a lot of profiling information, which we then use later for C2 to recompile. You go through the tiers, and at every step of the way your code gets faster. This is tiered compilation. C2 is tier four, and we are replacing C2 with Graal here.

If we do PrintCompilation here, you see all the methods that get compiled when you do a -version. It runs a little bit of Java code, so it actually compiles stuff. The third column, this one here, is the tier level of the compilation. As you can see, there is no four. The reason is that no code gets hot enough to actually trigger a tier-four compilation, and that's why JVMCI is not initialized. What we have to do is run a little bit more.
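To see the tier levels yourself, something like this works; the tier shows up as a small integer, 1 through 4, next to each compiled method:

    java -XX:+PrintCompilation -version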

We can do a -l, which basically only prints the benchmarks the framework has. If we do this over here with InitTimer on, you can see something getting initialized. It starts here with the class HotSpotJVMCIRuntime, then it does a bunch of things and gets some configuration for our architecture, AMD64, but it doesn't look like it actually finished the initialization. That's correct, because listing the benchmarks exits before JVMCI is fully initialized and has compiled a method. We need to run a little bit more.
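The DaCapo harness invocations here look roughly like this (the jar name depends on the release you downloaded):

    java -jar dacapo-9.12-bach.jar -l                # list available benchmarks
    java -jar dacapo-9.12-bach.jar -s small avrora   # one small run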

So what we do is run a small benchmark run of a benchmark called avrora, and then you can see everything is being initialized; we actually finish. As you can see here, HotSpotJVMCIRuntime took 56 milliseconds to do this. Then there is a thing called the compiler configuration factory, which selects the compiler. As I said earlier, if you had more than one compiler that supports JVMCI, you could select one with a Java property. This is all described in the JEP that I can't show because the Wi-Fi is broken. By default, and since there's only one compiler, it always selects Graal. We initialize this class called HotSpotGraalRuntime; it does a bunch of things - as you can see, it creates a backend. Then it also looks like it's not really finishing the initialization. That's because the benchmark harness redirects the output into a file. It's somewhere in a file now, but believe me, it finishes. This benchmark run here was actually done with Graal.

What is Bootstrapping

Now we have to talk about bootstrapping. Bootstrapping is still a problem. Oracle Labs, together with the Java platform group at Oracle, is working on it: there is a project called libgraal, but we don't have it yet. I think the latest GraalVM release actually has it in it, but OpenJDK and OracleJDK do not. libgraal uses Substrate VM, Native Image, to AOT-compile Graal itself, which totally makes sense. Then the whole bootstrapping part goes away, and the memory allocation stuff that we're seeing later also goes away. But at this point in time, we have to deal with bootstrapping.

Graal is just another Java application running in your JVM. It's written in Java, so it loads Java classes, obviously. These classes have Java methods, and these methods - Graal's own methods - need to be compiled at some point. Otherwise, we would be interpreting our compiler, and that would be ridiculously slow. We need to compile it; that's the bootstrap. Let's do that.

Bootstrap Demo

You can do an explicit bootstrap like this, with BootstrapJVMCI. Please don't do this - it's really just for presentation purposes. Sometimes it can be helpful when you do benchmarking, but don't do it, because it skews everything. Let's do a bootstrap here. As you can see, every dot is 100 method compilations, and it gets faster: the dots come up faster because Graal gets compiled by itself and then it can compile code faster. It makes sense. We compiled 2,500 methods in 19 seconds. No one wants to wait that long for this fucking Java thing to come up. I know that no one writes ls in Java, but if someone did write ls, 18 seconds - probably not.
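A sketch of the explicit bootstrap - again, for demos and the occasional benchmark only:

    java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI \
         -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version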

The bootstrapping can be done explicitly, or it's done implicitly in the background. You know that every JVM has GC threads - it has a bunch of threads that do garbage collection in parallel. It does the same for compilations: it has compiler threads, and they do the work in the background. You're still interpreting your code, running your stuff, whatever it is, and in the background it's compiling code. Once that's done, you run on the compiled code.

We run the benchmark again. We run three iterations of this avrora benchmark over here with C2. It takes 2.9 seconds - let's say three seconds. If we do the same thing over here with Graal, you can see that the first run takes a little bit longer than with C2 - like five seconds. Then on the second and the third run it's actually faster, which is surprising. The benchmark itself is a little flaky, but the performance is about the same. The difference in the first run - that one second or so - is because we have to compile Graal. It's not 18 seconds, it's basically one.

We at Twitter run it that way. We do the bootstrap; we don't AOT-compile Graal or anything, because you're only compiling a limited number of methods for Graal. It doesn't grow: if your application is bigger, it's still only a second. I'm sure all your applications take longer than a second or two to start up anyway. If it's one second more, it really doesn't matter.

What We Learned

Bootstrapping compiles quite a lot of methods. If you run tiered, which is what we just did, it's about 2,500. If you turn tiered off, Graal needs to compile more, because C1 is not in the mix; it then compiles roughly 5,000. And 5,000 methods is what you will see when you run a big application: you compile about 5,000 Graal methods, but the overhead is really not noticeable.

You could do it either upfront - don't do that - or on demand during runtime, which is what we just saw. By default, with the on-demand bootstrap, the Graal methods themselves are only compiled with C1. The reason is that we don't want the compilations of the Graal methods to race with the compilations of our application's methods. For the Graal methods, we want code as quickly as we can get it, so that Graal can compile our application. You can turn this off with a flag.
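To the best of my knowledge the flag is CompileGraalWithC1Only, true by default; turning it off lets Graal's own methods go to tier four as well (the flag name is an assumption based on the JDK 10/11 JVMCI builds):

    java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI \
         -XX:+UseJVMCICompiler -XX:-CompileGraalWithC1Only -version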

Java Heap Demo

Java heap usage is also very important. Graal is written in Java; that means, as I said at the very beginning, it allocates on the Java heap. We just learned that the Graal methods themselves are only compiled by C1, so that part is not a problem, but all the compilations of your application's methods use Java heap memory - and possibly the Graal methods too; that's the flag where you can turn it on and off. Let's do a quick Java heap demo here.

We are using this benchmark again. We turn on GC logging and run the three iterations of avrora. What the benchmark harness does is run a System.gc() before and after each benchmark iteration. The reason is to clean up the heap; it gets rid of all the stuff. As you can see, during the benchmark there are no GCs. avrora is a very compute-intensive benchmark; it doesn't do a lot of memory allocation. We have roughly 39 megabytes on the heap after a run, and then we collect down to 1. It doesn't allocate a lot.
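The GC-logging run is roughly this, with the JAVA_TOOL_OPTIONS from before still in place (jar name as assumed above; -n sets the iteration count):

    java -Xlog:gc -jar dacapo-9.12-bach.jar -n 3 avrora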

If we run the same thing over here with Graal, you can already see that there is more going on, because we have GCs happening during our first iteration. We have this one here, where we allocated about 130 megs and then collect down to 7; another one at 136-ish collects down to 8; and then there are another 86 megabytes on the heap after the run. That's Graal doing its work. That's Graal compiling the benchmark methods.

In the second iteration, as you can see, there was no GC during the benchmark. We had about 130 megabytes on the heap after the run; that means we were still compiling some methods. But after the third iteration here, we're not compiling anymore. All the benchmark methods are compiled; that's it. That's the reason why only the first iteration is slower: it does some work, it does GCs, it gets slowed down a little bit, but then you're done.

I'm arguing that when you start up whatever you have - a microservice or an application or whatnot - it will take a while for the thing to come up. Almost all the compilations for your application - depending on its size, obviously - happen in the first 30 seconds, the first minute, maybe the first two minutes. After that a few compilations are rarely sprinkled in, but the majority is done at the very beginning. At that point in time, your application is not even using all the Java heap memory, because it's not even fully up yet. If you live in a microservice world, it has to connect to a trillion other microservices, make network connections, maybe do a little bit of a warmup loop, and by that point everything is compiled and you are ready to accept production requests.

So the whole story that Graal is using up your Java heap memory is not really true. There's actually an advantage here - and I was arguing with the Oracle people that libgraal takes this advantage away. In a cloud world, and we all live in a cloud world today, you're basically paying for memory. You know about metadata; there's some memory on the side you have to reserve. If you have an eight-gigabyte cloud container, you cannot make your Java heap eight gigs - you all know that - because the JVM needs some additional native memory on the side. One part of that, which you never paid attention to, is that JIT compilers need memory to compile stuff.

If you're not leaving enough memory on the side, you could actually get your container killed because the compiler uses too much memory. The most drastic case I've ever seen - this is not normal - the craziest compilation we ever saw at Oracle was a C2 compilation that used one gigabyte of memory, because the method was huge and it did a lot of loop optimizations, and that makes compiler graphs really big. That's, as I said, not normal. The normal memory allocation for a compilation is like 10 megabytes, or 20-30, big ones maybe 100, but you need to reserve that memory on the side. If you're not using C2, you could technically take that memory and give it to the Java heap. Then you have a little bit more Java heap; at the very beginning Graal would use it, and later you have more memory for your application. That's an advantage, but it will go away, I think.

What did we learn? Graal uses Java heap memory to do its compilations. There is no heap isolation yet; that's the libgraal thing I was talking about. Most Graal memory usage is during startup: as I said, most compilations happen at the beginning, when your application is not fully up yet. And remember, the memory is used anyway - it's either malloc memory or the Java heap, but you can't escape it. We need the memory to do the compilations.

Build Graal Demo

What I will do is clone my local copy; we clone Graal. Then we need something called mx. It's a script that Oracle Labs uses for everything: it can clone the code, it can build it, it can run the tests, it creates configuration files for your IDE - it does everything. I don't even know how big it is now; it's over 10,000 lines of Python code. I want to throw up on stage. I've complained so many times about this, but it's not going away. If you want to run bleeding edge, you need it.

I need to put mx on my PATH because you have to run it. This is the only time I'm cheating, because I'm running out of time: I already have mx on my PATH. I'm going to unset JAVA_TOOL_OPTIONS really quick so we're not getting all this output. Then the only thing you have to do is mx build. I set JAVA_HOME to the JDK 11; I think it even prints that it's picking that up, and now it's building. I couldn't pull the latest version, but I pulled this one before I came to QCon, in my hotel room. It's from today; it's the latest thing.
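For the record, the whole build sequence is roughly this (repository URLs as they are on GitHub; the JDK path is an assumption):

    git clone https://github.com/oracle/graal.git
    git clone https://github.com/graalvm/mx.git
    export PATH=$PWD/mx:$PATH
    export JAVA_HOME=$HOME/jdk-11.0.2   # your JDK 11 location
    cd graal/compiler
    mx build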

Let's wait for this; it takes a few minutes. In the meantime, let me show you -Xlog:class+load with the benchmark, the avrora thing again, and we grep for graal. As you can see, when we run this and log all the class loading that's going on, we see all these Graal classes being loaded. They're being loaded from this thing: jrt, which is the file system that the JDK uses internally to load files from modules. It loads the Graal code from this module, which we've seen before: jdk.internal.vm.compiler. That's the one that's shipped with OpenJDK.

We are waiting for this build to finish over here; it's compiling a bunch of stuff. You can also see that it's compiling Truffle. We're just waiting for the part that's called archiving Graal, and then the Graal compiler is built. mx vm basically just runs the VM with the Graal version we just built. We can do mx vm -version here; it should work, because I think it's done compiling. It prints some stuff, and with verbose it prints a bunch more. Ignore the stuff at the top; this is the important line: it's using the JDK 11. Then, as you can see, it throws in things here, like an upgrade module path pointing to this guy, which is basically the Graal and Truffle modules we just built. That's the Graal we just built here on the left.

Now we are running with the latest version of Graal. If we run this guy with mx vm, the Graal class files are now being loaded from a file called graal.jar. You can actually see it's picking up the latest Graal version we just built, and it runs the thing. Let me remove the logging here and run three iterations of that guy. Every time I do this on stage, I hope it's actually faster than it was before, but it never is, because avrora is not; there's not much we can do about it anymore. You see it's five seconds for the first one, and then 5.2, 5.6. That's what we had earlier as well.
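The runs with the freshly built compiler, sketched (the DaCapo jar location is an assumption):

    mx vm -version
    mx vm -jar ~/benchmarks/dacapo-9.12-bach.jar -n 3 avrora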

That's how easily you can get Graal, build it, and then play around with it. You can use mx to create the IDE configuration, load it up in your favorite IDE - I hope it's Eclipse - and then you can play around. Do whatever. It's so easy because it's Java: you change something, you save, and then you just run with it. You don't even have to recompile your JDK. It's amazing.

Scala Demo

That was the demo. I cannot do the production demo because, A, I'm running out of time and, B, it doesn't build, so I'll skip that. I want to do a Scala demo, though.

For this one, we're going back here and setting the JAVA_TOOL_OPTIONS thing again, but we are increasing our heap size to two gigabytes. The reason is that otherwise we have too many GCs. Let's do a benchmark run of default size, with two iterations - that should be enough - of a benchmark called factorie. Let me run this over here with C2; then we do the same here, and we wait.
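The Scala run, sketched; the ScalaBench jar name varies by release and is an assumption here, and the heap settings are the two gigs just discussed:

    export JAVA_TOOL_OPTIONS="-XX:+UseParallelGC -Xms2g -Xmx2g \
      -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler"
    java -Xlog:gc -jar scala-benchmark-suite.jar -n 2 factorie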

ScalaBench - this is the benchmark suite I've been using all the time. There's a page with all the benchmarks it has, and there's this one called factorie; that's the one we are running right now. It says it's a toolkit for deployable probabilistic modeling, extracting topics using Latent Dirichlet Allocation. I have no fucking clue what that is, but it's a very good use case to show you what I want to show you. I gave this presentation one time, and after my talk a guy came up to me and said, "We use LDA at my company." I said, "What?" This is LDA, the Latent thingy - they're actually using it. I don't know why; it really doesn't matter. Oh, I forgot something very important: we have to log GC, otherwise we don't see what the hell is going on.

You can see it's allocating roughly 600 - it grows a bit in size, but 650, 680 megabytes per GC - and we do two iterations because the first one builds up some internal data structures, and the second one is a little closer to the truth. It took 20 seconds to run that benchmark iteration. Let me start this over here with -Xlog:gc while I talk. It took 20 seconds to do this benchmark iteration, and we had roughly - you can see 37 to 64 - so not quite 30 GC cycles. Oh, damn it. The heap size is too small.

Let's wait for this guy. As you can see, the first iteration with C2 over here took 22 or 23 seconds and did this many GCs. You can already see that the first iteration over here was much quicker and did fewer GCs. Let's wait for the second iteration. This one took 20 seconds to run and this one took 16, and we went from 33 to 53 - only 20 GCs. I'm not actually sure what's happening here; it usually cuts the GCs in half. The reason is the way Graal compiles code. It has something called escape analysis - that's an optimization, I'm not going to explain it now - and it has a better escape analysis implementation than C2. Escape analysis can get rid of temporary object allocations. This benchmark is written in Scala, and Scala allocates these temporary objects all the time, so Graal, the compiler, can optimize that code very well. That's also why we at Twitter save so much money: pretty much all of our microservices, except a handful, are written in Scala.

We don't do as many GCs: that reduces GC cycles and user CPU time, and we reduce CPU utilization with that. That's what I wanted to show you. This benchmark is not really representative - it's the best one out there, and I picked it because I want to look cool on stage. I'm not going to pick one that's 1% better.

Summary

The summary of this talk is the summary of all of my talks, and I know I'm over time, but it's very simple: I want you to try it. This is all you have to do if you have JDK 10 or later - which I know is an issue, because we at Twitter also run on 8. You can download an 8 version from Oracle, which is basically GraalVM. If you download GraalVM, the Community Edition, it's what I showed you today, because the Community Edition has the Graal version that's on GitHub as open source. If you're on 8, download that one. Please try it, and please let us know how it works. If it works better for you, I'm very happy for you - tweet it out. If it crashes, that would be amazing, because I want you to find the bug and not us; if Twitter is down because of a compiler bug, my life is hell - and I'm here right now. Give it a shot, let us know how it works. If you run something and it's faster, excellent, great. If it's slower, also let us know; as I said earlier, we would like to figure out what it is. Maybe we can do something about it.


Recorded at:

Aug 22, 2019
