
The Trouble with Memory



Kirk Pepperdine talks about the steps to take to cure memory problems and also covers how the JVM can help reduce the memory strength of an application.


Kirk Pepperdine has been working in high performance and distributed computing for nearly 20 years. His focus has primarily been on performance, working on architecting, developing, and tuning applications running on Cray and other high performance computing platforms. He now specializes in Java, where he works in all aspects of performance and tuning in each phase of a project life cycle.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


I wanted to talk about memory, the trouble with memory. This is a talk I probably should have done years ago, but, I don't know, I just didn't think it was that interesting. And maybe it isn't. You can tell me at the end. So if you don't like the talk, I'll help you with selecting the button to push. And if you do like the talk, the green one is helpful.

She's covered all kinds of things, so I'm not going to bother with this too much, except to say that we have a startup, jClarity. We make performance diagnostic tooling, tooling to help people find out what's going on in their applications. And I showed some video of JOnsen. It's an unconference that was actually modeled after JCrete, which is another unconference that we co-founded with Dr. Heinz Kabutz. As the name suggests, we hold that one on the island of Crete every summer. It's really nice. We say it's the hottest conference on the planet, for our definition of hot. And some other stuff, right? So let's start by asking questions, because every good presentation has to ask questions. What are your performance trouble spots? Just shout them out. What's the thing that you believe gives you the most performance grief when you're throwing a system out into production?

Participant 1: Immature developers.

Pepperdine: Immature developers. That's a whole other talk. We had that last night, didn't we? No, we did. Anybody else have any ideas, technical in nature?

Participant 2: GC.

Pepperdine: Sorry, GC. Servicing GC. GC is cool. What's that? Browser? What? Browser rendering? Yes, don't get me started. Yes, our client container is wonderful, isn't it? Who loves it? Nobody put up their hands. What is your performance trouble spot? Nobody said database interactions. Where are we? All working for Oracle now or something?

Participant 3: Network latency.

Pepperdine: Network latency. Yes, sure. Let's try this actually. Since nobody put up their hands, I'm assuming your hands aren't working. So let's see, you guys over there, put your hands up. Everybody over here, put your hands up. And I think you guys can keep your hands down. And everybody in the back, put your hands up. I don't see all hands up. Both hands up. It's not working. It's a good stretch, right? There you go. After lunch, stretch. You guys are all suffering from memory problems. You guys probably aren't.

And this is what we're saying: about 70% of all the applications I run into are bottlenecked on some sort of memory issue. And this is why I said, "Okay, it's probably time to write a talk about this," because not only are people bottlenecked on this issue, they don't actually realize it. They don't even see it. It's not even visible to them. So part of this talk is to tell you how you're bottlenecked on it, why, and what's happened. The other part is to give you some help, first in recognizing it and then in trying to fix it.

And no, garbage collection is very often not at fault. Now, we can do a lot with garbage collection tuning to hide the memory problems, but that's pretty much all we're doing: hiding it. If you really want to solve the problem, you need to go to the core of the problem. So, the question is, do you use Spring Boot? Cassandra? Or, just to be fair, any other NoSQL solution? Apache Spark? Any hands? Yes, we got it. Or any other big data solution? Log4j? Logback? I hate to ask this one. JDK logging? Nobody? You guys are all crazy. Actually, we use JDK logging. Or any Java logging framework? JSON? Who here uses any form of JSON? I should see the whole room's hands go up, unless your arms are broken or something. Or almost any marshalling protocol? ECom caching products? Any caching? Yes, you got it? Hibernate? And so on?

If you use any of these, then you're very likely in the 70% of people who are suffering from memory issues. And the list just goes on and on of things that are just not really memory aware. So, really, the question is: what are the problems these things cause, and what can we do about it?


We have high memory churn rates. We have many temporary objects. We can have large live data sets; that's going to cause a whole other set of problems because you have inflated live data set sizes. And this can be caused by what we call loitering. We ran into this just a couple of weeks ago. Somebody said, "Hey, we've got this big problem with GC pauses." You look at it and go, "Yes, we know how to reduce your GC pauses. First off, get rid of the memory leak." And once you get rid of the memory leak, you have all these other loitering objects around. In this case, it was Jetty sessions. They had really long session timeouts, something like 24 or 48 hours, so these things were around for days, much longer than they needed to be. So we just turned the session timeout down, and all of a sudden their memory footprint went down. And all of a sudden their pause times started getting smaller, and they're going like, "Wow. So how does that work?"
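In a servlet container the fix described above is just configuration (for example `<session-timeout>` in `web.xml`, or `HttpSession.setMaxInactiveInterval`). To show why that shrinks the live data set, here is a hypothetical, container-free sketch: `IdleSessionStore` and its methods are illustrative names, and the clock is passed in explicitly so the effect is easy to follow. Sessions idle longer than the timeout are evicted, so their data stops loitering on the heap.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical session store: sessions idle longer than the timeout are dropped,
// so the objects they reference become garbage instead of loitering for days.
class IdleSessionStore {
    private static final class Session {
        final Object data;
        long lastAccessMillis;

        Session(Object data, long now) {
            this.data = data;
            this.lastAccessMillis = now;
        }
    }

    private final Map<String, Session> sessions = new HashMap<>();
    private final long idleTimeoutMillis;

    IdleSessionStore(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    void put(String id, Object data, long now) {
        sessions.put(id, new Session(data, now));
    }

    Object get(String id, long now) {
        Session s = sessions.get(id);
        if (s == null) {
            return null;
        }
        s.lastAccessMillis = now; // touching a session keeps it alive
        return s.data;
    }

    // Removes every session idle for longer than the timeout; returns how many were evicted.
    int evictIdle(long now) {
        int evicted = 0;
        for (Iterator<Session> it = sessions.values().iterator(); it.hasNext(); ) {
            if (now - it.next().lastAccessMillis > idleTimeoutMillis) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    int size() {
        return sessions.size();
    }
}
```

With a 30-minute timeout, a session last touched 40 minutes ago is evicted, while one touched 20 minutes ago stays. A shorter timeout means less data surviving each collection.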

You know, a few war stories. We reduced allocation rates from 1.8 gigabytes per second to zero. Yes, that's not a typo: it actually is zero bytes per second. TPS jumped from 400,000 to 25 million. So we're not talking about small gains here; we're talking about the potential for exceptionally large gains if you pay attention to this problem. I gave my usual logging rant at a workshop in the Netherlands about a year or so ago. That night, they went and stripped all the logging out of the transactional monitoring framework they were using, which wasn't getting them the performance they wanted, which is why I was there giving the workshop in the first place. When they stripped out all the logging, throughput jumped by a factor of four. And they're going like, "Oh." Not only did their performance problem go away; they had been tuning this thing for so long that, once they got rid of the real problem, it went much faster than they needed, about twice as fast as what they needed.

There's another product company we're working with, and the list of problems here, I mean, I could make this list a lot longer. We could spend the whole hour on how great a hero I am at fixing these problems. But that's not really the point. The point is that there are some really significant gains that can be made if you just pay attention to how you're utilizing memory.

Allocation Sites

Allocation sites. How do we form an allocation site? Well, here I get the Foo: Foo equals new Foo. Lots of Foos in there, I guess. When you break this out into bytecode, you can see you get this new instruction. That's pretty much what happens every time you do a new, right? This is going to cause an allocation to occur. There are a number of different ways this allocation can actually occur: we can go down what's known as a fast path, we can go down a slow path, or, because we have this wonderful technology inside the JVM called a JIT, and the JIT looks for optimizations, the allocation might not happen at all. One of the optimizations it can perform is to say, "I'm not really going to allocate this this way. I'm going to basically unbox the values inside this object and place these values on the stack directly. I'm not going to do heap allocation at all." So small objects may be optimized, and it's possible that the call site is going to be eliminated from the code completely. That's one thing it can do.
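A minimal allocation site to experiment with. `Foo` and `AllocationSite` are hypothetical names; compiling the file and running `javap -c AllocationSite` should show roughly the instruction sequence in the comment (the exact constant-pool indices and formatting vary by compiler).

```java
// Hypothetical example classes for looking at an allocation site in bytecode.
class Foo {
    int value;
}

class AllocationSite {
    static Foo make() {
        // javap -c shows roughly:
        //   new           (class Foo)
        //   dup
        //   invokespecial (Method Foo."<init>":()V)
        //   astore_0
        Foo foo = new Foo();
        foo.value = 42;
        // Returning foo makes it escape this method; unless make() is inlined into a
        // caller where it stays local, the JIT has to keep the heap allocation.
        return foo;
    }
}
```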

Since we're talking about allocations, we need a little bit of understanding of the Java heap. Normally, when I talk about the Java heap, I talk about it in the context of garbage collection. This time, I'm going to talk about it in the context of the allocators. Now, there's always this interplay between the garbage collector and the allocators. They have to cooperate with each other in order to keep things rolling. And if they don't cooperate, then you get full stop-the-world pauses where the garbage collector says, "You've just run out of memory. So because you've just run out of memory, I'm going to stop everything that's happening, take complete control of the heap, and clean things up."

That's the "we don't have any cooperation" world. But that's not today's world. Today, we have a lot of what I would call cooperation happening. We have much more concurrent collectors available to us, like [inaudible 00:10:01], and ZGC is coming. Because they're a lot more concurrent, the allocators and the collectors have to cooperate so that the collectors aren't interfering with the allocators so much. But that means we need to give the allocators work that the garbage collector would normally do, or give them work that helps the garbage collector so that it can run concurrently.

Generally, what happens is we take that long pause and make it shorter for the times we do have to pause. But we do that by putting work onto the allocation threads. And when we put work onto the allocation threads, we slow your allocations down a wee bit, so application throughput takes a bit of a hit. No free lunch: pay me now, or pay me later. The trick is to find a balance between how much work we give the allocators and how much work we want to dump onto the garbage collector.

We have a number of spaces. We have an Eden space, or nursery, which is where objects are generally allocated. We have a survivor space and we have a tenured space. Each of these spaces contributes to problems in a different way. So when we're tuning the garbage collector or tuning our application, we want to see the impact on each of these spaces individually.


So let's talk about Eden allocation first. We generally have this top of heap pointer. And then what we're going to do is when we execute some code here, here's my very cleverly written code. And you can see that I got a couple of Foos and a bar and a byte array. And so we're going to allocate the Foo first, which means I'm going to do what's known as a bump and run. So I'm going to bump the pointer up, claim the memory, and then I can just dump the data in there. And then I’ve got to go through a few barriers to make it visible to everybody and life is good.

And then the bar does the same thing. And then, of course, I have a byte array. And you can see, I'm just going to bump the top of heap pointer up there. When I finally fill the space, then, of course, the collector is going to come along here and evacuate all the live data out. Now we're sort of developing a cost model here. So if you want to know why my pauses are long or what's taking so long, you have to sort of develop a cost model. The cost model here says, “Gosh, if I have a lot of live data, and when I fill a space and I copy it out, then, of course, that's going to take some time.” It takes time to allocate it, and it takes time to copy out the live stuff.
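The bump-and-run allocation and the "fill Eden, then collect" cycle just described can be modelled in a few lines. This is a toy simulation under simplifying assumptions (one thread, sizes in abstract bytes, no object headers, alignment, or barriers), not HotSpot's allocator; `BumpEden` is a hypothetical name.

```java
// Toy model of bump-the-pointer allocation in Eden (not HotSpot's code).
class BumpEden {
    private final long capacity;
    private long top = 0; // the "top of heap" pointer

    BumpEden(long capacity) {
        this.capacity = capacity;
    }

    // Returns the start offset of the new object, or -1 when Eden is full,
    // which is the point where a young collection would evacuate the live data.
    long allocate(long size) {
        if (top + size > capacity) {
            return -1;
        }
        long start = top;
        top += size; // "bump": claim the memory by advancing the pointer
        return start;
    }

    // After a young GC has copied the survivors out, Eden can be reused from the start.
    void reset() {
        top = 0;
    }

    long used() {
        return top;
    }
}
```

In this model the two throttles from the talk are visible directly: a higher allocation rate fills `capacity` sooner, and a larger `capacity` (the Eden size) delays the next collection.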

So we have a couple of different throttles that we can use here to control things. One of them is that we can control the allocation rates, that's in our code, or we can control the size of the buffer. That's the configuration of the JVM. So those are primarily the two throttles that we can use to help ourselves.

In a multi-threaded application, if we just use this very naive scheme, we're going to have a really hot pointer, because all of these threads are trying to allocate and they're all going to try to bump this pointer up. That creates a hot lock, because we have to put barriers around the pointer. So we use a common technique for hot locks: we do the equivalent of striping. In this case, the stripes are known as thread local allocation buffers, or TLABs. Each thread, when it starts up, gets a chunk of memory that it allocates from the heap, and it says, "I can allocate into there." Now we're not competing for that hot lock anymore; we only visit it when we need to get a new TLAB.
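Here is a toy model of that striping, assuming fixed-size TLABs (HotSpot actually resizes them adaptively) and ignoring large objects, which are only flagged rather than allocated. The names `TlabHeap`, `refill`, and `allocate` are hypothetical. The only contended operation is the `getAndAdd` on the shared pointer, performed once per TLAB instead of once per object.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of thread-local allocation buffers: each thread claims a chunk of the
// shared heap with one atomic operation, then bump-allocates privately inside it.
class TlabHeap {
    private final AtomicLong sharedTop = new AtomicLong(); // the contended "top of heap" pointer
    private final long capacity;
    private final long tlabSize;
    // Per-thread buffer bounds {top, end}; starts empty so the first allocation refills.
    private final ThreadLocal<long[]> tlab = ThreadLocal.withInitial(() -> new long[] {0, 0});

    TlabHeap(long capacity, long tlabSize) {
        this.capacity = capacity;
        this.tlabSize = tlabSize;
    }

    // The only step that touches the shared pointer: claim a whole chunk at once.
    private boolean refill(long[] buf) {
        long start = sharedTop.getAndAdd(tlabSize);
        if (start + tlabSize > capacity) {
            return false; // heap exhausted: time for a collection
        }
        buf[0] = start;
        buf[1] = start + tlabSize;
        return true;
    }

    // Returns the offset of the new object, or -1 if it cannot be placed in a TLAB.
    long allocate(long size) {
        long[] buf = tlab.get();
        if (buf[0] + size > buf[1]) {
            // Objects larger than a TLAB would go to the shared space instead (not modelled).
            if (size > tlabSize || !refill(buf)) {
                return -1;
            }
        }
        long start = buf[0];
        buf[0] += size; // uncontended bump inside this thread's own buffer
        return start;
    }
}
```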

And this, of course, creates some issues of its own. For instance, we could have a couple of threads allocating things. You can see we've allocated a Foo and a bar, and we've also allocated this byte array. Now, the byte array is too big; it doesn't fit into a TLAB, so it has to go into the general allocation space. Everything else fits into a TLAB. As you can imagine, the problems come when you get to here. Now what do we do? Well, I'm allowed to waste up to 1% of the TLAB. That's the red line I've drawn up there, which is not representative of 1%, in case you didn't notice. Not to scale, is that what they say? So once we're past that threshold, the allocator says, "Okay, now let's go get a new TLAB and start allocating into the new TLAB." So we get some memory wastage here.

But that's better than the alternative, which is basically saying, "I'm below the threshold. Let's try to allocate here. Oops, fail. Protect from buffer overrun, roll back, get a new TLAB, and now do the allocation." As you can imagine, that's a lot more expensive. If these situations occur quite frequently, where you're still below the TLAB waste threshold but the object is going to overflow the buffer, that's a condition worth recognizing. If you can recognize it, you can adjust the waste percentage and get rid of these failed allocations, and that can make some difference in the performance of your allocators.
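A sketch of the slow-path decision when an object does not fit in the current TLAB. This follows HotSpot's general approach as I understand it, which differs slightly from the simplified description in the talk: if the leftover space is small enough, the TLAB is retired and refilled; otherwise the object goes to shared Eden and the TLAB is kept. Here `refillWasteLimit` is a plain parameter; in HotSpot it is derived from the TLAB size and adjusted at run time, and the related flags include `-XX:TLABWasteTargetPercent` and `-XX:TLABRefillWasteFraction`. `TlabSlowPath` is a hypothetical name.

```java
// Sketch of what an allocator might do when an object meets the end of a TLAB.
class TlabSlowPath {
    enum Decision { ALLOCATE_IN_TLAB, RETIRE_AND_REFILL, ALLOCATE_OUTSIDE_TLAB }

    static Decision decide(long tlabFree, long objectSize, long refillWasteLimit) {
        if (objectSize <= tlabFree) {
            return Decision.ALLOCATE_IN_TLAB; // fast path: just bump the pointer
        }
        if (tlabFree <= refillWasteLimit) {
            return Decision.RETIRE_AND_REFILL; // waste the small tail, get a new TLAB
        }
        return Decision.ALLOCATE_OUTSIDE_TLAB; // too much space left to throw away
    }
}
```

Raising the waste limit retires TLABs sooner (more wasted tail space, fewer slow-path allocations in shared Eden); lowering it does the opposite.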

Tenured Space

Tenured space is different. How tenured space works in G1 is different from how it worked with the older generational collectors. I'm just going to talk about those generational collectors here, because G1 comes with its own, completely different set of issues. What happens is that we have this thing called a free list. Because we don't have another space to evacuate into, which is something G1 solves, we can't take all the live data and copy it out. Instead, we maintain a free list, which tells us where we can make an allocation from. Actually, it's mostly the garbage collection threads that do the allocations here, as they promote objects. You can see that we have to do all of this free list maintenance.

And there's some other work that needs to be done to help the garbage collector out. One piece is that we have to maintain card sets. This is where mutation rates become important to the performance of your application: every time we have a pointer mutation, say a Foo that's pointing to a bar, and we say, "Oh, it's not pointing to the bar anymore, it's actually pointing to Faa" or some other object, and Foo is in tenured while the other data is in young, then I want to record that pointer information in a separate data structure. So every time we do that swizzle with the pointer, I have to update that other data structure to help the garbage collector out. That's an extra cost you pay on top of the allocation.
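The bookkeeping described here is commonly implemented as card marking: the heap is divided into fixed-size cards, and every reference store runs a small write barrier that marks the card containing the updated field as dirty, so a young collection only has to scan dirty cards in tenured for old-to-young pointers. This toy assumes HotSpot's default 512-byte card size and uses plain offsets as addresses; `CardTable` and its methods are hypothetical names, not a real API.

```java
import java.util.Arrays;

// Toy card table: one byte per 512-byte card; a reference store dirties its card.
class CardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards
    private final byte[] cards;

    CardTable(long heapBytes) {
        cards = new byte[(int) ((heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    // Post-write barrier: conceptually runs on every pointer store such as foo.ref = faa.
    void onReferenceStore(long fieldAddress) {
        cards[(int) (fieldAddress >>> CARD_SHIFT)] = 1;
    }

    boolean isDirty(long address) {
        return cards[(int) (address >>> CARD_SHIFT)] == 1;
    }

    // After a young GC has scanned the dirty cards, they can be cleaned again.
    void clear() {
        Arrays.fill(cards, (byte) 0);
    }
}
```

The barrier itself is cheap, but it runs on every reference store, which is why a high pointer mutation rate shows up as extra cost.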

Data in tenured tends to be long lived, and the amount of data in tenured will affect garbage collection pause times. I forgot to mention that what we see is that an allocation from the free list is about 10 times more expensive than an allocation in the young generation. So the question is, when would your thread decide to allocate in tenured space as opposed to Eden? The answer is, if the allocation is deemed large enough. There's a threshold for the different collectors, and they say, "I don't want to do that allocation there. I want to do it in tenured." As an example, the threshold for the mostly concurrent [inaudible 00:18:13] collector is 50% of Eden: if you're larger than 50% of Eden, that allocation will automatically occur in tenured space. And there are some other conditions where this can happen.
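A hypothetical sketch of size-based routing. The threshold is just a parameter here, and the 50%-of-Eden rule is the figure quoted in the talk for one collector; other collectors use different rules (HotSpot's `-XX:PretenureSizeThreshold` flag is a related knob for some of them). `LargeObjectRouting` and its methods are illustrative names.

```java
// Hypothetical size-based routing: large allocations bypass Eden and go
// straight to tenured, where free-list allocation is more expensive.
class LargeObjectRouting {
    enum Space { EDEN, TENURED }

    static Space route(long objectBytes, long thresholdBytes) {
        return objectBytes > thresholdBytes ? Space.TENURED : Space.EDEN;
    }

    // The threshold quoted in the talk for one collector: half of Eden.
    static long halfOfEden(long edenBytes) {
        return edenBytes / 2;
    }
}
```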

More Problems

What are the problems we run into? Well, we have high memory churn rates: many temporary objects. That quickly fills Eden, which gives you frequent young GC cycles. It also has another side effect, which is kind of a strange one. We have two different ways of aging data. The first is time, as in wall clock time. If we have a session timeout of 30 minutes, then that data is going to stay in the heap for 30 minutes. Now, the question is, how many garbage collection cycles is it going to face? I don't know. No idea. I do know that if it faces 15 of them, it's going to end up in tenured. Well, 6 or 15, depending on what the tenuring threshold is.

So that's the first way. The second way of aging the data is how many garbage collection cycles it has actually faced. If it's faced that many, then I'm going to copy that data off into tenured. Now, why do I care about this? Well, what's the cost of collecting data in Eden? Zero. Nothing. There is no cost for the dead data. I'm only copying the live stuff and leaving the dead stuff behind, so I never mark, never touch, never do anything with the dead stuff. We just move the live data out, so it's the live data that costs. By aging faster, because of frequent GCs, I actually end up copying a lot more data than I should, and I'm putting it into a space that's a lot more expensive to collect.
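To make the two clocks concrete, here is a back-of-the-envelope model. It assumes a young GC happens every time Eden fills, so the interval is roughly Eden size divided by allocation rate, and it ignores survivor-space overflow and dynamically adjusted thresholds. The tenuring threshold is a parameter (the talk puts it at 6 or 15, depending on configuration). `PrematurePromotion` and its methods are hypothetical names.

```java
// Back-of-the-envelope model of premature promotion: wall-clock lifetime
// versus the number of young GCs an object has to survive.
class PrematurePromotion {
    // Interval between young GCs: the time it takes to fill Eden.
    static double secondsBetweenYoungGcs(double allocBytesPerSecond, double edenBytes) {
        return edenBytes / allocBytesPerSecond;
    }

    // Young GCs survived by an object that stays live for lifetimeSeconds.
    static long youngGcsSurvived(double lifetimeSeconds, double secondsBetweenGcs) {
        return (long) Math.floor(lifetimeSeconds / secondsBetweenGcs);
    }

    // Simplified rule: promoted once it has survived tenuringThreshold young GCs.
    static boolean promoted(double lifetimeSeconds, double allocBytesPerSecond,
                            double edenBytes, int tenuringThreshold) {
        double interval = secondsBetweenYoungGcs(allocBytesPerSecond, edenBytes);
        return youngGcsSurvived(lifetimeSeconds, interval) >= tenuringThreshold;
    }
}
```

With a 1 GB Eden, an object that stays live for 10 seconds survives 20 young GCs at 2 GB/s and is promoted with a threshold of 15, but survives only 2 at 200 MB/s and can die young. Same wall-clock lifetime, different fate; that is why lowering the allocation rate reduces premature promotion.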

Some of you look confused. Any questions at this point? Yes, it's kind of strange. I call it premature promotion: promoting data into tenured sooner than it should be. And again, what are the knobs we have to turn to control this? The first one, and the one that's probably going to have the biggest effect, is the allocation rate. We can make the buffer size bigger to accommodate the higher allocation rate, which decreases the copy costs, so you get some benefit from doing that, but it's still not going to get rid of the effects of having to allocate all of this memory.

Allocation is quick in Java, very quick, much quicker than in most runtimes. But quick times a large number is still a large number, and that's just going to be slow. So we find that it doesn't really matter what platform you're on; we get this benefit from lowering the allocation rate. It's on this curve. I'd say that anytime you're above one gigabyte per second... Remember, we tend to allocate into the buffer around 100 bytes at a time. If we allocated a gig at a time, each individual allocation would take the same amount of time, but because we're doing it 100 bytes at a time, you have a frequency issue. And that frequency issue seems to get really bad when you cross approximately one gigabyte per second. By the time you get it below 300 megabytes per second, I'm pretty much just going to ignore the problem. In between, I may or may not decide to do something about it, depending on what other things are going on.
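A small sketch that applies the rough bands from the talk and measures how many bytes a piece of code allocates on the current thread. It assumes decimal units (1 GB = 10^9 bytes) and a HotSpot-based JVM, since it uses the `com.sun.management.ThreadMXBean` extension; `AllocationPressure` and its methods are hypothetical names. Dividing the measured bytes by the elapsed time gives a rate to feed into `classify`.

```java
import java.lang.management.ManagementFactory;

// Rough triage of allocation rates using the bands quoted in the talk
// (approximate, not hard limits), plus a per-thread allocation measurement.
class AllocationPressure {
    enum Band { IGNORE, INVESTIGATE, LIKELY_BOTTLENECK }

    static Band classify(double bytesPerSecond) {
        if (bytesPerSecond >= 1e9) {
            return Band.LIKELY_BOTTLENECK; // above ~1 GB/s
        }
        if (bytesPerSecond < 3e8) {
            return Band.IGNORE; // below ~300 MB/s
        }
        return Band.INVESTIGATE; // judgement call in between
    }

    // Bytes allocated by the current thread while running the task (HotSpot extension).
    static long allocatedBytes(Runnable task) {
        com.sun.management.ThreadMXBean mx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = mx.getThreadAllocatedBytes(id);
        task.run();
        return mx.getThreadAllocatedBytes(id) - before;
    }
}
```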

The next problem is large live data sets, or inflated live data set sizes, as I mentioned, because of loitering. In this case, we get inflated scan-for-roots times. Every time we do a garbage collection we have to find the root set, which means I have to go through all the data to figure out who's pointing into the young generational memory pools, Eden or survivor. That's linear in the amount of data I have to scan through: if I have more data, it's simply going to take longer. I also get reduced page locality and inflated compaction times. We get increased copy costs. And it's very likely that you just have less space to copy into, which means the JVM has to do more work to figure out how to do the compaction, and with less room to work with, it may not do as good a job.

And here's a nice little chart. Look at the one on the left, and just look at the red dots at the bottom. Forget all the other noise; I probably should have taken it out. The red dots at the bottom show the increasing occupancy, primarily of tenured space. Now look at the red dots on the other side: do you see a correlation in the slopes of an imaginary line through them? That's exactly what you're seeing: the direct correlation between additional copy costs and simply having more data in the heap. It's a nice chart.

If we have an unstable live data set size, or what we normally call a memory leak, then you're eventually going to run out of memory, and that's going to be really bad for performance. Each application thread that can't get its allocation satisfied is going to throw an OutOfMemoryError and terminate. When all of the non-daemon threads are finished, the JVM shuts down, and you're left with the out of memory error that we all love.


I'd like to talk about this, but I've decided I'm not going to right now. But I'll get back to it because it's quite fun. It's a fun bit of technology. Instead, let's look at some code here. [How much time do I have left? Where's my moderator? She doesn't know. It's 20 after.]

I've got this goofy little application that I wrote, and it's really a lot of fun. There are lots of opportunities in here for criticizing the code and everything. Please send your comments along. I love to say [inaudible 00:25:12]. So what do we have here? I've got this application.

Participant 4: There's nothing on the screen.

Pepperdine: Of course, there's nothing on the screen. Why would there be anything on the screen? I didn't want you to see the code. Now you can see the code. Actually, you can't, yes. It's strategically positioned well. I have this thing set up so that it's actually going to run, and we can see it's running, and I'm making a guess here. You know, Mastermind. Who's played Mastermind? Does anyone not know what Mastermind is? No colors, just numbers. What I do is I say, "Here's the number sequence." And then I say, "Imagine that all of the possible color sequences were there. Try to find that one." It does some logic to search through and try to figure out the correct answer, and then it comes back and asks me, "Okay, is this the correct answer?" Eventually, it'll come up with the correct guess.

So there's my score of 30. I said, "Out of 100,000 numbers, I'm going to choose three. Tell me which three I chose." It came back here and said, "It's 01 50,002." More importantly, it took about 11 seconds to do that. Let me run it again. While I'm running these really boring performance demos, that's when you can ask questions. It's gotten faster because it's going through some warm-up here, and eventually it'll stabilize on some number. I think the answer is something like 8.6 seconds, if my demo runs correctly. And 9.6. Run the test again. Hopefully, the demo is on the correct configuration. It's fun. Oh, 8.2. Excellent. Now it seems to be sufficiently warmed up, and it should be around 8.3 or 8.5 seconds or whatever. It doesn't really matter. The point is, we have a rough order of magnitude for where this will run.

Now, the question is, if we want to make this run faster, we have to figure out what the problem is. I'm going to spare you the analysis, but we're going to say that allocation rates are the problem. Actually, I'll show you the last part of the analysis, which will help you. So let's open up a tool. This is our GC analysis tool. I collected a GC log, that one right there. It loads it up and gives me all kinds of information about what's going on. But really, what I'm looking for is allocation rates. There are my allocation rates. You can see they're basically wavering between seven gigs and three-and-a-half gigs per second or so. So that's above the one gig limit, would you say?

Participant 5: Just a little bit.

Pepperdine: Yes. So we can safely say that if we were to do something about the allocation rate in this application, it should run faster. Everyone agree? The question is, where's the hot allocation site? How are we going to find that? A memory profiler. I'll use VisualVM. As you can see, I was making sure it worked before. I've done some crazy things in the past where the demo just didn't work for whatever reason. Now I test them. Don't upgrade JDKs, all the standard stuff you shouldn't do. Let's attach the memory profiler here. This is just VisualVM. For those of you who have not seen it, you can get it as open source from GitHub. You can probably do the same thing with Mission Control. Is Mission Control bundled with 11? I think they've debundled it. It's still bundled? It was bundled. Yes, as was VisualVM, and I think they've de-bundled it from 11. Anyway, it doesn't matter.

So we're looking for frequency events. I'm looking for allocated objects here. So let's go to our application here. Let's clear it out and run it again. And let's see if anybody can tell me what the hot allocated object is. Any guesses, anybody?

Participant 6: Undo PosRef.

Pepperdine: Right, undo PosRef. Good choice. Excellent. I love it. Let's go see where that is. Now I'm going to take a snapshot here. And we're going to open it up and we're going to say, "Somebody in here is just doing something, like, really nasty, creating all these objects." Now I can go back to source code. Now be brave, right? Let's shut all this stuff down. We don't need any of this stuff anymore. And board. So it's a score. Oh, look at that. Now, who wrote this code? Who would create a new object in a loop that's obviously being run millions and millions of times?

So what can we do? Well, let's reuse it. And of course, we need one here. So we'll just hoist that allocation out here. Now, how many people here believe this is going to make a difference? One, two, three, four, five, six, seven. For those of you who don't believe it's going to make a difference, I want to hear why. Actually, let's do it this way, because I think some of you are probably going like, "I don't know." So, how many people here believe that this isn't going to make a difference? Anybody else? Remember, I set up the demo. And the answer is, let's run the test: it's not going to make a difference.
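The change made in the demo, hoisting the allocation out of the loop, looks roughly like the sketch below. It is a hypothetical reconstruction, since the demo's source isn't shown: `Score` (a boolean and two ints, as in the talk), `evaluateAllocating`, and `evaluateHoisted` are illustrative names. As the talk goes on to explain, the hoisted version need not be any faster, because the JIT can already remove a per-iteration allocation of an object that never leaves the loop.

```java
// Hypothetical stand-in for the demo's Score: a boolean and two ints.
final class Score {
    boolean exact;
    int guess;
    int distance;

    Score(boolean exact, int guess, int distance) {
        set(exact, guess, distance);
    }

    Score set(boolean exact, int guess, int distance) {
        this.exact = exact;
        this.guess = guess;
        this.distance = distance;
        return this;
    }
}

class ScoreLoop {
    // Original shape: a fresh Score on every iteration of a hot loop.
    static int[] evaluateAllocating(int[] guesses, int secret) {
        int exactCount = 0;
        int totalDistance = 0;
        for (int g : guesses) {
            Score s = new Score(g == secret, g, Math.abs(secret - g));
            if (s.exact) exactCount++;
            totalDistance += s.distance;
        }
        return new int[] {exactCount, totalDistance};
    }

    // Hoisted: one mutable Score created before the loop and reused.
    static int[] evaluateHoisted(int[] guesses, int secret) {
        int exactCount = 0;
        int totalDistance = 0;
        Score s = new Score(false, 0, 0);
        for (int g : guesses) {
            s.set(g == secret, g, Math.abs(secret - g));
            if (s.exact) exactCount++;
            totalDistance += s.distance;
        }
        return new int[] {exactCount, totalDistance};
    }
}
```

The hoisted form trades immutability for fewer allocation sites, which makes the code easier to get wrong (for example, if the reused object is ever shared); that is worth weighing against a gain the JIT may already be providing.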

What's that? Yes, there you go. Let's explain this. Now this goes back to our escape analysis thing. I can run it again. After it warms up, it will probably go. What? You've got to be kidding. Awesome. Go figure. That's the first time this one's ever come off. That's really good. I mean, literally, this demo is years old. It's never come off. So we're [inaudible 00:32:49] about here. What's happening here? Well, really, what's happening is this technology called escape analysis is helping us out here. It's helping us avoid our stupidity. I should probably say the developer who put that stupid code in there has just been saved by HotSpot.

What's a score? Let's go take a look. That's a Boolean and two ints. Rather than allocate this over and over again, let's just drop it on the stack and get rid of the allocation site altogether. You can look at it. Let's go back here and say, "Okay, it has local scope." It doesn't go outside of the scope. I have complete control over everything that happens to this object here. There are no side effects anywhere in the application. I can safely eliminate that allocation site in this particular case. So why did the profiler complain?

Well, the profilers instrument the code. And when they instrument the code, what they do is say, "Pass that object into this other object." Now it's outside of the scope of this method. It's escaped. So the profiler is lying to us. Classic case of lying. So we've been lied to, and we changed the code, but I don't know that it's better. What's the real problem there? Let's see if we can find the real problem. I should have shown you the allocation rate. The allocation rates were identical, but I forgot to do that. I'm just going to add a zero here, just for grins, just for fun. I'm going to bring up VisualVM again. What does that mean? I thought you were saying, "Time is up." Just before the last dramatic part of the demo. Drum roll. Here we go.
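The difference between a local object and one that escapes can be shown with two otherwise identical methods. `Point`, `EscapeDemo`, and the static `recorded` list are hypothetical; the list stands in for the kind of hook an instrumenting profiler might inject, which is what makes the object escape. To compare behaviour, HotSpot's escape analysis can be switched off with `-XX:-DoEscapeAnalysis` (it is on by default in the C2 compiler).

```java
import java.util.ArrayList;
import java.util.List;

final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class EscapeDemo {
    // Hypothetical stand-in for instrumentation that records allocated objects.
    static final List<Object> recorded = new ArrayList<>();

    // p never leaves this method, so escape analysis may scalar-replace it
    // (keep x and y in registers) and remove the heap allocation entirely.
    static int noEscape(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // Handing p to code that stores it makes it escape: it must live on the heap.
    static int escapes(int a, int b) {
        Point p = new Point(a, b);
        recorded.add(p);
        return p.x + p.y;
    }
}
```

Both methods compute the same result; only the second forces a real allocation, which is why a profiler that adds the equivalent of `recorded.add(p)` can report allocations that don't happen in the uninstrumented program.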

Profiler, memory settings. Everything looks cool. Of course, why wouldn't it? Just checking to make sure. Get rid of that. Frequency events. Wrong direction. Let's do that, back over here, and run the test. Any thoughts this time? Yes, probably something to do with this BigInteger. Or this int array. We can take a look at the int array. When we look at the int array, there's some goofy code down here doing something. The point is, we can go into this code and make the changes, and when we make the changes, we get the speed-up that we need. In other words, this is going to run in under half a second once we fix this problem with the BigInteger and this array. And these aren't big changes in the code. These are just two small changes, but I'll leave that as an exercise for the imagination for the moment, because I'm not sure how much time we have. 15 minutes, I could have done it. I could have done it, but I'll leave it like this.

Instead of taking 10 seconds to run, just by making these changes to the algorithm, where we have this hot int array allocation and the use of BigInteger when we don't really need it for this particular problem, we can reduce the allocation rates considerably. And as I mentioned before, this application will then run in half a second. That's how much of a difference it makes. I probably should prove it to you, but I'd rather move on. It's going to take a while this way, so I'll just exit. Let's go back to slides. I could write code, I guess. It's demo time.
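The demo's actual BigInteger code isn't shown, so the sketch below is only a hypothetical illustration of the kind of change meant: when values fit in a `long`, `BigInteger` buys nothing, yet every `multiply` and `divide` allocates a new object. Here the size of the search space from the demo ("choose three out of 100,000") is computed both ways; `SearchSpace`, `chooseBig`, and `chooseLong` are illustrative names. `Math.multiplyExact` throws on overflow instead of silently wrapping, so the primitive version fails loudly if `long` is ever too small.

```java
import java.math.BigInteger;

// Hypothetical example: n choose k with BigInteger versus primitive long.
class SearchSpace {
    // Every multiply/divide allocates a new BigInteger.
    static BigInteger chooseBig(int n, int k) {
        BigInteger r = BigInteger.ONE;
        for (int i = 0; i < k; i++) {
            r = r.multiply(BigInteger.valueOf(n - i)).divide(BigInteger.valueOf(i + 1));
        }
        return r;
    }

    // Same computation without allocation; after step i, r == C(n, i + 1),
    // so each division is exact.
    static long chooseLong(int n, int k) {
        long r = 1;
        for (int i = 0; i < k; i++) {
            r = Math.multiplyExact(r, (long) (n - i)) / (i + 1);
        }
        return r;
    }
}
```

Both methods return 166,661,666,700,000 for n = 100,000 and k = 3, far below `Long.MAX_VALUE` (about 9.2 × 10^18).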

Escape Analysis

You can see this escape analysis made a big difference and there's a lot of other optimizations in the JIT engine that can help you. But I would say that the first thing to do, is just look at the memory efficiency of your application and the garbage collector will tell you if you're allocating a lot, if you do the right calculation. Censum is our tool that we use to actually do our analysis. And one of the things that we'll regularly look at is the allocation rate, just to see, is this a hot allocating application? Do we have a large memory footprint? So those are the two things that we probably want to try to work to reduce in order to improve the performance of the applications.
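One common way to do that calculation from a GC log: the bytes allocated between two young collections are the heap occupancy just before the later collection minus the occupancy just after the earlier one, and dividing by the time between them gives the allocation rate. This is a sketch under that assumption (it ignores anything else that changes occupancy between collections), not a description of how any particular tool computes it; `GcEvent` and `AllocationRate` are hypothetical types standing in for parsed log records.

```java
// One young-GC record as it might be parsed from a GC log (hypothetical type).
final class GcEvent {
    final double timeSeconds;
    final long heapBeforeBytes;
    final long heapAfterBytes;

    GcEvent(double timeSeconds, long heapBeforeBytes, long heapAfterBytes) {
        this.timeSeconds = timeSeconds;
        this.heapBeforeBytes = heapBeforeBytes;
        this.heapAfterBytes = heapAfterBytes;
    }
}

class AllocationRate {
    // Allocated between two GCs = occupancy before this GC - occupancy after the previous one.
    static double bytesPerSecond(GcEvent previous, GcEvent current) {
        long allocated = current.heapBeforeBytes - previous.heapAfterBytes;
        return allocated / (current.timeSeconds - previous.timeSeconds);
    }
}
```

For example, a heap that was at 200 MB after one collection and at 1,200 MB just before the next one, half a second later, has allocated about 1,000 MB in 0.5 s, roughly 2 GB/s, well above the one-gigabyte-per-second level mentioned in the talk.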

Again, let the garbage collector tell you what's going on, and then move naturally from there to the profilers. Be aware that HotSpot is running under the hood, so the code you've written is not necessarily the code that actually runs. HotSpot can modify and mutate it quite a bit. With that, I'd be happy to take questions. I prefer longer Q&As to shorter ones.
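The point that the code you write is not necessarily the code that runs is easy to see with escape analysis. The following hypothetical example is not from the talk: the `Point` allocated in `distanceSquared` never leaves the method, so HotSpot's C2 compiler may scalar-replace it with two local doubles and skip the heap allocation once the method is hot. Whether that happens depends on inlining and other JIT heuristics. One way to check is to compare GC activity (for example with `-Xlog:gc` on JDK 9+) against a run with `-XX:-DoEscapeAnalysis`.

```java
/**
 * Sketch of code that is a candidate for escape analysis. The temporary Point in
 * distanceSquared does not escape, so the JIT may eliminate its allocation; the
 * Point returned by midpoint escapes and must live on the heap.
 */
final class EscapeAnalysisDemo {

    static final class Point {
        final double x;
        final double y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // The Point is never stored, returned, or passed on: a scalar-replacement candidate.
    static double distanceSquared(double x1, double y1, double x2, double y2) {
        Point delta = new Point(x2 - x1, y2 - y1);
        return delta.x * delta.x + delta.y * delta.y;
    }

    // The Point escapes through the return value, so it cannot be scalar-replaced here.
    static Point midpoint(double x1, double y1, double x2, double y2) {
        return new Point((x1 + x2) / 2, (y1 + y2) / 2);
    }

    public static void main(String[] args) {
        double sum = 0;
        // A hot loop gives the JIT a reason to compile and optimize distanceSquared.
        for (int i = 0; i < 1_000_000; i++) {
            sum += distanceSquared(0, 0, i % 3, i % 4);
        }
        System.out.println(sum);
        System.out.println(midpoint(0, 0, 2, 4).y);   // 2.0
    }
}
```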

Questions and Answers

Participant 7: [inaudible 00:38:39] confused-looking. You mentioned that session data sort of locks up memory, subject to a 30-minute rule, until the session goes away and then all the pointers disappear.

Pepperdine: That's the session memory. Yes.

Participant 7: You mentioned that it would migrate from Eden to …

Pepperdine: Tenured. More than likely, depending upon the number of garbage collections that it reaches.

Participant 7: Yes. And it's a relevant question for me: remind me, what is it about tenured that makes it more expensive to clean up?

Pepperdine: Cleaning tenured is more expensive because, one, you have to maintain a free list. Young gen is just bump-the-pointer: you take the pointer value, bump it up, and you're done. In tenured there are barriers, draining caches, and things like that. When you have to do full free-list maintenance, possibly do compaction because of fragmentation, and then do the pointer swizzling afterwards as you move things, you can see there are just a lot of very expensive operations you have to go through just to maintain the tenured space.
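The contrast between bump-the-pointer and free-list allocation can be sketched with a toy model. This is not HotSpot's implementation: addresses are plain offsets, the free list is first-fit with no coalescing, and all names are hypothetical. It does show why free-list allocation costs a search and can fail from fragmentation even when enough total space is free.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model contrasting young-gen and old-gen (CMS-style) allocation. */
final class AllocatorSketch {

    /** Young-gen style: a bounds check plus a pointer bump. */
    static final class BumpAllocator {
        private final long limit;
        private long top;
        BumpAllocator(long capacity) { this.limit = capacity; }

        long allocate(long size) {
            if (top + size > limit) return -1;   // space is full: time for a young GC
            long address = top;
            top += size;                         // bump it up and we're done
            return address;
        }
    }

    /** Old-gen style: first-fit search over a list of free chunks {start, size}. */
    static final class FreeListAllocator {
        private final List<long[]> free = new ArrayList<>();
        FreeListAllocator(long capacity) { free.add(new long[] {0, capacity}); }

        long allocate(long size) {
            Iterator<long[]> it = free.iterator();
            while (it.hasNext()) {
                long[] chunk = it.next();
                if (chunk[1] >= size) {
                    long address = chunk[0];
                    chunk[0] += size;            // carve the request off the front of the chunk
                    chunk[1] -= size;
                    if (chunk[1] == 0) it.remove();
                    return address;
                }
            }
            return -1;   // no single chunk is big enough: fragmentation
        }

        void release(long address, long size) {
            free.add(new long[] {address, size}); // no coalescing, so fragmentation accumulates
        }
    }

    public static void main(String[] args) {
        BumpAllocator young = new BumpAllocator(100);
        System.out.println(young.allocate(40) + " " + young.allocate(40) + " " + young.allocate(40)); // 0 40 -1

        FreeListAllocator old = new FreeListAllocator(100);
        long a = old.allocate(30);
        old.allocate(40);
        old.allocate(30);
        old.release(a, 30);
        System.out.println(old.allocate(40));    // -1: 30 bytes free, but not 40 contiguous
    }
}
```

A real old-generation collector would also coalesce adjacent free chunks and occasionally compact, which is exactly the extra work described above.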

Participant 7: So other than allocating enough memory, what other suggestions do you have [inaudible 00:39:57]?

Pepperdine: Session data? One thing to do is, if a session hasn't been in use for a while, offload it completely. You can store it in a database or something like that, in case somebody happens to come back. So serialize it off to off-heap memory if it's not going to be used for a while. If you're going to be using it continuously, then I would say don't serialize it off, because the expense of serialization and deserialization is probably going to outweigh any savings you might get, by orders of magnitude.

That's one thing you can do. I think the five-minute rule still applies: if you haven't touched anything in five minutes, it's not likely that you're going to. That's the old caching rule for cache eviction policies. And this is really funny, as another aside. I've had a couple of discussions now with companies that provide product caches for retail sites. The first thing you do is look at the memory and go, "Okay, what are you doing? Where's the cache eviction policy? Let's check that." And it's like, "What do you mean, no cache eviction policy?"

So essentially, you have these companies selling you a memory leak. Bonus. And then you have arguments where the companies are going, "Oh, but we're providing a useful product." And you go, "Yes, for whom? Amazon? So they can jack up the memory bills, or …" I don't know, it was kind of weird. But yes, just be aware. A cache eviction policy is something we'll look at, and if you don't have one, it's probably a sign that you might be using the wrong product. Just saying.
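A cache with an eviction policy, combining a size bound with the five-minute idle rule, can be sketched as follows. This is a minimal sketch, not any product's design: it keeps entries in access order with `LinkedHashMap`, caps the entry count (LRU), and drops entries idle longer than a limit. Callers pass the current time explicitly (assumed monotonic) so the example stays deterministic. The class and method names are hypothetical. Instead of simply dropping an idle entry, a real session store might serialize it to a database, as suggested earlier.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: size-bounded LRU cache with idle-time eviction ("five-minute rule"). */
final class EvictingCache<K, V> {

    private static final class Timestamped<V> {
        final V value;
        long lastAccessMs;
        Timestamped(V value, long lastAccessMs) { this.value = value; this.lastAccessMs = lastAccessMs; }
    }

    private final int maxEntries;
    private final long idleLimitMs;
    private final LinkedHashMap<K, Timestamped<V>> map;

    EvictingCache(int maxEntries, long idleLimitMs) {
        this.maxEntries = maxEntries;
        this.idleLimitMs = idleLimitMs;
        // accessOrder = true: get/put move an entry to the tail, so the head is least recently used.
        this.map = new LinkedHashMap<K, Timestamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timestamped<V>> eldest) {
                return size() > EvictingCache.this.maxEntries;   // size bound: evict the LRU entry
            }
        };
    }

    void put(K key, V value, long nowMs) {
        map.put(key, new Timestamped<>(value, nowMs));
    }

    V get(K key, long nowMs) {
        Timestamped<V> e = map.get(key);
        if (e == null) return null;
        e.lastAccessMs = nowMs;
        return e.value;
    }

    /** Removes entries idle longer than idleLimitMs and returns how many were evicted. */
    int evictIdle(long nowMs) {
        int evicted = 0;
        Iterator<Timestamped<V>> it = map.values().iterator();
        while (it.hasNext()) {
            // Access order puts the stalest entries first, so stop at the first fresh one.
            if (nowMs - it.next().lastAccessMs <= idleLimitMs) break;
            it.remove();
            evicted++;
        }
        return evicted;
    }

    int size() { return map.size(); }

    public static void main(String[] args) {
        EvictingCache<String, String> cache = new EvictingCache<>(2, 5 * 60 * 1000L);
        cache.put("sku-1", "widget", 0);
        cache.put("sku-2", "gadget", 0);
        cache.get("sku-1", 60_000);                        // sku-1 is now most recently used
        cache.put("sku-3", "gizmo", 120_000);              // over capacity: evicts sku-2
        System.out.println(cache.get("sku-2", 120_000));   // null
        System.out.println(cache.evictIdle(400_000));      // 1 (sku-1 idle for 340 s)
        System.out.println(cache.size());                  // 1
    }
}
```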

Participant 8: The first optimization you did, not allocating and instead reusing an object: how does that sit with clean code and refactoring, and also with Java 8 lambdas?

Pepperdine: You're asking a performance guy about cleaner code? Seriously?

Participant 8: Yes. Or Lambdas.

Pepperdine: Lambdas? You want performance and you're using lambdas? I mean, there were early questions about, you know, neighbors and things like that. To be honest, I try to write clean code. I'm not going to claim that this one is clean. It isn't. But first, I find clean code is much easier to optimize, and quite frankly, if it's well-written code, it's more than likely going to perform well anyway. And it's very likely you're not going to have the memory issues if you've thought about writing clean code, just because clean code is less likely to allocate hot anyway. I'm all for clean code, and I'll be really happy when someone actually shows me a definition of clean code that we can measure, because that would be even better. Because my code is clean. I know it.

Participant 9: So what about everyone who's trying to solve the problem of quick GCs or memory leaks, frequent GCs, by just throwing more capacity at it, like you said, the AWSes and Azures of the world?

Pepperdine: Yes, that helps. Sometimes that's the answer: "Your memory leaks; you're going to run out of memory in three days." You know, double the memory, six days. Excellent. We're almost at the recycle time. Recycle the JVM.

Participant 9: Or restart your web app.

Pepperdine: Yes, you just recycle the VM. You're done. I mean, one of the garbage collectors coming up is Epsilon, which is basically no garbage collector in the VM, which means no interference with the allocators, so they can go screaming fast. Really nice [inaudible 00:44:04] threads. But if you know your rate of allocation, if you take control over it, there's a technique we use: just don't run the garbage collector during the day at all. You make the heap big enough so that it doesn't run.

And Epsilon makes it even better. Now all you have to do is recycle. The only thing you lose overnight is all your HotSpot optimizations, so you have to restart from cold in the morning. But guess what? People are working on solving that problem too. So, no garbage collector in a VM: there are some really nice use cases for it. But you need to have control of your allocation rates in order to do that successfully without popping a lid during production hours.
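Both "just recycle" strategies rest on the same back-of-the-envelope arithmetic: the time until the heap fills equals the free headroom divided by the rate at which memory is consumed. For a leak, that rate is the leak rate. Under a no-op collector such as Epsilon (enabled on JDK 11+ with `-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC`, and sized with `-Xmx`), nothing is ever reclaimed, so the rate is the full allocation rate. The numbers in `main` are made up for illustration, and the class name is hypothetical.

```java
/** Sketch of heap-runway arithmetic for leak recycling and no-GC (Epsilon-style) sizing. */
final class HeapRunway {

    /**
     * Time until the heap fills, in whatever time unit consumptionMbPerUnit uses
     * (e.g. MB/day gives days).
     */
    static double timeUntilFull(double maxHeapMb, double liveSetMb, double consumptionMbPerUnit) {
        if (consumptionMbPerUnit <= 0) return Double.POSITIVE_INFINITY;   // nothing accumulates
        return Math.max(0.0, (maxHeapMb - liveSetMb) / consumptionMbPerUnit);
    }

    /** Heap needed so that a window with no collections at all never fills it. */
    static long heapForNoGcWindowMb(double liveSetMb, double allocMbPerSec, double hours) {
        return (long) Math.ceil(liveSetMb + allocMbPerSec * 3600.0 * hours);
    }

    public static void main(String[] args) {
        // Leaking 1 GB/day with a 1 GB baseline live set.
        System.out.println(timeUntilFull(4096, 1024, 1024));   // 3.0 days
        System.out.println(timeUntilFull(7168, 1024, 1024));   // 6.0 days: headroom doubled
        System.out.println(timeUntilFull(8192, 1024, 1024));   // 7.0 days: whole heap doubled
        // No GC during a 10-hour day: 2 GB live set, 1 MB/s allocation rate.
        System.out.println(heapForNoGcWindowMb(2048, 1.0, 10.0) + " MB");   // 38048 MB
    }
}
```

What matters is headroom rather than total heap size, and every MB/s shaved off the allocation rate saves 3.6 GB per hour of no-GC window. That is why this approach only works once allocation rates are under control.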

Participant 10: The people are still trying to solve the micro problem of [inaudible 00:45:02] management but not for people who are understanding …

Pepperdine: Yes, the micro, I completely get that. The people doing capacity planning don't understand the problem. And, yes, it's an opportunity for you to help them, I guess. Any other questions here?

Participant 11: Yes, I have one concerning the tenuring threshold. Because it seems that it's a good practice to set it to maximum. Would you agree?

Pepperdine: Most of the time, yes. Once you understand what your object life cycle looks like, you can possibly reset the tenuring threshold to a lower value that will minimize copy costs. But generally, I just make it bigger and bigger. I mean, I generally have fights with support groups from larger corporations, some of them red in color, because the way I configure the heap is quite counterintuitive to what they're told to do in their screen flows. You know the questionnaire, right? "Do you have this? Do you have that? Then do this," that flow of things. They'll look at the configurations and say, "You shouldn't be doing that."
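The trade-off behind the tenuring threshold can be shown with a toy model. The model is an assumption that ignores survivor-space overflow, which in a real JVM causes premature promotion. An object that survives *L* young collections is copied between survivor spaces min(*L*, *T*) times and is promoted to tenured only if *L* > *T*, where *T* is the threshold. In HotSpot, *T* is set with `-XX:MaxTenuringThreshold` and is capped at 15. The lifetime distribution in `main` is invented.

```java
/** Toy model of the tenuring-threshold trade-off: survivor copy cost vs. promotions. */
final class TenuringModel {

    /** Returns {survivorCopies, promotions} for the given lifetimes and threshold. */
    static long[] simulate(int[] youngGcsSurvived, int threshold) {
        long copies = 0;
        long promotions = 0;
        for (int survived : youngGcsSurvived) {
            copies += Math.min(survived, threshold);   // one survivor copy per survived GC, up to T
            if (survived > threshold) promotions++;    // outlived the threshold: promoted to tenured
        }
        return new long[] {copies, promotions};
    }

    public static void main(String[] args) {
        // Hypothetical population: most objects die young, some are medium-lived
        // (session-like), and one lives forever.
        int[] lifetimes = {0, 0, 0, 0, 1, 2, 3, 6, 8, 1000};
        for (int t : new int[] {1, 4, 15}) {
            long[] r = simulate(lifetimes, t);
            System.out.printf("threshold %2d: %d survivor copies, %d promoted%n", t, r[0], r[1]);
        }
    }
}
```

With these numbers, raising the threshold from 1 to 15 increases survivor copies from 6 to 35 but cuts promotions from 5 to 1. Medium-lived objects then die in the young generation instead of becoming tenured garbage that needs the more expensive old-generation cleanup.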

One of the things I often recommend is that the young generation should sometimes be four, five, or six times the size of tenured space, so that you can have some very nicely sized survivor spaces, and it really, really calms things down quite a bit. Using this technique, we've taken SLA violations down from double digits to like 1% or something like that. But it's not the recommended configuration; no one is going to recommend it. But if you follow the data, if you look at what Censum is telling you and just do what it tells you, you're going to get there naturally. So just throw away your biases and follow the data. That's all I can say.
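The arithmetic behind a "big young generation" layout can be worked through as follows. In HotSpot, `-XX:SurvivorRatio` is the size of eden divided by the size of one survivor space, and the young generation (set with `-Xmn`) is eden plus two survivor spaces, so one survivor space is young / (SurvivorRatio + 2). The 12 GB heap, the 5:1 young-to-tenured split, and `SurvivorRatio=2` below are illustrative assumptions, not recommendations from the talk. Adaptive sizing in some collectors may also override fixed values.

```java
/** Sketch: splitting a heap into young/old and the young gen into eden/survivors. */
final class YoungGenSizing {

    /** Splits a heap so that young = youngToOld * old. Returns {youngMb, oldMb}. */
    static long[] splitHeap(long heapMb, long youngToOld) {
        long young = heapMb * youngToOld / (youngToOld + 1);
        return new long[] {young, heapMb - young};
    }

    /** Returns {edenMb, survivorMb} (size of ONE survivor space) for a given SurvivorRatio. */
    static long[] splitYoung(long youngMb, long survivorRatio) {
        long survivor = youngMb / (survivorRatio + 2);   // young = eden + 2 * survivor
        return new long[] {youngMb - 2 * survivor, survivor};
    }

    public static void main(String[] args) {
        long[] gens = splitHeap(12_288, 5);       // 12 GB heap, young five times tenured
        long[] young = splitYoung(gens[0], 2);    // eden twice the size of each survivor
        System.out.printf("young=%d MB, old=%d MB, eden=%d MB, each survivor=%d MB%n",
                gens[0], gens[1], young[0], young[1]);
        // young=10240 MB, old=2048 MB, eden=5120 MB, each survivor=2560 MB
    }
}
```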

Participant 12: There's a funny story about CMS. When CMS first came out, the performance team was tuning it. What was the threshold? Do you remember? What was the recommended tenuring threshold?

Pepperdine: Yes, I have a funny story, because Heinz and I (Dr. Heinz Kabutz, my JCrete co-founder I mentioned earlier) were the first non-Sun speakers invited to speak at a Sun event. Somebody asked us a question about garbage collection. I think Tony had written a document that basically said, "Make the survivor spaces really small to get rid of copy cost." We looked at that, followed the data, and said, "No, that's not going to work." So when someone asked, "How do you learn more about this?" I said, "Go to this white paper and read everything that's in there. It's beautifully written; Tony did a wonderful job writing it. Then just do the exact opposite of what it says." All of the Sun execs were like, "Oh, my God, did he just say that?" And Heinz was looking at me like, "Oh, my God, did he just say that?" It's like [inaudible 00:47:58]. But yes, really, that was the truth.

Participant 12: Yes, because garbage collection designers have thought that CMS is …

Pepperdine: They got the cost model wrong. Completely.

Participant 12: Yes. If you tenure everything, then CMS will take care of it, because they treated free-list allocation as if it were just bumping the pointer. But then there's fragmentation, and they didn't account for all those things. So for optimization purposes, it's the exact opposite.

Pepperdine: Yes.

Participant 13: In the case of an application holding a lot of data in the heap, say a cache, and so …

Pepperdine: Yes. Don't run the garbage collector, ever.

Participant 13: Exactly. So would it make sense to load this data into memory and then force a GC, so that those [inaudible 00:48:46] are promoted while session [inaudible 00:48:48] are not promoted? Or is there any other technique that can be used?

Pepperdine: Generally, every time you outsmart the garbage collector, you've outsmarted yourself. So we generally don't recommend doing things like that. I don't know. If you have a large data set in memory, generally what we do is inventory it and try to figure out, "What really needs to be there, and what can we evict?" Quite often, we find we can evict quite a bit of what's there, just to reduce the footprint. If you can't, then split it between VMs; several smaller VMs are often better than one big one. So there are some tricks you can do, or, as I said, just try to configure things so that the garbage collector never really runs for however long you need it not to run. And then trigger it at night, when no one or very few people are using the system, or at some time when you can tolerate a multi-minute pause.
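"Trigger it at night" could be sketched with an in-process scheduler that requests a full collection at a quiet hour. Several details here are assumptions. The 03:00 slot is arbitrary. `System.gc()` is only a request and is ignored under `-XX:+DisableExplicitGC`. The fixed 24-hour period drifts across daylight-saving changes. An external job triggering the collection (or recycling the VM) would serve the same purpose.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch: schedule an explicit GC request for a quiet hour each day. */
final class NightlyGc {

    /** Time from now until the next occurrence of the given wall-clock time. */
    static Duration untilNext(LocalDateTime now, LocalTime target) {
        LocalDateTime next = now.toLocalDate().atTime(target);
        if (!next.isAfter(now)) next = next.plusDays(1);
        return Duration.between(now, next);
    }

    static ScheduledExecutorService scheduleDaily(LocalTime quietTime) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "nightly-gc");
            t.setDaemon(true);   // don't keep the JVM alive just for this task
            return t;
        });
        Duration delay = untilNext(LocalDateTime.now(), quietTime);
        // Fixed 24-hour period; ignores daylight-saving shifts.
        scheduler.scheduleAtFixedRate(System::gc, delay.toMillis(),
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        scheduleDaily(LocalTime.of(3, 0));
        System.out.println("next GC request in "
                + untilNext(LocalDateTime.now(), LocalTime.of(3, 0)));
    }
}
```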

Participant 14: I think you mentioned offhand that this thing with the generations is changing in Java 11 or something like that?

Pepperdine: Generations? No. I mean, G1 is different in that it has regions, so we can actually do evacuation in G1 in ways you can't with the older generational collectors. You would think that would help, but again, it comes with its own set of issues in terms of large-object allocations and maintaining the tracking mechanism. Remember, I said the card table is what we use for generational collection. G1 uses an "I have no clue how it works" system for tracking the pointers it needs to track. So all of these things tend to be much more expensive. But CMS has a practical heap size of about four gigs. Some people oversize to eight gigs or something like that. Once you start getting to like 16 or 32 gigs, some size like that, you're really beyond the practical scalability of that particular collector. So we need to go to something else, and that's G1 or, hopefully, GC or ZGC. I forgot which country I'm in.
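The card-based tracking used by the classic generational collectors can be sketched in simplified form. The old generation is divided into fixed-size cards (512 bytes in HotSpot by default). A write barrier marks the card containing any reference field that is written, and at a young collection only dirty cards are scanned for old-to-young pointers, instead of the whole tenured space. The flat addresses, the `BitSet` table, and the method names are simplifications for illustration only.

```java
import java.util.BitSet;

/** Simplified sketch of card marking for finding old-to-young pointers. */
final class CardTableSketch {
    static final int CARD_SHIFT = 9;                 // 2^9 = 512-byte cards
    private final long heapBytes;
    private final BitSet dirty;

    CardTableSketch(long heapBytes) {
        this.heapBytes = heapBytes;
        this.dirty = new BitSet((int) ((heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT));
    }

    /** What the write barrier does on a reference store into the field at 'address'. */
    void onReferenceStore(long address) {
        if (address < 0 || address >= heapBytes) throw new IllegalArgumentException("outside heap");
        dirty.set((int) (address >>> CARD_SHIFT));   // mark the 512-byte card as dirty
    }

    /** At a young GC, only dirty cards need scanning for pointers into the young gen. */
    int cardsToScan() {
        return dirty.cardinality();
    }

    void clearAfterScan() {
        dirty.clear();
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(64L * 1024 * 1024);   // 64 MB "old gen"
        table.onReferenceStore(100);          // card 0
        table.onReferenceStore(400);          // still card 0
        table.onReferenceStore(1_000_000);    // card 1953
        System.out.println(table.cardsToScan() + " dirty cards out of "
                + (64 * 1024 * 1024 / 512));  // 2 dirty cards out of 131072
    }
}
```

Every reference store pays for the barrier, and every young collection pays for the card scan. This is part of the bookkeeping cost Pepperdine refers to, and G1's region-based tracking replaces it with a more elaborate scheme.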

Moderator: So to answer that question, yes.

Pepperdine: Yes, she's a garbage collection expert anyway. She knows more about it than I do. So, yes.

Moderator: Anyway, ZGC and [inaudible 00:51:34] are both single-generation. So they're not …

Pepperdine: We call them time temporal generational.

Moderator: So just to answer that question, yes, with JDK 11 …




Recorded at: Feb 16, 2019