
A Post-Apocalyptic sun.misc.Unsafe World

Posted by Christoph Engelbert on Aug 30, 2015

Java as a language and the JVM as a platform just celebrated their 20th birthday. With its noble origins on set-top boxes, mobile phones and Java Cards, as well as all kinds of server systems, Java is emerging as the lingua franca of the Internet of Things. Quite obviously, Java is everywhere!

Less obvious is that Java is also heavily immersed in all sorts of low-latency applications such as game servers and high-frequency trading applications. This was only made possible thanks to a propitious deficiency in the Java visibility rules for classes and packages, which offered access to a controversial little class called sun.misc.Unsafe. This class was, and still is, divisive: some love it, others hate it with a passion. The essential point is that it helped the JVM and the Java ecosystem evolve to where they are today. The Unsafe class basically compromised on some of Java's hallmark strict safety standards in favor of speed.

Passionate discussions, such as those at JCrete, our "What to do About sun.misc.Unsafe" mission paper, and blog posts such as this one on DripStat, created awareness of what might happen in the Java world if sun.misc.Unsafe (along with some smaller private APIs) were to just disappear without a sufficient replacement API. The final proposal from Oracle (JEP 260) now solves the problem by offering a nice migration path. But the question remains: how will this Java world look once the Unsafe dust has settled?

Organization

A glance at the sun.misc.Unsafe feature-set provides the unsettling realization that it was used as a one-stop dumping ground for all kinds of features.

An attempt to categorize these features produces the following five sets of use cases:

  • Atomic access to variables and array content, custom memory fences
  • Serialization support
  • Custom memory management / efficient memory layout
  • Interoperability with native code or other JVMs
  • Advanced Locking support

In our quest for a replacement for all of this functionality, we can at least declare victory on the last one: Java has had a powerful (and frankly very nice) official API for this for quite some time, java.util.concurrent.locks.LockSupport.
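To give an idea of how LockSupport covers the parking half of Unsafe (Unsafe::park and Unsafe::unpark), here is a minimal sketch of a one-shot latch. The class name OneShotLatch is made up for illustration and supports only a single waiting thread; real code would usually just use CountDownLatch, which is itself built on LockSupport through AbstractQueuedSynchronizer.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical one-shot latch for a single waiter, built on LockSupport
final class OneShotLatch {
    private volatile boolean open = false;
    private volatile Thread waiter;

    // Blocks the calling thread until open() has been called
    void await() {
        waiter = Thread.currentThread();
        // park() may return spuriously, so always re-check the condition
        while (!open) {
            LockSupport.park(this);
        }
    }

    // Releases the waiting thread, if there is one
    void open() {
        open = true;
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }
}

class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        OneShotLatch latch = new OneShotLatch();
        Thread worker = new Thread(() -> {
            latch.await();
            System.out.println("worker released");
        });
        worker.start();
        latch.open();
        worker.join();
    }
}
```

An unpark() that arrives before the matching park() is not lost: it leaves a permit behind, so the following park() returns immediately.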

Atomic Access

Atomic access is one of the most heavily used features of sun.misc.Unsafe, offering basic “put” and “get” operations (with or without volatile semantics) as well as compare-and-swap (CAS) operations.

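// Assumes UNSAFE is the sun.misc.Unsafe instance and VERSION_OFFSET was obtained via
// UNSAFE.objectFieldOffset(Record.class.getDeclaredField("version")); "version" is volatile
public long update() {
 for (;;) {
   long version = this.version;
   long newVersion = version + 1;
   // Succeeds only if no other thread changed the field since we read it; otherwise retry
   if (UNSAFE.compareAndSwapLong(this, VERSION_OFFSET, version, newVersion)) {
      return newVersion;
   }
 }
}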
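For readers who want to run the loop above, the following sketch fills in the pieces the snippet leaves out: obtaining the Unsafe instance through its private theUnsafe field (a widely used but unsupported trick) and computing VERSION_OFFSET with objectFieldOffset. The class name UnsafeRecord is hypothetical. Expect compiler warnings about internal proprietary API; recent JDKs have also deprecated these memory-access methods for removal.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Runnable version of the CAS loop; UnsafeRecord is a hypothetical name
final class UnsafeRecord {
    private static final Unsafe UNSAFE;
    private static final long VERSION_OFFSET;

    static {
        try {
            // Unsupported trick: read the private singleton field of Unsafe
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            // Byte offset of the "version" field inside an UnsafeRecord object
            VERSION_OFFSET = UNSAFE.objectFieldOffset(
                    UnsafeRecord.class.getDeclaredField("version"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile long version = 0;

    long update() {
        for (;;) {
            long current = this.version;
            long next = current + 1;
            if (UNSAFE.compareAndSwapLong(this, VERSION_OFFSET, current, next)) {
                return next;
            }
        }
    }

    long version() {
        return version;
    }
}

class UnsafeRecordDemo {
    public static void main(String[] args) throws InterruptedException {
        UnsafeRecord record = new UnsafeRecord();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    record.update();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(record.version()); // 40000: no update is lost
    }
}
```

A CAS that loses a race simply fails and the loop retries with the fresh value, which is why the concurrent count comes out exact.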

But wait, doesn’t Java offer support for this through some official APIs? Absolutely, through the Atomic classes. And yes, they are as ugly as the sun.misc.Unsafe-based API, and actually worse for other reasons. Let’s see why.

AtomicX classes are actually real objects. Imagine for example that we are maintaining a record inside a storage system and we want to keep track of certain statistics or metadata like version counters:

public class Record {
 private final AtomicLong version = new AtomicLong(0);

 public long update() {
   return version.incrementAndGet();
 }
}

While the code is fairly readable, it is polluting our heap with two different objects per data record instead of one, namely the Atomic instance as well as our actual record itself. The problem is not only the extraneous garbage generation, but also the extra memory footprint and additional dereferences of the Atomic instances.

But hey, we can do better - there is another API, the java.util.concurrent.atomic.AtomicXFieldUpdater classes.

AtomicXFieldUpdaters are a memory-optimized version of the normal Atomic classes, trading API simplicity for a smaller memory footprint. A single updater instance can serve any number of instances of a class, in our case Records, and update their volatile fields.

public class Record {
 private static final AtomicLongFieldUpdater<Record> VERSION =
      AtomicLongFieldUpdater.newUpdater(Record.class, "version");

 private volatile long version = 0;

 public long update() {
   return VERSION.incrementAndGet(this);
 }
}

This approach has the advantage of producing more efficient code in terms of object creation. The updater is a static final field, only a single updater is necessary for any number of records, and, most importantly, it is available today. Additionally, it is a supported public API, which should almost always be your preferred strategy. On the other hand, looking at the creation and usage of the updater, it is still rather ugly, not very readable and frankly counter-intuitive.
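A small sketch to make the “one updater for any number of records” point concrete. VersionedRecord is a renamed copy of the Record class above (renamed only to avoid confusion with java.lang.Record in newer JDKs). Note that newUpdater throws an IllegalArgumentException if the named field is not declared as a volatile long.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

final class VersionedRecord {
    // One static updater serves every VersionedRecord instance
    private static final AtomicLongFieldUpdater<VersionedRecord> VERSION =
            AtomicLongFieldUpdater.newUpdater(VersionedRecord.class, "version");

    // Must be a volatile long (primitive), or newUpdater rejects it
    private volatile long version = 0;

    long update() {
        return VERSION.incrementAndGet(this);
    }

    long version() {
        return version;
    }
}

class VersionedRecordDemo {
    public static void main(String[] args) {
        VersionedRecord a = new VersionedRecord();
        VersionedRecord b = new VersionedRecord();
        a.update();
        a.update();
        b.update();
        // Each record keeps its own counter although the updater is shared
        System.out.println(a.version() + " " + b.version()); // 2 1
    }
}
```

Each record costs only its own long field; the updater itself is allocated once per class, not once per record.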

But can we do better? Yes: Variable Handles (affectionately known as “VarHandles”) are on the drawing board and offer a more attractive API.

VarHandles are an abstraction over data-behavior. They provide volatile-like access, not only over fields but also on elements inside of arrays or buffers.

It might seem odd at first glance looking at the following example, so let’s see what is going on.

// Prototype API as of mid-2015 (findFieldVarHandle, addAndGet)
public class Record {
 private static final VarHandle VERSION;

 static {
   try {
     VERSION = MethodHandles.lookup().findFieldVarHandle(
        Record.class, "version", long.class);
   } catch (Exception e) {
      throw new Error(e);
   }
 }

 private volatile long version = 0;

 public long update() {
   // Signature-polymorphic call: the cast tells the compiler the return type
   return (long) VERSION.addAndGet(this, 1);
 }
}

VarHandles are created using the MethodHandles API, a direct entry point into the JVM's internal linkage behavior. We use a MethodHandles.Lookup, passing in the containing class, field name and field type, or we “unreflect” a java.lang.reflect.Field instance.

So why, you might ask, is this better than the AtomicXFieldUpdater API? As mentioned before, VarHandles are a general abstraction over all types of variables, arrays or even ByteBuffers. In other words, you have just one abstraction over all of these different types. That sounds super nice in theory, but it is still somewhat wanting in the current prototypes. The explicit cast of the returned value is necessary since the compiler is not yet able to figure it out automatically. In addition, there are some more oddities resulting from the young prototyping state of the implementation. I hope those problems will disappear in the future as more people get involved with VarHandles, and as some of the related language enhancements proposed in Project Valhalla start to materialize.
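For comparison, here is how the Record example looks against the VarHandle API as it eventually shipped in Java 9. This is a sketch, not the prototype discussed above: the lookup method became findVarHandle, the class offers getAndAdd (which returns the previous value) rather than addAndGet, and the explicit cast on the result is still required. The class name VhRecord is made up; viaReflection shows the “unreflect” route through a java.lang.reflect.Field.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class VhRecord {
    private static final VarHandle VERSION;

    static {
        try {
            // Released name of the lookup: findVarHandle(class, field name, field type)
            VERSION = MethodHandles.lookup()
                    .findVarHandle(VhRecord.class, "version", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile long version = 0;

    long update() {
        // getAndAdd returns the old value, so add 1 to report the new one
        return (long) VERSION.getAndAdd(this, 1L) + 1L;
    }

    long version() {
        return version;
    }

    // Alternative: "unreflect" a java.lang.reflect.Field into a VarHandle
    static VarHandle viaReflection() throws ReflectiveOperationException {
        return MethodHandles.lookup()
                .unreflectVarHandle(VhRecord.class.getDeclaredField("version"));
    }
}

class VhRecordDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        VhRecord record = new VhRecord();
        record.update();
        record.update();
        System.out.println(record.version()); // 2
        VarHandle viaField = VhRecord.viaReflection();
        System.out.println((long) viaField.get(record)); // 2
    }
}
```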

Serialization

Another important use case nowadays is serialization. Whether you are designing a distributed system, storing serialized elements in a database, or going off-heap, Java objects somehow need to be serialized and deserialized quickly. “The faster the better” is the motto. That is why many serialization frameworks use Unsafe::allocateInstance, which instantiates an object without calling any of its constructors, exactly what deserialization needs. This saves a lot of time and is still safe, since the object's previous state is recreated through the deserialization process.

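public String deserializeString() throws Exception {
 char[] chars = readCharsFromStream();
 // Allocate a String without running any of its constructors
 String allocated = (String) UNSAFE.allocateInstance(String.class);
 // Assumes VALUE_OFFSET is the offset of String's private char[] "value" field
 UNSAFE.putObjectVolatile(allocated, VALUE_OFFSET, chars);
 return allocated;
}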

Please note that this code fragment might still break in Java 9, even though sun.misc.Unsafe will remain available, because there’s an effort to optimize the memory footprint of a String. This will remove the char[] value in Java 9 and replace it with a byte[]. Please refer to the draft JEP on improving memory efficiency in Strings for more details.
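The following sketch isolates what allocateInstance actually gives a deserializer: an instance whose constructor never ran, with every field at its default value, ready to be filled from the stream. Account and AllocateInstanceDemo are hypothetical names, and reading the private theUnsafe field is the usual unsupported way of obtaining Unsafe.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical class whose constructor a deserializer wants to skip
class Account {
    final String owner;
    long balance;

    Account(String owner) {
        this.owner = owner;
        this.balance = 100;                 // side effect we want to avoid
        System.out.println("constructor ran");
    }
}

class AllocateInstanceDemo {
    static Unsafe unsafe() throws ReflectiveOperationException {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    // Creates an Account without calling any of its constructors
    static Account allocateRaw() throws ReflectiveOperationException {
        return (Account) unsafe().allocateInstance(Account.class);
    }

    public static void main(String[] args) throws Exception {
        Account a = allocateRaw();
        // "constructor ran" is never printed; fields hold default values
        System.out.println(a.owner + " " + a.balance); // null 0
    }
}
```

A real deserializer would then write each field read from the stream into the raw instance, for example with putObject/putLong at precomputed offsets.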

Back to the topic: there is not yet a replacement proposal for Unsafe::allocateInstance, but the jdk9-dev mailing list is discussing certain solutions. One idea is to move the internal method sun.reflect.ReflectionFactory::newConstructorForSerialization into a supported place that would prevent core classes from being instantiated in an unsafe manner. Another interesting proposal is frozen arrays, which might also help serialization frameworks in the future.

It might look like the following snippet, which is totally my concoction as there is no proposal yet, but it is based on the currently available sun.reflect.ReflectionFactory API.

public String deserializeString() throws Exception {
 // Hypothetical: freeze() and this overload of newConstructorForSerialization do not exist
 char[] chars = readCharsFromStream().freeze();
 ReflectionFactory reflectionFactory = 
       ReflectionFactory.getReflectionFactory();
 Constructor<String> constructor = reflectionFactory
       .newConstructorForSerialization(String.class, char[].class);
 return constructor.newInstance(chars);
}

This would call a special deserialization constructor that accepts a frozen char[]. The public String(char[]) constructor copies the passed char[] to prevent external mutation. This special deserialization constructor could skip that copy, since the given char[] is frozen. More on frozen arrays later. Again, remember this is just my artificial rendition and will probably look different in the real draft.
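Something close to this already works today with the internal sun.reflect.ReflectionFactory, which on Java 9 and later lives in the jdk.unsupported module. The sketch below uses its two-argument newConstructorForSerialization to build a constructor that allocates a Session but only runs Object's no-arg constructor. Session and SerializationCtorDemo are hypothetical names, and no frozen arrays are involved. It is still an unsupported API, so treat it as an illustration of the idea, not a recommendation.

```java
import java.io.Serializable;
import java.lang.reflect.Constructor;
import sun.reflect.ReflectionFactory;

// Hypothetical class whose own constructor must not run on deserialization
class Session implements Serializable {
    String user;

    Session(String user) {
        this.user = user;
        throw new IllegalStateException("constructor should not run");
    }
}

class SerializationCtorDemo {
    // Builds a constructor that allocates a Session but only runs Object()
    static Session allocate() throws ReflectiveOperationException {
        ReflectionFactory factory = ReflectionFactory.getReflectionFactory();
        Constructor<?> objectCtor = Object.class.getDeclaredConstructor();
        Constructor<?> ctor =
                factory.newConstructorForSerialization(Session.class, objectCtor);
        return (Session) ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        Session s = allocate();
        System.out.println(s.user); // null: Session(String) was skipped
    }
}
```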

Memory Management

Possibly the most important use of sun.misc.Unsafe is reading and writing memory: not only the heap, as seen in the first section, but especially regions outside of the normal Java heap. In this idiom, native memory is acquired (represented by an address, or pointer) and offsets are calculated manually. For example:

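public long memory() {
 long address = UNSAFE.allocateMemory(8);   // 8 bytes of native memory
 try {
   UNSAFE.putLong(address, Long.MAX_VALUE);
   return UNSAFE.getLong(address);
 } finally {
   UNSAFE.freeMemory(address);              // off-heap memory must be freed by hand
 }
}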

Some might jump in and say that the same is possible using direct ByteBuffers:

public long memory() {
 ByteBuffer byteBuffer = ByteBuffer.allocateDirect(8);
 byteBuffer.putLong(0, Long.MAX_VALUE);
 return byteBuffer.getLong(0);
}

On the surface this approach might seem more appealing; unfortunately, ByteBuffers are limited to roughly 2 GB of data, since a DirectByteBuffer can only be created with an int capacity (ByteBuffer::allocateDirect(int)). Additionally, all indexes in the ByteBuffer API are only 32-bit signed ints as well. Was it Bill Gates who once asked “Who will ever need more than 32 bits?”
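Unsafe does not share this limit, because its sizes and addresses are plain longs. The sketch below wraps that in a hypothetical OffHeapLongArray with a 64-bit index. It is sized small here so it runs anywhere; the bounds check and the explicit close() are additions of this sketch, since Unsafe itself checks nothing and never frees memory on its own.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical off-heap long array addressed by a long index
final class OffHeapLongArray implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long length;
    private final long address;

    OffHeapLongArray(long length) {
        this.length = length;
        // Size in bytes is a long: no Integer.MAX_VALUE cap (memory is uninitialized)
        this.address = UNSAFE.allocateMemory(length * Long.BYTES);
    }

    long get(long index) {
        checkIndex(index);
        return UNSAFE.getLong(address + index * Long.BYTES);
    }

    void set(long index, long value) {
        checkIndex(index);
        UNSAFE.putLong(address + index * Long.BYTES, value);
    }

    long length() {
        return length;
    }

    // Native memory is not garbage collected; call close() exactly once
    @Override
    public void close() {
        UNSAFE.freeMemory(address);
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}

class OffHeapDemo {
    public static void main(String[] args) {
        try (OffHeapLongArray array = new OffHeapLongArray(1_000)) {
            array.set(999, Long.MAX_VALUE);
            System.out.println(array.get(999) == Long.MAX_VALUE); // true
        }
    }
}
```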

Retrofitting the API to use long indexes would break compatibility, so VarHandles come to the rescue.

public long memory() {
 ByteBuffer byteBuffer = ByteBuffer.allocateDirect(8);
 VarHandle bufferView = 
           MethodHandles.byteBufferViewVarHandle(long[].class, true);
 bufferView.set(byteBuffer, 0, Long.MAX_VALUE);
 // As before, the result of the signature-polymorphic get must be cast
 return (long) bufferView.get(byteBuffer, 0);
}

Is the VarHandle API really better in this case? At the moment we are constrained by the same limitations: we can only create ByteBuffers of ~2 GB, and the internal VarHandle implementation for views over ByteBuffers is also based on ints, though that might be “fixable”. So at present there is no real solution to this problem. The nice thing here, though, is that the API is again the same VarHandle API as in the first example.
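The byte-buffer view did ship in Java 9, with a ByteOrder parameter instead of the prototype's boolean flag. A minimal sketch against the released signature (BufferViewDemo is a made-up name) shows that the index is still an int byte offset, so the ~2 GB ceiling discussed above remains:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BufferViewDemo {
    // Views any ByteBuffer as a sequence of longs in the platform's byte order
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    static long memory() {
        ByteBuffer byteBuffer = ByteBuffer.allocateDirect(8);
        // The coordinate is an int byte offset into the buffer
        LONG_VIEW.set(byteBuffer, 0, Long.MAX_VALUE);
        return (long) LONG_VIEW.get(byteBuffer, 0);
    }

    public static void main(String[] args) {
        System.out.println(memory() == Long.MAX_VALUE); // true
    }
}
```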

Some more options are under discussion. Paul Sandoz, Oracle engineer and owner of JEP 193: Variable Handles, talked about the concept of a Memory Region on Twitter, and although the concept is still nebulous, the approach looks promising. A clean API might look something like the following snippet.

public long memory() {
 // Hypothetical API: MemoryRegion and memoryRegionViewVarHandle do not exist
 MemoryRegion region = MemoryRegion
      .allocateNative("myname", MemoryRegion.UNALIGNED, Long.MAX_VALUE);

 VarHandle regionView = 
             MethodHandles.memoryRegionViewVarHandle(long[].class, true);
 regionView.set(region, 0, Long.MAX_VALUE);
 return (long) regionView.get(region, 0);
}

This is only an idea, and hopefully Project Panama, the native code OpenJDK project, will present a proposal for those abstractions in the near future. Project Panama is actually the right place for this, since those memory regions will also need to work with native libraries that expect a memory address (pointer) to be passed into their calls.
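In hindsight, the memory-region idea did materialize through Project Panama as the Foreign Function & Memory API, final since Java 22 (JEP 454). The sketch below, which needs Java 22 or later, mirrors the snippet above with real types: an Arena owns the native memory, and MemorySegment sizes and offsets are longs. SegmentDemo is a made-up name.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class SegmentDemo {
    static long memory() {
        // Confined arena: used by this thread only, frees its memory on close()
        try (Arena arena = Arena.ofConfined()) {
            // Allocates 8 bytes, suitably aligned for a Java long
            MemorySegment region = arena.allocate(ValueLayout.JAVA_LONG);
            // Offsets are longs, so segments are not limited to ~2 GB
            region.set(ValueLayout.JAVA_LONG, 0L, Long.MAX_VALUE);
            return region.get(ValueLayout.JAVA_LONG, 0L);
        }
    }

    public static void main(String[] args) {
        System.out.println(memory() == Long.MAX_VALUE); // true
        try (Arena arena = Arena.ofConfined()) {
            // byteSize() is a long as well
            System.out.println(arena.allocate(16).byteSize()); // 16
        }
    }
}
```

Closing the arena frees the memory deterministically, replacing the manual freeMemory bookkeeping that the Unsafe version needs.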

Interoperability

The last topic is interoperability. This is not limited to efficient transfer of data between different JVMs (perhaps via shared memory, which could also be a type of memory region, and which would avoid slow socket communication). It also covers communication and information exchange with native code.

Project Panama hoisted the sails to supersede JNI in a more Java-like and efficient way. People following JRuby might know Charles Nutter for his efforts on JNR, the Java Native Runtime, and especially the JNR-FFI implementation. FFI means Foreign Function Interface and is a typical term for people working with other languages like Ruby, Python, etc.

The FFI basically builds an abstraction layer to call C (and, depending on the implementation, C++) directly from the current language, without the need to write glue code as with JNI in Java.

As an example, let’s say we want to get the process id (pid) from Java. With JNI, all of the following native and Java code is currently required:

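// processidentifier.cpp, compiled into the "processidentifier" shared library
#include <jni.h>
#include <unistd.h>

extern "C" {
  JNIEXPORT jint JNICALL 
       Java_ProcessIdentifier_getProcessId(JNIEnv *, jobject);
}

JNIEXPORT jint JNICALL 
       Java_ProcessIdentifier_getProcessId(JNIEnv *env, jobject thisObj) {
 return getpid();
}

// ProcessIdentifier.java
public class ProcessIdentifier {
 static {
   System.loadLibrary("processidentifier");
 }

 // Bound to the native function named Java_ProcessIdentifier_getProcessId
 public native int getProcessId();
}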

Using JNR we could simplify this to a pure Java interface which would be bound to the native call by the JNR implementation.

// Uses the JNR-FFI library (jnr.ffi.LibraryLoader)
public interface LibC {
  int getpid();
}

public int call() {
 LibC c = LibraryLoader.create(LibC.class).load("c");
 return c.getpid();
}

JNR internally spins the binding code and injects it into the JVM. Since Charles Nutter is one of the main developers of JNR and also works on Project Panama, we might expect something quite similar to come up.

From looking at the OpenJDK mailing list, it feels like we will soon see another incarnation of MethodHandles that binds to native code. A possible binding might look like the following snippet:

public int call() throws Throwable {
 // Speculative API: MethodHandles.findNative does not exist
 MethodHandle handle = MethodHandles
               .findNative(null, "getpid", MethodType.methodType(int.class));
 return (int) handle.invokeExact();
}

This may look strange if you haven’t seen MethodHandles before, but it is obviously more concise and expressive than the JNI version. The great thing here is that, just like reflective Method instances, a MethodHandle can be (and generally should be) cached, to be called over and over again. You can also get the native call inlined directly into the JIT-compiled Java code.

However, I still slightly prefer the JNR interface version, as it is cleaner from a design perspective. On the other hand, I’m pretty sure we will get direct interface binding as a nice language abstraction over the MethodHandle API, if not from the specification, then from some benevolent open-source committer.
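The native binding speculated about above did arrive with the Foreign Function & Memory API in Java 22, as a downcall MethodHandle obtained from a Linker rather than through a MethodHandles.findNative method. A minimal sketch, assuming Java 22 or later on Linux or macOS, where the default lookup exposes libc's getpid; NativePid is a made-up name, and the JDK prints a warning about restricted methods unless native access is enabled with --enable-native-access.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

class NativePid {
    // Bound once and cached, then invoked as often as needed
    private static final MethodHandle GETPID;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment address = linker.defaultLookup().find("getpid")
                .orElseThrow(() -> new UnsatisfiedLinkError("getpid not found"));
        // C signature: int getpid(void)
        GETPID = linker.downcallHandle(address, FunctionDescriptor.of(ValueLayout.JAVA_INT));
    }

    static int call() {
        try {
            return (int) GETPID.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // Cross-check against the pure-Java API (Java 9+)
        System.out.println(call() == ProcessHandle.current().pid()); // true
    }
}
```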

What else?

A few more things are floating around Project Valhalla and Project Panama. Some of those are not directly related to sun.misc.Unsafe but are still worth mentioning.

ValueTypes

Probably the hottest topic in these discussions is ValueTypes. These are lightweight wrappers that behave like Java primitives. As the name suggests, the JVM is able to treat them like simple values, and can do special optimizations that are not possible on normal objects. You can think of those as user-definable primitive types.

value class Point {
 final int x;
 final int y;
}

// Create a Point instance
Point point = makeValue(1, 2);

This also is still a draft API and it is unlikely that we would get a new "value" keyword, as it might break user code that might already use that keyword as an identifier.

OK, but what is really so nice about ValueTypes? As already explained, the JVM can treat these types as primitive values, which, for example, makes it possible to flatten their layout into an array:

// A Point(1, 2) stored inline as plain ints: no object header, no reference
int[] values = new int[] {1, 2};
int x = values[0];
int y = values[1];

Value types might also be passed around in CPU registers and most probably wouldn’t need to be allocated on the heap. This would save a lot of pointer dereferences and give the CPU much better opportunities to prefetch data and predict branches.

A similar technique is already used today to analyze data in a huge array. Cliff Click’s h2o architecture does exactly that, to offer extremely fast map-reduce operations over uniform, primitive data.
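The flattening can already be done by hand, which is essentially what value types would let the JVM do automatically. The hypothetical PointArray below stores the x and y of each point side by side in one int[] instead of an array of references to separately allocated Point objects; it illustrates the layout idea only and is not how h2o is implemented.

```java
// Hypothetical flattened storage for (x, y) points in a single int[]
final class PointArray {
    private final int[] coords; // layout: x0, y0, x1, y1, ...

    PointArray(int size) {
        this.coords = new int[size * 2];
    }

    void set(int index, int x, int y) {
        coords[2 * index] = x;
        coords[2 * index + 1] = y;
    }

    int x(int index) {
        return coords[2 * index];
    }

    int y(int index) {
        return coords[2 * index + 1];
    }

    int size() {
        return coords.length / 2;
    }

    // A linear walk over contiguous memory, with no pointer chasing
    long sumOfX() {
        long sum = 0;
        for (int i = 0; i < coords.length; i += 2) {
            sum += coords[i];
        }
        return sum;
    }
}

class PointArrayDemo {
    public static void main(String[] args) {
        PointArray points = new PointArray(3);
        points.set(0, 1, 2);
        points.set(1, 3, 4);
        points.set(2, 5, 6);
        System.out.println(points.sumOfX());                 // 9
        System.out.println(points.x(1) + "," + points.y(1)); // 3,4
    }
}
```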

In addition, ValueTypes can have constructors, methods and generics. As Oracle Java language architect Brian Goetz so eloquently puts it, you can think of them as “Codes like a class, behaves like an int”.

Another related feature is the anticipated “specialized generics”, or more broadly “type specialization”. The idea is simple; extend the generics system to support not only objects and ValueTypes but also primitives. Using this approach the ubiquitous String class would be a candidate for a rewrite using ValueTypes.

Specialized Generics

To bring this to life (and keep it backwards compatible) the generics system would need to be retrofitted, and some new, special wildcards will bring the salvation.

class Box<any T> {
  void set(T element) { … };
  T get() { ... };
}

public void generics() {
 Box<int> intBox = new Box<>();
 intBox.set(1);
 int intValue = intBox.get();

 Box<String> stringBox = new Box<>();
 stringBox.set("hello");
 String stringValue = stringBox.get();

 Box<RandomClass> box = new Box<>();
 box.set(new RandomClass());
 RandomClass value = box.get();
}

In this example the Box class features the new <any T> wildcard, in contrast to the familiar <T>. It tells the JVM-internal type specializer to accept any type, whether an object, a wrapper, a value type or a primitive.
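What specialization would buy is easiest to see by writing the specialized class by hand. In the sketch below, Box<T> is today's erased generic (a Box<Integer> holds a reference to a boxed Integer), while IntBox is a hand-written approximation of what a specializer might generate for Box<int>; the actual specializer output was not specified at the time of writing.

```java
// Today's erased generic: the element is always stored as a reference
class Box<T> {
    private T element;

    void set(T element) {
        this.element = element;
    }

    T get() {
        return element;
    }
}

// Roughly what Box<int> could specialize to: a primitive field, no boxing
class IntBox {
    private int element;

    void set(int element) {
        this.element = element;
    }

    int get() {
        return element;
    }
}

class BoxDemo {
    public static void main(String[] args) {
        Box<Integer> boxed = new Box<>();
        boxed.set(1000);       // autoboxes into an Integer object
        int a = boxed.get();   // unboxes again

        IntBox specialized = new IntBox();
        specialized.set(1000); // stores the int directly
        int b = specialized.get();

        System.out.println(a == b); // true
    }
}
```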

An amazing talk about type specialization is available from this year's JVM Language Summit (JVMLS) by Brian Goetz himself.

Arrays 2.0

The proposal for Arrays 2.0 has been around for quite some time, as John Rose’s talk at JVMLS 2012 shows. One of the most prominent features will be the removal of the 32-bit index limitation of current arrays. Today a Java array cannot hold more than Integer.MAX_VALUE elements. The new arrays are expected to accept a 64-bit index.
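Until then, the usual workaround for more than Integer.MAX_VALUE elements is to split the data into chunks and translate a long index into a chunk number and an offset. BigLongArray below is a hypothetical sketch of that technique, with an arbitrarily chosen chunk size of 2^20 longs:

```java
// Hypothetical on-heap long array with a 64-bit index, built from chunks
final class BigLongArray {
    private static final int CHUNK_BITS = 20;               // 2^20 longs per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final long length;
    private final long[][] chunks;

    BigLongArray(long length) {
        this.length = length;
        int chunkCount = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new long[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            // The last chunk only gets the remaining elements
            long remaining = length - ((long) i << CHUNK_BITS);
            chunks[i] = new long[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    long get(long index) {
        checkIndex(index);
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, long value) {
        checkIndex(index);
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    long length() {
        return length;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}

class BigLongArrayDemo {
    public static void main(String[] args) {
        long n = (1L << 20) + 5;              // just over one chunk
        BigLongArray array = new BigLongArray(n);
        array.set(n - 1, 42L);                // lands in the second chunk
        System.out.println(array.get(n - 1)); // 42
    }
}
```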

Another nice feature is the ability to “freeze” arrays (as we saw in the Serialization examples above), allowing you to create immutable arrays that can be passed around without any danger of having their contents mutated.

And since great things come in pairs we can expect Arrays 2.0 to support specialized generics!

ClassDynamic

One more interesting proposal floating around is the so called ClassDynamic proposal. This proposal is probably in the earliest state of any of the ones we have mentioned so far, and so not a lot of information is currently available. But let’s try to anticipate what it will look like.

A dynamic class brings the same generalization concept as specialized generics, but on a broader scope. It provides a kind of templating mechanism for typical coding patterns. Imagine the collection returned by Collections::synchronizedMap as a pattern where every method call is simply a synchronized version of the original call:

R methodName(ARGS) {
  synchronized (this) {
    return underlying.methodName(ARGS);
  }
}

Using dynamic classes as well as pattern-templates supplied to the specializer will simplify the implementation of recurring patterns dramatically. As said earlier, there is not a lot more information available at the time of this writing, but I hope to see more coming up in the near future, most probably as part of Project Valhalla.
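Until something like ClassDynamic exists, the closest standard tool for this particular pattern is probably java.lang.reflect.Proxy, which generates an implementation of an interface at runtime and routes every call through a single handler. The sketch below (Synchronizer is a made-up name) wraps each call in a synchronized block on a private lock. It is a reflection-based stand-in, not the ClassDynamic proposal, and every call pays for reflective invocation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class Synchronizer {
    // Returns an implementation of iface whose every method behaves like
    //   synchronized (lock) { return underlying.methodName(args); }
    @SuppressWarnings("unchecked")
    static <T> T synchronizedView(Class<T> iface, T underlying) {
        Object lock = new Object();
        InvocationHandler handler = (proxy, method, args) -> {
            synchronized (lock) {
                try {
                    return method.invoke(underlying, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the underlying method threw
                }
            }
        };
        return (T) Proxy.newProxyInstance(
                Synchronizer.class.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}

class SynchronizerDemo {
    public static void main(String[] args) throws InterruptedException {
        @SuppressWarnings("unchecked")
        List<Integer> list = Synchronizer.synchronizedView(List.class, new ArrayList<Integer>());
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) {
                    list.add(j);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(list.size()); // 4000: no lost updates
    }
}
```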

Conclusion

Overall I’m happy with the direction and accelerated speed of development of the JVM and Java as a language. A lot of interesting and necessary solutions are underway and Java is converging to a modern state, while the JVM is providing new efficiencies and improvements.

From my perspective people are definitely advised to invest in the genius piece of technology that we call the JVM, and I expect that all JVM languages will benefit from the newly integrated features.

In general I highly recommend the JVMLS talks from 2015 for more information on most of these topics, and I suggest you read a summary of Brian Goetz’s talk about Project Valhalla.

About the Author

Christoph Engelbert is Technical Evangelist at Hazelcast. He is a passionate Java developer with a deep commitment to Open Source software. He is mostly interested in performance optimizations and in understanding the internals of the JVM and the garbage collector. He loves to push software to its limits by looking into profilers and finding problems inside the codebase.

Excellently written article. by Ben Cotton

Interesting that the upcoming Java 9 VarHandles API still leaves us without a 1 x direct allocation invoke > 2gb. Seems like UNSAFE remains absolutely essential for now.

So in Java 9, should I want to directly alloc a 1TB native buffer, will I even have the choice to call

UNSAFE.allocateMemory(1TB);

or will I have to call (500 times!) the VarHandles equivalent of

ByteBuffer.allocateDirect(2GB);

??

Project Panama looks nice.

Thanks Christoph.

Nice post by Binh Nguyen

Thanks, nice information. Nice tips.

Re: Excellently written article. by Christoph Engelbert

Yeah it looks like that. So far there is no real proposal for the most common "off-heap" use case. However we're kind of going there :)

Re: Nice post by Christoph Engelbert

@Binh: Thank you :) It was actually a lot of fun writing it.

What are VarHandles? ;-) by Heinz Kabutz

Yeah, I knew what they were, but under the other name of "enhanced volatiles" :-)

Great article, and also a very interesting discussion we had with Marcus at JCrete about this. I'm glad it was recorded for posterity.

A few comments if you don't mind, to your great article:

Atomic Access

AtomicLongFieldUpdater check on every access whether the context in which they are used allow read/write to the field.
This is because an AtomicLongFieldUpdater could accidentally be leaked from inside the class and was done as a safety measure. Thus the AtomicLongFieldUpdater has more overhead than Unsafe or VarHandles.

Note also that there are several new atomic access methods in Unsafe for longs and ints which are significantly faster than the default CAS loop. How do VarHandles address that?


Serialization

The deserializeString will also break in Java 6, see www.javaspecialists.eu/archive/Issue230.html


Memory

31-bit indexes, not 32-bit. And Bill Gates famously was questioning whether anyone would ever need more than 640k :-)

Heinz

Re: What are VarHandles? ;-) by Christoph Engelbert

Atomic Access: yeah great addition, thanks! :)
Serialization: Oh, good to know. Hopefully we can ditch Java 6 support soon ;)
Memory: I know that this was not the real question. The original text had a smiley to make it more clear (seem to have disappeared while copy-editing). But I think even the original question was never real but a very common myth (www.computerworld.com/article/2534312/operating...)

InfoQ.com and all content copyright © 2006-2016 C4Media Inc.