Secrets of the Bytecode Ninjas


The Java language is defined by the Java Language Specification (JLS). The executable bytecode of the Java Virtual Machine, however, is defined by a separate standard, the Java Virtual Machine Specification (usually referred to as the VMSpec).

JVM bytecode is produced by javac from Java source code files, and the bytecode is significantly different from the language. For example, some familiar high-level Java language features have been compiled away and don’t appear in the bytecode at all.

One of the most obvious examples of this would be Java’s loop keywords (for, while, etc), which are compiled away and replaced with bytecode branch instructions. This means that bytecode’s flow control inside a method consists only of if statements and jumps (for looping).
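To make the point about loops concrete, here is a minimal sketch (the class and method names are illustrative only, not part of the article's example project). The second method arranges the same logic in roughly the shape that bytecode uses: an exit test performed by a conditional branch, plus an unconditional jump back to the top. The opcode names in the comments indicate what javac typically emits for a loop like this; the exact instructions can vary between compiler versions.

```java
public class LoopShapes {
    // Ordinary Java source: a counted for loop
    static int sumWithFor(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // The same logic, arranged the way bytecode expresses it:
    // a conditional branch out of the loop and a jump back to its head
    static int sumWithBranches(int n) {
        int total = 0;
        int i = 0;
        while (true) {       // loop head: the target of the backwards jump
            if (i >= n) {    // conditional branch (typically if_icmpge)
                break;
            }
            total += i;      // iload, iload, iadd, istore
            i++;             // iinc
        }                    // unconditional jump back to the head (goto)
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumWithFor(5));      // prints 10
        System.out.println(sumWithBranches(5)); // prints 10
    }
}
```

Running javap -c on the compiled class should show that neither method contains anything resembling a loop construct, only branch instructions.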

In this article, we will assume that the reader has a grounding in bytecode. If some background is required, see The Well-Grounded Java Developer (Evans & Verburg, Manning 2012) or this report from RebelLabs (signup required for PDF).

Let’s look at an example that often puzzles developers who are new to JVM bytecode. It uses the javap tool, which ships with the JDK or JRE, and which is effectively a Java bytecode disassembler. In our example, we will discuss a simple class that implements the Callable interface:

public class ExampleCallable implements Callable<Double> {
    public Double call() {
        return 3.1415;
    }
}

We can disassemble this as shown, using the simplest form of the javap tool:

$ javap kathik/java/bytecode_examples/ExampleCallable.class
Compiled from "ExampleCallable.java"
public class kathik.java.bytecode_examples.ExampleCallable 
       implements java.util.concurrent.Callable<java.lang.Double> {
  public kathik.java.bytecode_examples.ExampleCallable();
  public java.lang.Double call();
  public java.lang.Object call() throws java.lang.Exception;
}

This disassembly looks wrong - after all, we wrote one call method not two; even if we had tried to write it as such, javac would have complained that there are two methods with the same name and signature that differ only in return type, and so the code would not have compiled. Nevertheless, this class was generated from the real, valid Java source file shown above.

This clearly shows that Java’s familiar ambiguous return type restriction is a Java language constraint, rather than a JVM bytecode requirement. If the thought of javac inserting code that you didn’t write into your class files is troubling, it shouldn’t be; we see it every day! One of the first lessons a Java programmer learns is that “if you don’t provide a constructor, the compiler adds a simple one for you”. In the output from javap you can even see the constructor that has been provided even though we didn’t write it.
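The second call() method is a synthetic bridge method: after type erasure, Callable declares call() as returning Object, so javac adds a method with that signature which forwards to ours. The following sketch uses reflection to confirm this. The wrapper class BridgeMethodDemo and its helper method are illustrative names, and ExampleCallable is re-declared as a nested class only to keep the example self-contained.

```java
import java.lang.reflect.Method;
import java.util.concurrent.Callable;

public class BridgeMethodDemo {
    // Same shape as the ExampleCallable class shown above
    static class ExampleCallable implements Callable<Double> {
        @Override
        public Double call() {
            return 3.1415;
        }
    }

    /** Counts the declared methods named "call" that are (or are not) bridge methods. */
    static int countCallMethods(boolean bridge) {
        int count = 0;
        for (Method m : ExampleCallable.class.getDeclaredMethods()) {
            if (m.getName().equals("call") && m.isBridge() == bridge) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The order returned by getDeclaredMethods() is not specified
        for (Method m : ExampleCallable.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "() bridge=" + m.isBridge() + " synthetic=" + m.isSynthetic());
        }
    }
}
```

One method is reported with return type Double and bridge=false, and the other with return type Object and bridge=true.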

These additional methods provide an example of the requirements of the language spec being stricter than the details of the VM spec. There are a number of "impossible" things that can be done if we write bytecode directly - legal bytecode that no Java compiler will ever emit.

For example, we can create classes with genuinely no constructor. The Java language spec requires that every class has at least one constructor, and javac will insert a simple no-argument constructor automatically if we fail to provide one. If we write bytecode directly, however, we are free to omit one. Such a class could not be instantiated, even via reflection.
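As a dependency-free illustration of this claim, the sketch below assembles a tiny class file by hand with DataOutputStream instead of a bytecode library. That is a simplification chosen for this example and not the approach used in the rest of the article. The names NoCtorDemo, ByteLoader and NoCtor are hypothetical. The byte layout follows the class file structure described later in the article.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NoCtorDemo {
    /** Hand-assembles the class file for "public class NoCtor" with no methods at all. */
    static byte[] noCtorClassBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);   // magic number
            out.writeShort(0);          // minor version
            out.writeShort(51);         // major version (51 = Java 7)
            out.writeShort(5);          // constant pool count (entries + 1)
            out.writeByte(1); out.writeUTF("NoCtor");            // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);     // access flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);          // this class  -> #2
            out.writeShort(4);          // superclass  -> #4
            out.writeShort(0);          // interfaces count
            out.writeShort(0);          // fields count
            out.writeShort(0);          // methods count: not even <init>
            out.writeShort(0);          // attributes count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Exposes the protected defineClass() method. */
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    static Class<?> loadNoCtor() {
        return new ByteLoader().define("NoCtor", noCtorClassBytes());
    }

    /** Returns true if reflection can create an instance via a no-argument constructor. */
    static boolean canInstantiate(Class<?> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Class<?> c = loadNoCtor();
        System.out.println("constructors: " + c.getDeclaredConstructors().length); // prints constructors: 0
        System.out.println("instantiable: " + canInstantiate(c));                  // prints instantiable: false
    }
}
```

The class loads without complaint, but reflection finds no constructor to call, so the attempt to instantiate it fails with a NoSuchMethodException.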

Our final example is one that almost works, but not quite. In bytecode, we can write a method that attempts to call a private method belonging to another class. This is valid bytecode, but it will fail to link correctly if any program attempts to load it. This is because the access control restrictions on the call will be detected by the classloader’s verifier and the illegal access will be rejected.

Introduction to ASM

If we want to create code that can implement some of these non-Java behaviours, then we will need to produce a class file from scratch. As the class file format is binary, it makes sense to use a library that enables us to manipulate an abstract data structure, then convert it to bytecode and stream it out to disc.

There are several such libraries to choose from, but in this article we will focus on ASM. This is a very common library that appears (in slightly modified form) in the Java 8 distribution as an internal API. For user code, we want to use the general open-source library instead of the JDK’s version, as we should not rely upon internal APIs.

The core focus of ASM is to provide an API that, while somewhat arcane (and occasionally crufty), corresponds to the bytecode data structures in a fairly direct way.

The Java runtime is the result of design decisions made over a number of years, and the resulting accretion can clearly be seen in successive versions of the class file format.

ASM seeks to model the class file fairly closely - and so the basic API breaks down into a number of fairly simple groups of methods (albeit ones that model binary concerns).

The programmer who wishes to create a class file from scratch needs to understand the overall structure of a class file, and this does change over time. Fortunately, ASM handles the slight differences in class file format that are seen between Java versions, and the strong compatibility requirements of the Java platform also help.

In order, a class file contains:

  • Magic number (in the traditional Unix sense - Java’s magic number is the rather dated and sexist 0xCAFEBABE)
  • Version number of the class file format in use
  • Constant pool
  • Access control flags (e.g. is the class public, protected, package access, etc)
  • Type name of this class
  • Superclass of this class
  • Interfaces that this class implements
  • Fields that this class possesses (over and above those of superclasses)
  • Methods that this class possesses (over and above those of superclasses)
  • Attributes (e.g. the source file name and class-level annotations)
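The first few entries in this list can be checked directly, because they sit at fixed offsets at the start of every class file. The following sketch (ClassHeaderDump and Header are illustrative names) reads the magic number, the version and the constant pool count of a class that is already on the classpath. It stops there, since decoding the constant pool itself requires handling each entry type.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassHeaderDump {
    /** The fixed-size fields that open every class file. */
    static final class Header {
        final int magic;
        final int minorVersion;
        final int majorVersion;
        final int constantPoolCount;

        Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
            this.magic = magic;
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
        }
    }

    /** Reads the header of the class file from which the given class was loaded. */
    static Header readHeader(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("No class file found for " + type);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();              // 4 bytes: magic number
            int minor = data.readUnsignedShort();    // 2 bytes: minor version
            int major = data.readUnsignedShort();    // 2 bytes: major version
            int cpCount = data.readUnsignedShort();  // 2 bytes: constant pool count
            return new Header(magic, minor, major, cpCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Header h = readHeader(Object.class);
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

The version and constant pool count that are printed depend on the JDK in use; only the magic number is the same for every class file.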

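The main sections of a JVM class file can be recalled using the following mnemonic: My Very Cute Animal Turns Savage In Full Moon Areas (Magic, Version, Constant, Access, This, Super, Interfaces, Fields, Methods, Attributes).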

ASM offers two APIs, and the easier one to use relies heavily upon the Visitor pattern. In a common formulation, ASM starts from a blank slate, with the ClassWriter. When getting used to working with ASM and direct bytecode manipulation, many developers find the CheckClassAdapter a useful starting point: this is a ClassVisitor that checks that its methods are called correctly, in a similar manner to the verifier that appears in Java’s classloading subsystem.

Let’s look at some simple class generation examples that follow a common pattern:

  • Spin up a ClassVisitor (in our cases, a ClassWriter)
  • Write the header
  • Generate methods and constructors as needed
  • Convert the ClassVisitor to a byte array and write it out

The Examples

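import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.util.CheckClassAdapter;

// Opcodes supplies V1_7, the ACC_* flags and the instruction constants
import static org.objectweb.asm.Opcodes.*;

public class Simple implements ClassGenerator {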
 // Helpful constants
 private static final String GEN_CLASS_NAME = "GetterSetter";
 private static final String GEN_CLASS_STR = PKG_STR + GEN_CLASS_NAME;

 @Override
 public byte[] generateClass() {
   ClassWriter cw = new ClassWriter(0);
   CheckClassAdapter cv = new CheckClassAdapter(cw);
   // Visit the class header
   cv.visit(V1_7, ACC_PUBLIC, GEN_CLASS_STR, null, J_L_O, new String[0]);
   generateGetterSetter(cv);
   generateCtor(cv);
   cv.visitEnd();
   return cw.toByteArray();
 }

 private void generateGetterSetter(ClassVisitor cv) {
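   // Create the private field myInt of type int. Effectively:
   // private int myInt;
   // The final argument is an initial value, which only has an
   // effect for static fields, so we pass null here
   cv.visitField(ACC_PRIVATE, "myInt", "I", null, null).visitEnd();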

   // Create a public getter method
   // public int getMyInt();
   MethodVisitor getterVisitor = 
      cv.visitMethod(ACC_PUBLIC, "getMyInt", "()I", null, null);
   // Get ready to start writing out the bytecode for the method
   getterVisitor.visitCode();
   // Write ALOAD_0 bytecode (push the this reference onto stack)
   getterVisitor.visitVarInsn(ALOAD, 0);
   // Write the GETFIELD instruction, which uses the instance on
   // the stack (& consumes it) and puts the current value of the
   // field onto the top of the stack
   getterVisitor.visitFieldInsn(GETFIELD, GEN_CLASS_STR, "myInt", "I");
   // Write IRETURN instruction - this returns an int to caller.
   // To be valid bytecode, stack must have only one thing on it
   // (which must be an int) when the method returns
   getterVisitor.visitInsn(IRETURN);
   // Indicate the maximum stack depth and local variables this
   // method requires
   getterVisitor.visitMaxs(1, 1);
   // Mark that we've reached the end of writing out the method
   getterVisitor.visitEnd();

   // Create a setter
   // public void setMyInt(int i);
   MethodVisitor setterVisitor = 
       cv.visitMethod(ACC_PUBLIC, "setMyInt", "(I)V", null, null);
   setterVisitor.visitCode();
   // Load this onto the stack
   setterVisitor.visitVarInsn(ALOAD, 0);
   // Load the method parameter (which is an int) onto the stack
   setterVisitor.visitVarInsn(ILOAD, 1);
   // Write the PUTFIELD instruction, which takes the top two 
   // entries on the execution stack (the object instance and
   // the int that was passed as a parameter) and set the field 
   // myInt to be the value of the int on top of the stack. 
   // Consumes the top two entries from the stack
   setterVisitor.visitFieldInsn(PUTFIELD, GEN_CLASS_STR, "myInt", "I");
   setterVisitor.visitInsn(RETURN);
   setterVisitor.visitMaxs(2, 2);
   setterVisitor.visitEnd();
 }

 private void generateCtor(ClassVisitor cv) {
   // Constructor bodies are methods with special name <init>
   MethodVisitor mv = 
       cv.visitMethod(ACC_PUBLIC, INST_CTOR, VOID_SIG, null, null);
   mv.visitCode();
   mv.visitVarInsn(ALOAD, 0);
   // Invoke the superclass constructor (we are basically 
   // mimicking the behaviour of the default constructor 
   // inserted by javac)
   // Invoking the superclass constructor consumes the entry on the top
   // of the stack.
   mv.visitMethodInsn(INVOKESPECIAL, J_L_O, INST_CTOR, VOID_SIG);
   // The void return instruction
   mv.visitInsn(RETURN);
   mv.visitMaxs(2, 2);
   mv.visitEnd();
 }

 @Override
 public String getGenClassName() {
   return GEN_CLASS_NAME;
 }
}

This uses a simple interface with a single method to generate the bytes of the class, a helper method to return the name of the generated class and some useful constants:

interface ClassGenerator {
 public byte[] generateClass();

 public String getGenClassName();

 // Helpful constants
 public static final String PKG_STR = "kathik/java/bytecode_examples/";
 public static final String INST_CTOR = "<init>";
 public static final String CL_INST_CTOR = "<clinit>";
 public static final String J_L_O = "java/lang/Object";
 public static final String VOID_SIG = "()V";
}

To drive the classes we’ll generate, we use a harness, called Main. This provides a simple classloader and a reflective way of calling back onto the methods of the generated class. For simplicity, we also write out our generated classes to the right place within the Maven target directory, so that they are picked up on the IDE’s classpath:

public class Main {
public static void main(String[] args) {
   Main m = new Main();
   ClassGenerator cg = new Simple();
   byte[] b = cg.generateClass();
   try {
     Files.write(Paths.get("target/classes/" + PKG_STR +
       cg.getGenClassName() + ".class"), b, StandardOpenOption.CREATE);
   } catch (IOException ex) {
     Logger.getLogger(Simple.class.getName()).log(Level.SEVERE, null, ex);
   }
   m.callReflexive(cg.getGenClassName(), "getMyInt");
}

The following nested class just provides a way to get access to the protected defineClass() method, so we can convert a byte[] into a class object for reflective use. Both it and the callReflexive() method that follows are members of Main:

private static class SimpleClassLoader extends ClassLoader {
 public Class<?> simpleDefineClass(byte[] clazzBytes) {
   return defineClass(null, clazzBytes, 0, clazzBytes.length);
 }
}

private void callReflexive(String typeName, String methodName) {
 byte[] buffy = null;
 try {
   buffy = Files.readAllBytes(Paths.get("target/classes/" + PKG_STR +
     typeName + ".class"));
   if (buffy != null) {
     SimpleClassLoader myCl = new SimpleClassLoader();
     Class<?> newClz = myCl.simpleDefineClass(buffy);
     Object o = newClz.newInstance();
     Method m = newClz.getMethod(methodName, new Class[0]);
     if (o != null && m != null) {
       Object res = m.invoke(o, new Object[0]);
       System.out.println("Result: " + res);
     }
   }
 } catch (IOException | InstantiationException | IllegalAccessException | 
         NoSuchMethodException | SecurityException | 
         IllegalArgumentException | InvocationTargetException ex) {
   Logger.getLogger(Simple.class.getName()).log(Level.SEVERE, null, ex);
 }
}
} // end of class Main

This setup makes it easy for us, with minor modifications, to test out different class generators to explore different aspects of bytecode generation.

The generator for the constructor-free class is very similar. For example, here is how to generate a class that has a single static field with a getter and a setter for it (this generator has no call to generateCtor()):

private void generateStaticGetterSetter(ClassVisitor cv) {
  // Generate the static field
  cv.visitField(ACC_PRIVATE | ACC_STATIC, "myStaticInt", "I", null,
     1).visitEnd();

  MethodVisitor getterVisitor = cv.visitMethod(ACC_PUBLIC | ACC_STATIC, 
                                         "getMyInt", "()I", null, null);
  getterVisitor.visitCode();
  getterVisitor.visitFieldInsn(GETSTATIC, GEN_CLASS_STR, "myStaticInt", "I");
  getterVisitor.visitInsn(IRETURN);
  getterVisitor.visitMaxs(1, 1);
  getterVisitor.visitEnd();

  MethodVisitor setterVisitor = cv.visitMethod(ACC_PUBLIC | ACC_STATIC, 
                                         "setMyInt", "(I)V", null, null);
  setterVisitor.visitCode();
  setterVisitor.visitVarInsn(ILOAD, 0);
  setterVisitor.visitFieldInsn(PUTSTATIC, GEN_CLASS_STR, "myStaticInt", "I");
  setterVisitor.visitInsn(RETURN);
  setterVisitor.visitMaxs(2, 2);
  setterVisitor.visitEnd();
}

Note how the methods are generated with the ACC_STATIC flag set, and how the method arguments are first in the local variable list (as implied by the ILOAD 0 pattern - in an instance method, this would be ILOAD 1, as the "this" reference would be stored at the 0 offset in the local variable table).

Using javap, we can confirm that this class genuinely has no constructor:

$ javap -c kathik/java/bytecode_examples/StaticOnly.class 
public class kathik.StaticOnly {
  public static int getMyInt();
    Code:
       0: getstatic    #11                // Field myStaticInt:I
       3: ireturn

  public static void setMyInt(int);
    Code:
       0: iload_0
       1: putstatic    #11                // Field myStaticInt:I
       4: return
}

Working With Generated Classes

Up until now, we have worked reflectively with the classes we’ve generated via ASM. This helps to keep the examples self-contained, but in many cases we want to use the generated code with regular Java files. This is easy enough to do. The examples helpfully place the generated classes into the Maven target directory, so we can simply package them into a JAR:

$ cd target/classes
$ jar cvf gen-asm.jar kathik/java/bytecode_examples/GetterSetter.class kathik/java/bytecode_examples/StaticOnly.class
$ mv gen-asm.jar ../../lib/gen-asm.jar

Now we have a JAR file that can be used as a dependency in some other code. For example, we can use our GetterSetter class:

package kathik.java.bytecode_examples.withgen;

import kathik.java.bytecode_examples.GetterSetter;

public class UseGenCodeExamples {
 public static void main(String[] args) {
   UseGenCodeExamples ugcx = new UseGenCodeExamples();
   ugcx.run();
 }

 private void run() {
   GetterSetter gs = new GetterSetter();
   gs.setMyInt(42);
   System.out.println(gs.getMyInt());
 }
}

This won’t compile in the IDE (as the GetterSetter class is not on the classpath). However, if we drop down to the command line and supply the appropriate dependency on the classpath, everything works fine:

$ cd ../../src/main/java/
$ javac -cp ../../../lib/gen-asm.jar kathik/java/bytecode_examples/withgen/UseGenCodeExamples.java
$ java -cp .:../../../lib/gen-asm.jar kathik.java.bytecode_examples.withgen.UseGenCodeExamples
42

Conclusion

In this article we’ve looked at the basics of generating class files from scratch, using the simple API from the ASM library. We’ve shown some of the differences between the requirements of the Java language and those of bytecode, and that some of the rules of Java are actually just conventions of the language that are not enforced by the runtime. We’ve also shown that a correctly written class file can be used directly from the language, just as though it had been produced by javac. This is the basis of Java’s interoperability with non-Java languages, such as Groovy or Scala.

There are a number of much more advanced techniques available, but this article should provide a good place to get started with deeper investigations of the JVM runtime and how it operates.

About the Author

Ben Evans is the CEO of jClarity, a Java/JVM performance analysis startup. In his spare time he is one of the leaders of the London Java Community and holds a seat on the Java Community Process Executive Committee. His previous projects include performance testing the Google IPO, financial trading systems, writing award-winning websites for some of the biggest films of the 90s, and others.
