TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM support to Java

The open-source TornadoVM project, which aims to provide a heterogeneous hardware runtime for Java, recently reached version 2.0, a major milestone. This release is likely to be of particular interest to teams developing LLM solutions on the JVM.

The project automatically accelerates Java programs on multi-core CPUs, GPUs, and FPGAs. It does not replace existing JVMs; instead, it adds the capability to offload Java code to hardware backends, manage memory between Java and the accelerators, and run the resulting compute kernels. This makes TornadoVM a key building block for modern cloud and ML workloads.

InfoQ has previously covered the project in 2020 and 2022.

TornadoVM compiles Java bytecode at runtime (by acting as a JIT compiler) to one of three backends: OpenCL C, NVIDIA CUDA PTX, and SPIR-V binary. Developers can choose which backends to install and run depending on their specific systems.

Note that not every sort of Java computation is amenable to being offloaded to TornadoVM. Workloads with for-loops whose iterations do not depend on each other, however, are very good candidates, as the iterations can run in parallel.
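
As a minimal illustration in plain Java (the method and variable names are illustrative), the first loop below has independent iterations and is a good offloading candidate, while the second carries a dependency between iterations and is not:

static void candidates(float[] in, float[] out, int n) {
    // Independent iterations: each writes a distinct element and reads
    // no other iteration's result -- easily run in parallel.
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * 2.0f;
    }

    // Loop-carried dependency: each iteration reads the previous result
    // (a prefix sum), so it cannot be parallelized naively.
    for (int i = 1; i < n; i++) {
        out[i] = out[i - 1] + in[i];
    }
}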

In particular, matrix-based applications such as machine learning and deep learning are good candidates. Other good examples of this pattern are physics simulations (e.g., N-body particle computation), financial applications such as Black-Scholes, and a range of applications in computer vision, computational photography, natural language processing, and signal processing.

TornadoVM offers two complementary ways to express parallelism: the Loop Parallel API, which uses Java annotations such as @Parallel and @Reduce to parallelize loops, and the Kernel API, which uses a KernelContext for explicit GPU-style programming (with concepts such as thread IDs, local memory, and barriers), similar to CUDA, OpenCL, and SYCL.

The Loop Parallel API can be as simple as annotating the loop variable:

import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

// @Parallel marks the loop's iterations as independent, so TornadoVM can offload them.
public static void vectorMul(FloatArray a, FloatArray b, FloatArray result) {
    for (@Parallel int i = 0; i < result.getSize(); i++) {
        result.set(i, a.get(i) * b.get(i));
    }
}
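
The @Reduce annotation mentioned above covers reductions such as sums. The following sketch follows the reduction pattern shown in the TornadoVM documentation, with illustrative names:

// @Reduce marks the output as a reduction target, so TornadoVM generates
// a parallel reduction instead of a racy sequential accumulation.
public static void sum(FloatArray input, @Reduce FloatArray output) {
    output.set(0, 0.0f);
    for (@Parallel int i = 0; i < input.getSize(); i++) {
        output.set(0, output.get(0) + input.get(i));
    }
}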

Regardless of which API is used, the work is then explicitly assembled into a TaskGraph as a Java object; the Loop Parallel version looks like this:

import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;

// Describe the work: what to copy to the device, which task to run, and what to copy back.
var taskGraph = new TaskGraph("multiply")
      .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
      .task("vectorMul", Example::vectorMul, a, b, result)
      .transferToHost(DataTransferMode.EVERY_EXECUTION, result);

// Snapshot to an immutable graph, then execute it via an execution plan.
var snapshot = taskGraph.snapshot();
new TornadoExecutionPlan(snapshot).execute();
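
For comparison, a sketch of the same multiplication in the Kernel API style follows. It uses TornadoVM's documented KernelContext, WorkerGrid1D, and GridScheduler types, but the method name vectorMulKernel and the task wiring are illustrative:

public static void vectorMulKernel(KernelContext context, FloatArray a, FloatArray b, FloatArray result) {
    int idx = context.globalIdx;  // this thread's global index
    result.set(idx, a.get(idx) * b.get(idx));
}

// The KernelContext is passed to the task as a regular argument; a GridScheduler
// (keyed by "taskGraphName.taskName") launches one thread per element.
var context = new KernelContext();
var kernelGraph = new TaskGraph("multiply")
      .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
      .task("vectorMul", Example::vectorMulKernel, context, a, b, result)
      .transferToHost(DataTransferMode.EVERY_EXECUTION, result);

var workerGrid = new WorkerGrid1D(result.getSize());
var gridScheduler = new GridScheduler("multiply.vectorMul", workerGrid);
new TornadoExecutionPlan(kernelGraph.snapshot())
      .withGridScheduler(gridScheduler)
      .execute();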

The team also ships GPULlama3.java, a complete LLM inference library built with TornadoVM that runs inference on GPUs in pure Java, without external dependencies.

The just-shipped v0.3.0 release of GPULlama3.java brings significant performance and usability improvements:

  • ~30% performance boost on NVIDIA GPUs (tokens/sec)
  • Optimized FP16 and Q8 kernel generation
  • Easier setup thanks to the new TornadoVM SDKs -- no complex GPU configuration
  • Runs across the NVIDIA PTX and OpenCL backends, with early Apple Silicon support
  • Enhanced Quarkus support
  • Integration with LangChain4j (see the sketch after this list)
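
Thanks to the LangChain4j integration, a TornadoVM-accelerated model can sit behind LangChain4j's standard chat-model abstraction. The sketch below is illustrative only: GpuLlama3ChatModel is a hypothetical adapter name standing in for the integration's actual entry point, while ChatLanguageModel and generate come from LangChain4j's (pre-1.0) chat API:

import dev.langchain4j.model.chat.ChatLanguageModel;

// GpuLlama3ChatModel is hypothetical; the integration's real adapter class,
// constructor, and configuration options may differ.
ChatLanguageModel model = new GpuLlama3ChatModel("Llama-3.2-1B-Instruct-FP16.gguf");

String answer = model.generate("Summarize TornadoVM in one sentence.");
System.out.println(answer);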

GPULlama3.java currently supports several FP16 (16-bit floating point) and 8-bit quantized models, in the single-digit billions of parameters range:

  • Llama 3.2 (1B) – FP16
  • Llama 3.2 (3B) – FP16
  • Llama 3 (8B) – FP16
  • Mistral (7B) – FP16
  • Qwen3 (0.6B) – FP16
  • Qwen3 (1.7B) – FP16
  • Qwen3 (4B) – FP16
  • Qwen3 (8B) – FP16
  • Phi-3-mini-4k – FP16
  • Qwen2.5 (0.5B)
  • Qwen2.5 (1.5B)
  • DeepSeek-R1-Distill-Qwen (1.5B)

Depending on the selected model, a different execution plan is built to match that model's architecture.

The project is led by the Beehive Lab, part of the Advanced Processor Technologies Group at the University of Manchester, which specializes in hardware/software co-design.

The team has also developed TornadoInsight, a plugin for IntelliJ IDEA that enhances the developer experience when working with TornadoVM.

Future work on the roadmap includes making TornadoVM available via SDKMAN! and migrating the JNI components in the codebase to the newer Foreign Function & Memory (FFM) API.
