Optimizing Java for Modern Hardware: the Continuous Evolution of the Vector API

JEP 460, Vector API (Seventh Incubator), has been Closed / Delivered for JDK 22. This JEP, under the auspices of Project Panama, incorporates enhancements in response to feedback from the previous six rounds of incubation: JEP 448, Vector API (Sixth Incubator), delivered in JDK 21; JEP 438, Vector API (Fifth Incubator), delivered in JDK 20; JEP 426, Vector API (Fourth Incubator), delivered in JDK 19; JEP 417, Vector API (Third Incubator), delivered in JDK 18; JEP 414, Vector API (Second Incubator), delivered in JDK 17; and JEP 338, Vector API (Incubator), delivered as an incubator module in JDK 16. The most significant change from JEP 448 is an enhancement to the JVM Compiler Interface (JVMCI) to support Vector API values.

This API aims to provide developers with a clear and concise way to express vector computations that can be compiled at runtime into optimal vector instructions on supported CPU architectures, thereby achieving superior performance compared to scalar computations. The latest iteration, delivered in JDK 22, includes minor enhancements and bug fixes, notably extending vector access to heap MemorySegments backed by an array of any primitive element type. Previously, such access was limited to heap segments backed by byte arrays.
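As a rough illustration of the MemorySegment change, the sketch below wraps an int[] in a heap segment and reads and writes vectors through it. The class and method names (SegmentVectorAccess, doubleViaSegment) are hypothetical and not from the JEP. The sketch assumes JDK 22 or later, run with --add-modules jdk.incubator.vector; on earlier releases, vector access to a heap segment backed by an int[] is expected to fail.

```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteOrder;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; assumes JDK 22+ and --add-modules jdk.incubator.vector
public class SegmentVectorAccess {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Returns a copy of data with every element doubled, using vector
    // loads and stores that go through a heap MemorySegment over an int[].
    static int[] doubleViaSegment(int[] data) {
        int[] copy = data.clone();
        // Heap segment backed by an int[] rather than a byte[]
        MemorySegment segment = MemorySegment.ofArray(copy);
        int bound = SPECIES.loopBound(copy.length);
        for (int i = 0; i < bound; i += SPECIES.length()) {
            // Segment offsets are expressed in bytes, not in elements
            long byteOffset = (long) i * Integer.BYTES;
            IntVector v = IntVector.fromMemorySegment(
                    SPECIES, segment, byteOffset, ByteOrder.nativeOrder());
            v.add(v).intoMemorySegment(segment, byteOffset, ByteOrder.nativeOrder());
        }
        // Scalar tail for elements that do not fill a whole vector
        for (int i = bound; i < copy.length; i++) {
            copy[i] *= 2;
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] doubled = doubleViaSegment(new int[] {1, 2, 3, 4, 5});
        System.out.println(java.util.Arrays.toString(doubled));
    }
}
```

Because the segment is a view over the array, the stores made through intoMemorySegment are visible in copy without any extra copying step.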

The core objectives of the Vector API are to offer a platform-agnostic interface that enables efficient implementation on multiple CPU architectures, including x64 and AArch64, and to ensure reliable runtime compilation and performance. This reliability is essential for developers to confidently use vector operations, expecting them to map closely to hardware vector instructions such as Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX) on x64, and NEON and SVE on AArch64. Furthermore, the API is designed to degrade gracefully on architectures that don't support certain vector instructions, so that computations remain functional without a significant loss of performance.
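One way to see the platform-agnostic design is to ask a species for its shape at runtime. The sketch below is a minimal example with a hypothetical class name (SpeciesInfo); it assumes the incubator module is enabled with --add-modules jdk.incubator.vector. A fixed-size species such as SPECIES_128 reports the same shape on every platform, while SPECIES_PREFERRED depends on the hardware, so no output is shown for it.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector
public class SpeciesInfo {
    // Summarizes a species as its lane count and total width in bits
    static String describe(VectorSpecies<?> species) {
        return species.length() + " lanes, " + species.vectorBitSize() + " bits";
    }

    public static void main(String[] args) {
        // Platform-dependent: the shape the runtime considers optimal here
        System.out.println("preferred int species: " + describe(IntVector.SPECIES_PREFERRED));
        // Fixed shape: always four 32-bit int lanes in 128 bits
        System.out.println("128-bit int species:   " + describe(IntVector.SPECIES_128));
    }
}
```

Code written against SPECIES_PREFERRED adapts its lane count to the machine it runs on, which is why the loops in this article step by species.length() rather than by a hard-coded constant.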

To illustrate the concept, consider a traditional scalar approach for adding two arrays of integers:

public void scalarAddition(int[] a, int[] b, int[] result) {
    for (int i = 0; i < a.length; i++) {
        result[i] = a[i] + b[i];
    }
}

While straightforward, this approach operates on one pair of integers at a time, which is not optimal on modern CPUs with vector processing capabilities.

Now, the above method can be rewritten using the Vector API to leverage Single Instruction, Multiple Data (SIMD) capabilities for parallel computation:

import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

public void vectorAddition(int[] a, int[] b, int[] result) {
    final VectorSpecies<Integer> species = IntVector.SPECIES_PREFERRED;
    int length = species.loopBound(a.length);
    for (int i = 0; i < length; i += species.length()) {
        IntVector va = IntVector.fromArray(species, a, i);
        IntVector vb = IntVector.fromArray(species, b, i);
        IntVector vc = va.add(vb);
        vc.intoArray(result, i);
    }
    // Handle remaining elements
    for (int i = length; i < a.length; i++) {
        result[i] = a[i] + b[i];
    }
}

In this vectorized version, the IntVector class from the Vector API is used to perform addition on multiple integers at once. The SPECIES_PREFERRED static field represents the optimal vector shape for the current CPU architecture, determining the number of integers processed in a single operation. The loopBound method calculates the upper bound for the loop, the largest multiple of the vector length that does not exceed the array length, so that every vector load and store stays within the array. Inside the loop, vectors va and vb are created from the input arrays and added together using the add method, which performs the addition in parallel across the vector lanes. The result is then stored back into the result array with the intoArray method. Any remaining elements not covered by the vector operations are processed using a scalar loop, ensuring all elements are correctly added.
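An alternative to the scalar tail loop is to mask the final, partial vector. The sketch below is a variation on the method above, not code from the original example; the class name MaskedAddition is hypothetical, and the code assumes --add-modules jdk.incubator.vector. It uses indexInRange to build a mask that switches off the lanes falling past the end of the array.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector
public class MaskedAddition {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Adds a and b element-wise into result without a separate scalar tail loop.
    // Assumes b and result are at least as long as a.
    static void maskedVectorAddition(int[] a, int[] b, int[] result) {
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            // Lanes whose index i + lane is >= a.length are unset in the mask
            VectorMask<Integer> mask = SPECIES.indexInRange(i, a.length);
            IntVector va = IntVector.fromArray(SPECIES, a, i, mask);
            IntVector vb = IntVector.fromArray(SPECIES, b, i, mask);
            // Only the lanes set in the mask are written back
            va.add(vb).intoArray(result, i, mask);
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7};
        int[] b = {10, 20, 30, 40, 50, 60, 70};
        int[] result = new int[a.length];
        maskedVectorAddition(a, b, result);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

The masked form is shorter, but masked loads and stores may be slower than the loopBound plus scalar tail approach on hardware without native masking support, so the choice is worth measuring for a given workload.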

The example above contrasts a scalar computation with its vectorized counterpart. By operating on multiple elements in parallel, the vectorized version can run significantly faster on hardware with SIMD support, showcasing the potential for enhancing Java application performance in areas such as machine learning, cryptography, and linear algebra.

One of the long-term goals of the Vector API is to align with Project Valhalla, aiming to change the API's current value-based classes to value classes for enhanced performance and memory efficiency. This change will allow for the manipulation of class instances without object identity, significantly benefiting vector computations in Java.

The Vector API introduces an abstract class, Vector<E>, and its subclasses for specific primitive types to represent vectors and perform operations. These operations are classified as lane-wise, affecting individual elements, or cross-lane, affecting the vector as a whole. The API's design focuses on reducing its surface area while maintaining the flexibility to perform a wide range of vector operations, including arithmetic, conversions, and permutations.
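The difference between lane-wise and cross-lane operations can be sketched with a dot product: the multiplication and accumulation act on each lane independently, while the final reduction combines all lanes into a single value. The class name LaneOperations is hypothetical, and the code assumes --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector
public class LaneOperations {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Computes the dot product of a and b; assumes b is at least as long as a.
    static int dotProduct(int[] a, int[] b) {
        IntVector acc = IntVector.zero(SPECIES);
        int bound = SPECIES.loopBound(a.length);
        int i = 0;
        for (; i < bound; i += SPECIES.length()) {
            IntVector va = IntVector.fromArray(SPECIES, a, i);
            IntVector vb = IntVector.fromArray(SPECIES, b, i);
            // Lane-wise: each lane multiplies and accumulates independently
            acc = acc.add(va.mul(vb));
        }
        // Cross-lane: all lanes of the accumulator are summed into one int
        int sum = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the elements beyond the last full vector
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        int[] b = {6, 7, 8, 9, 10};
        System.out.println(dotProduct(a, b)); // prints 130
    }
}
```

Keeping the reduction outside the loop means the cross-lane step runs once rather than once per iteration.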

The re-incubation of the Vector API in JDK 22 reflects an ongoing commitment to optimizing Java's performance capabilities in response to modern hardware advancements. This effort highlights the Java community's dedication to improving computational efficiency and developer productivity through continued innovation.


Editor's Note:
Minor cosmetic tweak was done.
