
Helidon 4 Adopts Virtual Threads: Explore the Increased Performance and Improved DevEx

Key Takeaways

  • Helidon 4 introduces support for Java 21's virtual threads from Project Loom, aiming to improve performance and simplify concurrent programming, marking a significant update in Java development.
  • The release of Helidon 4 represents a shift in coding paradigms, transitioning from the complex reactive model to a more straightforward imperative model that uses "blocking" calls. This change is designed to simplify the coding process and make debugging easier.
  • Helidon 4 achieves improved performance compared to previous versions and to some external frameworks, while maintaining simplicity in its design and use.
  • The written-from-scratch Web Server in Helidon 4, previously named Níma, includes support for multiple protocols and features for observability. This design aims to make Níma a versatile tool for developers.
  • Helidon 4 presents developers with an opportunity to balance performance with simplicity, addressing the historical dilemma. It positions itself as a framework that supports a new phase of efficient and lightweight microservice development.

The Helidon team follows a philosophy of utilizing the latest Java features to gain performance and embrace new trends. Helidon 2 saw the introduction of modularity aligned with the Long-Term Support (LTS) release of Java 11.

With Helidon 3, the team incorporated features like switch expressions, records, and sealed classes into both the codebase and the public APIs.

Java 21 introduces many new features, including virtual threads, stemming from Project Loom, which is well-known to Java developers.

Java virtual threads, delivered through Project Loom, are an initiative in the Java programming language to introduce lightweight, user-mode threads that can be managed more efficiently by the Java Virtual Machine (JVM). Similar constructs are referred to as "fibers" in other programming languages.

Unlike traditional Java threads (created using the Thread class), virtual threads are much lighter in terms of memory usage and overhead. They are designed to be cheap to create and manage.

Virtual threads are managed at the user level rather than the operating system level. This means the JVM can create and switch between them without involving the operating system's thread scheduler. This reduces the context-switching overhead, making them more suitable for highly concurrent applications.

Java virtual threads are suitable for applications with a large number of concurrent tasks or I/O-bound operations. They can be created in the thousands or even millions without consuming excessive system resources.

Virtual threads simplify concurrent programming by making code that involves asynchronous or parallel tasks easier to write and read. Developers can use them without the complexity of managing traditional thread pools or dealing with explicit thread synchronization.
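As a minimal JDK-only sketch (requires Java 21), the per-task virtual-thread executor lets thousands of blocking tasks run concurrently without sizing a thread pool:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        // Each submitted task runs on its own virtual thread; the executor
        // creates one per task instead of queueing onto a fixed pool.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10); // blocking is cheap: the carrier thread is released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("completed = " + completed.get());
    }
}
```

Doing the same with 10,000 platform threads would consume gigabytes of stack memory; here the tasks all sleep concurrently on a handful of carrier threads.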

Because of their lightweight nature and reduced context-switching overhead, Java virtual threads can improve application performance, especially in scenarios with many concurrent tasks or when dealing with I/O operations. This aligns with the typical run profile of Helidon, where there are numerous incoming requests that are short-lived and do not block threads.

Java 21 aims to make virtual threads seamlessly compatible with existing Java libraries and frameworks. Developers have the option to migrate their code to use virtual threads gradually without rewriting everything. Virtual threads are designed to work with existing Java constructs, including the java.util.concurrent package, allowing their use in conjunction with traditional threads when required.

Java virtual threads represent a significant enhancement to the Java ecosystem, particularly for building highly concurrent and scalable applications. They are intended to simplify the development of applications that involve parallelism and asynchronous I/O operations, making it easier for developers to write efficient and responsive software.

Tomas Langer, Architect at Oracle, considered the capabilities of virtual threads and concluded that they are a well-suited match for the tasks at hand. Web servers usually run many tasks in parallel to serve requests. These tasks are often blocked on short I/O, database, or network calls, waiting briefly rather than doing heavy computation.

Helidon 1.x - 3.x

To deal with those short blocking pauses, we used the asynchronous Netty and built our own Helidon Reactive framework on top of it (with contributions from David Karnok).

We have created this engine and built around it all the required infrastructure to create awesome microservices in a reactive way. This collection of libraries delivers Helidon Reactive Server and is included in Helidon SE.

On top of it, we decided to build a compatibility layer with MicroProfile.

MicroProfile is an open-source, community-driven project aimed at enhancing the development of microservices-based applications in the Java ecosystem. It uses Contexts and Dependency Injection (CDI) at the core of its programming model, and this is a familiar concept for Enterprise Java developers. It provides a set of specifications, APIs, and tools that simplify the development, deployment, and management of microservices. MicroProfile emerged as a response to the growing popularity of microservices architecture and the need for Java developers to have standardized tools and APIs for building microservices applications.

MicroProfile is designed to be lightweight and compatible with other Jakarta EE technologies, making it a suitable option for developers who aim to build scalable and resilient microservices applications in the Java ecosystem. It encourages an open and collaborative community where developers and organizations can contribute to and benefit from standardized microservices development practices.

This flavour is known as Helidon MP.

There are significant differences between Helidon SE and Helidon MP. The first one is reactive and asynchronous, and the second one is standard, annotation-based, and synchronous.

So technically, Helidon provides two coding models: reactive Helidon SE for tasks that require maximum performance and Helidon MP when we need full MicroProfile compatibility.

However, reactive programming presents complexities. The practice has shown that the reactive coding model poses challenges in writing, reading, and debugging. We used it primarily for tasks where performance is a priority. For the most common tasks in which maximum performance is not necessary, the synchronous capabilities of Helidon MP are enough.

So, the choice between performance and simplicity has been a consistent consideration: Helidon Reactive for tasks that require high performance and Helidon MP for standard jobs.

Until now!

Project Níma

The first experiments with the written-from-scratch Helidon Web Server showed that performance comparable to Netty could be achieved using virtual threads and a blocking paradigm!

So, we decided to keep working on this project! As a result, Helidon 4 now has its own written-from-scratch virtual-threads-based web server! Since virtual threads are no longer experimental, Helidon 4 is entirely production-ready.

The codename for the Helidon Web Server was chosen to be Níma, which means "Thread" in the Greek language.  

Previously, Helidon SE provided reactive fluent APIs for creating Microservices; all those APIs have been rewritten in a fully "blocking" imperative way. As a result, much complexity has been removed from the APIs and the implementations themselves. The codebase has become more compact and significantly more secure, as there are fewer dependencies.

As mentioned before, we aim to use the latest and greatest features of the Java language in Helidon, so we currently make extensive use of records, switch expressions, and sealed classes. The latter are particularly beneficial for API design.

With this in mind, we have completely rewritten from scratch the Helidon Web Server and Helidon SE. Now we have a brand new Web Server that supports the following features out-of-the-box:

  • HTTP/1.1 protocol with full pipelining support
  • HTTP/2 protocol
  • GRPC protocol
  • WebSocket protocol
  • Unit and integration testing support (own framework)
  • TLS and mTLS - ALPN for both HTTP/1.1, and HTTP/2
  • Other protocols (even non-HTTP)
  • Access Log support
  • CORS support
  • Static Content
  • OpenTelemetry tracing
  • Observability features

Now, for each open port, there is a "real" thread running. Whenever a new request comes, a new virtual thread starts, and all request processing happens inside this thread. The JVM takes care of mounting and unmounting this thread to carrier threads. So, for example, if a request processing requires some I/O operations or a Database call, the thread is technically blocked while waiting for results to come. Still, since this thread is virtual, the JVM unmounts it from the carrier thread, giving the resources to other requests. Whenever the results from I/O or DB calls are available, the thread is unblocked, and the JVM scheduler will find a free carrier thread and mount the virtual thread on this free carrier thread. Now, virtual threads are extremely cheap resources, and we can have millions of them, and switching between them is not costly anymore!

The Helidon 4 Web Server typically uses one new virtual thread for each HTTP/1.1 request and two virtual threads for each HTTP/2 request (one for streaming), and all the routing is done on virtual threads.

All the advancement described above is achieved using the capabilities of JDK 21 without relying on any unconventional methods or workarounds.

Helidon 4 architecture looks this way:

As you see, there is no Netty anymore. One big dependency is removed! With the Helidon virtual-threads-based web server, we are now much faster and more secure!

Coding Paradigm

Consider the following code snippet written using a reactive paradigm:
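For illustration, here is a hypothetical reactive-style chain, built on plain CompletableFuture rather than Helidon's reactive API; the findUser and loadBalance helpers are invented stand-ins for asynchronous database or network calls:

```java
import java.util.concurrent.CompletableFuture;

public class ReactiveStyle {
    // Hypothetical async lookups standing in for DB/network calls.
    static CompletableFuture<String> findUser(String id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<Integer> loadBalance(String user) {
        return CompletableFuture.supplyAsync(() -> user.length() * 10);
    }

    public static void main(String[] args) {
        // Every step is a callback; control flow and error handling are
        // threaded through the chain rather than written as plain statements.
        findUser("42")
            .thenCompose(user -> loadBalance(user)
                .thenApply(balance -> user + " has balance " + balance))
            .exceptionally(ex -> "error: " + ex.getMessage())
            .thenAccept(System.out::println)
            .join();
    }
}
```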

Given the complexity and depth of the above code, it may take a considerable amount of time to read and fully understand, and it even poses challenges in debugging. However, this sort of trade-off to gain a performance benefit is not uncommon in industry practice, as such code is deemed efficient.

Now, let's consider the following code snippet, which I have rewritten using virtual threads but serves the same purpose.
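Again for illustration, the same hypothetical lookup logic written imperatively; on a virtual thread, the blocking calls would simply release the carrier while waiting:

```java
public class BlockingStyle {
    // The same hypothetical lookups, written as plain blocking calls.
    static String findUser(String id) {
        return "user-" + id;
    }

    static int loadBalance(String user) {
        return user.length() * 10;
    }

    public static void main(String[] args) {
        // Straight-line code: easy to read, step through, and debug.
        try {
            String user = findUser("42");
            int balance = loadBalance(user);
            System.out.println(user + " has balance " + balance);
        } catch (Exception ex) {
            System.out.println("error: " + ex.getMessage());
        }
    }
}
```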

This code is more readable, quicker to write, and readily debuggable using the debugger in any preferred IDE. Additionally, it performs optimally and can even outperform the alternatives, since it is executed on virtual threads.

Performance

Besides ease of coding, one of the major benefits is, of course, performance. Our local runs show that performance is even better compared to pure asynchronous Netty.

And if we compare Helidon 3 MP performance to Helidon 4 MP performance, the difference is measured in orders of magnitude.

This means that developers just have to update their version of Helidon MP, and there will be a significant performance increase without having to change anything in their code!

If we look at comparisons with external frameworks, we can see that Helidon is winning, too!


The results are preliminary, yet they provide insightful independent comparison details, indicating Helidon's strong performance.

All of this is achieved using JDK-only features and functionality without any platform-specific optimizations or workarounds.

Lessons learned

Adopting virtual threads was challenging, but we had great support from the Java Platform Team and the performance engineers at Oracle. Helidon 4 was developed in parallel with virtual threads themselves, so we had to deal with occasional changes. This process yielded several insights.

Due to the sheer volume of virtual threads, employing a single component for caching buffers for reuse has proven inefficient. We have achieved better throughput by discarding these buffers (allowing the Garbage Collector to perform its task) than by attempting reuse. Similar results were observed with both native byte buffers and heap byte buffers.

We deliberated over whether to employ blocking or non-blocking sockets/socket channels. After extensive testing and validation with our Java team, we concluded that the best performance is achieved with blocking sockets. For instance, we use the ServerSocket in blocking mode to listen for connections and adopt a "traditional" approach of accepting a socket and initiating a new thread to handle it (with the understanding that these threads are virtual).
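A JDK-only sketch of this pattern (not Helidon's actual implementation): a blocking accept loop on a platform thread hands each accepted socket to a fresh virtual thread. The one-line echo protocol and the in-process demo client are invented for demonstration:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingAcceptLoop {
    public static void main(String[] args) throws Exception {
        // Blocking ServerSocket: one platform thread accepts connections,
        // and each accepted socket is handled on its own virtual thread.
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();
            Thread.ofPlatform().start(() -> {
                try {
                    while (true) {
                        Socket socket = server.accept();           // blocks
                        Thread.ofVirtual().start(() -> handle(socket));
                    }
                } catch (Exception e) {
                    // server socket closed: shut down quietly
                }
            });

            // Demo client: connect, send a line, read the echoed reply.
            try (Socket client = new Socket("localhost", port);
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println("ping");
                System.out.println("reply: " + in.readLine());
            }
        }
    }

    static void handle(Socket socket) {
        try (socket;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            String line = in.readLine();   // blocking read; the carrier is released
            out.println("echo " + line);
        } catch (Exception ignored) {
        }
    }
}
```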

By default, we perform asynchronous writes to sockets. On Linux, this method significantly enhances performance when using HTTP/1.1 pipelining, yielding up to a 3x improvement. However, in the absence of pipelining, this approach offers no additional benefit. Consequently, we have made asynchronous writes configurable, allowing them to be disabled as needed.

Implementing HTTP/2 presented its own set of challenges. Nevertheless, we successfully developed a method to provide unified HTTP routing, applicable regardless of the version, thereby supporting version-specific routes. The only complex threading scenario that arises is in connection/stream interaction, which can potentially lead to race conditions.

Regarding gRPC, our move away from requiring Netty gRPC has enabled us to serve it on the same port as other protocols, thus streamlining our service architecture.

There are some situations when a virtual thread is not released from its carrier thread: it is pinned. This happens when we use synchronized blocks, as well as native methods and foreign functions. Pinning may hinder application scalability, as the scheduler does not compensate by expanding its parallelism. Critical sections that may block should be guarded using ReentrantLock and other constructs from java.util.concurrent instead of synchronized.
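A small sketch of the recommended pattern: the critical section below blocks while holding a ReentrantLock, which (unlike a synchronized block on JDK 21) lets the JVM unmount the waiting virtual thread from its carrier. The increment example is invented for illustration:

```java
import java.util.concurrent.locks.ReentrantLock;

public class AvoidPinning {
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static int counter = 0;

    // Blocking inside a synchronized block would pin the virtual thread to
    // its carrier; ReentrantLock allows the thread to unmount while waiting.
    static void increment() throws InterruptedException {
        LOCK.lock();
        try {
            Thread.sleep(1);   // simulated blocking call inside the critical section
            counter++;
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        Thread[] threads = new Thread[100];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = Thread.ofVirtual().start(() -> {
                try {
                    increment();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("counter = " + counter);
    }
}
```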

All of these lessons were applied in our code. With the help of the Java Core Team at Oracle, the performance of Helidon is highly optimized.

Conclusion

Tomas Langer, the Helidon architect, regards Java virtual threads as a significant enhancement to Java, a sentiment that is echoed by the whole Helidon team.

Before virtual threads, developers faced a choice between high performance with reactive complexity on one side and simple development and maintenance with average performance on the other.

Now, the JVM takes care of thread management and optimizes the performance. Consequently, the Helidon codebase has become more compact, maintainable, and inherently more secure (no more Netty CVEs).

Helidon 4 allows for the creation of microservices using an imperative synchronous "blocking" programming model, resulting in code that is easy to read, maintain, and debug while also offering optimal performance on par with reactive programs.

The biggest challenge now is "to unlearn reactive"!

Many other useful new features are coming with the following minor versions of Helidon 4, and they are worth separate big articles, so stay tuned!

To stay informed about Helidon, developers can leverage the official accounts on Twitter, Mastodon, and LinkedIn. For additional inquiries, the Helidon tag on Stack Overflow is available as a resource.

Enjoy creating powerful and lightweight microservices with Helidon 4!
