Project Jigsaw is Really Coming in Java 9
When Project Jigsaw is finally released in Java 9, it will be a little over eight years old.
In its first years it had to compete with two similar Java Specification Requests, namely JSR 277 Java Module System, and JSR 294 Improved Modularity Support. It also caused a conflict with the OSGi community, which feared Project Jigsaw would be an unnecessary and inadequate duplicate of functionality that would force Java developers to use one of two incompatible module systems.
In its early years the project was not well staffed and even halted in 2010 during the merging of Sun into Oracle. It was not until 2011 that the dire need for a module system in Java was restated and work resumed with a full staff.
What followed was a three-year exploratory phase, which ended in July 2014 when several JDK Enhancement Proposals (JEP 200 Modular JDK, JEP 201 Modular Source Code, and JEP 220 Modular Run-Time Images) and ultimately JSR 376 Java Platform Module System were launched. The last-mentioned defines the actual Java module system that will be implemented in the JDK under a new JEP.
As of July 2015 the modules into which the JDK will be split are largely decided (see JEP 200), the JDK source code was restructured to accommodate them (see JEP 201) and the run-time images were prepared for modularization (see JEP 220). All of this is available in the current JDK 9 early access releases.
The code being developed as part of JSR 376 is expected to be deployed to the JDK repository soon, but as of yet there is unfortunately no way to experiment with the module system itself.
The motivation for Project Jigsaw changed slightly over its history. It was initially only intended to modularize the JDK. This scope was extended when it became clear that libraries and applications would benefit considerably from using this tool on their own code.
Ever-growing and Indivisible Java Runtime
The Java runtime has always been growing in size. But before Java 8 there was no way to install a subset of the JRE; all Java installations were distributed with libraries for such APIs as XML, SQL and Swing, whether you needed them or not.
While this may not be terribly significant for medium sized computing devices (for example desktop PCs or laptops) it is very significant for small devices like routers, TV-boxes, cars and all the other tiny nooks and crannies where Java is used. With the current trend of containerization it also gains new relevance on servers, where reducing an image’s footprint will reduce costs.
Java 8 brought compact profiles, which define three subsets of Java SE. These alleviated the problem somewhat but only in restricted cases, and the profiles are too rigid to cover all current and future needs for partial JREs.
JAR Hell and Classpath Hell are endearing terms referring to the problems that arise from the deficiencies of Java's class loading mechanism. Especially for large applications these can cause lots of pain in many interesting ways. Some of the problems build on one another; others are independent.
A JAR cannot express which other JARs it depends on in a way that the JVM will understand. Users are hence left to identify and fulfill the dependencies manually, by reading the documentation, finding the correct projects, downloading the JARs and adding them to the project.
Then there are optional dependencies, where a JAR might only require another JAR if the user wants to use certain features. This complicates the process further.
The Java runtime will not detect an unfulfilled dependency until it is actually required. This will lead to a NoClassDefFoundError crashing the running application.
Build tools like Maven help solve this problem.
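The late failure described above can be made visible with a small check. The following is a minimal sketch, not part of any tool mentioned here: the class name `DependencyCheck`, the method `missing` and the class `com.example.missing.Helper` are hypothetical and only serve as illustration. It tries to load each required class by name and reports those that are absent, which is roughly what the runtime discovers only once the class is first needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DependencyCheck {

    /** Returns the names from requiredClasses that cannot be loaded. */
    static List<String> missing(List<String> requiredClasses) {
        List<String> missing = new ArrayList<>();
        for (String name : requiredClasses) {
            try {
                // initialize = false: only locate and load the class,
                // do not run its static initializers
                Class.forName(name, false, DependencyCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList(
                "java.util.List",
                "com.example.missing.Helper"); // hypothetical class, assumed to be absent
        System.out.println("Missing: " + missing(required));
    }
}
```

A check like this has to be written and maintained by hand, which is exactly the burden a declared dependency would remove.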
For an application to work it might only need a handful of libraries. Each of those in turn might need a handful of other libraries, and so on. As the problem is compounded it becomes exponentially more labor-intensive and error prone.
Again, build tools help with this.
Sometimes different JARs on the classpath contain classes with the same fully-qualified name, for example when they are two different versions of the same library. Since classes will be loaded from the first JAR on the classpath to contain them, that variant will "shadow" all others and make them unavailable.
If the variants differ semantically, this can lead to anything from too-subtle-to-notice-misbehavior to havoc-wreaking-errors. Even worse, the form in which this problem manifests itself can seem non-deterministic. It depends on the order in which the JARs are listed on the classpath. This may well differ across different environments, for example between a developer's IDE and the production machine where the code will eventually run.
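To make the order dependence concrete, here is a toy model of the lookup, not the JVM's actual implementation. Each "JAR" is assumed to be a simple map from class name to a label, and `ShadowingDemo`, `resolve` and `com.example.Util` are hypothetical names. The first entry that contains the class wins, so swapping two entries changes which variant is used.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class ShadowingDemo {

    /** Toy classpath lookup: the first "JAR" that contains the class wins. */
    static String resolve(List<Map<String, String>> classpath, String className) {
        for (Map<String, String> jar : classpath) {
            String variant = jar.get(className);
            if (variant != null) {
                return variant; // later JARs are never consulted for this class
            }
        }
        return null; // in a real JVM this case ends in a NoClassDefFoundError
    }

    public static void main(String[] args) {
        Map<String, String> libV1 =
                Collections.singletonMap("com.example.Util", "Util from lib-1.0.jar");
        Map<String, String> libV2 =
                Collections.singletonMap("com.example.Util", "Util from lib-2.0.jar");

        // Same JARs, different order, different class variant.
        System.out.println(resolve(Arrays.asList(libV1, libV2), "com.example.Util"));
        System.out.println(resolve(Arrays.asList(libV2, libV1), "com.example.Util"));
    }
}
```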
Version conflicts arise when two required libraries depend on different versions of a third library.
If both versions are added to the classpath, the behavior will be unpredictable. First, because of the shadowing problem, classes that exist in both versions will only be loaded from one of them. Worse, if a class that exists in one but not the other is accessed, that class will be loaded as well. Code calling into the library will hence find a mix of the two versions.
At best, the library code might fail loudly with a NoClassDefFoundError if it tries to access code that does not exist in a loaded class. In the worst case, where versions only differ semantically, actual behavior may be subtly changed introducing hard-to-find bugs.
Identifying this as the source of unexpected behavior can be hard. Solving it directly is impossible.
Complex Class Loading
By default all classes are loaded by the same ClassLoader. In some circumstances it might be necessary to add additional loaders, for example to allow users to extend the application by loading new classes.
This can quickly lead to a complex class loading mechanism that creates unexpected and hard to understand behavior.
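A small sketch of what an additional loader looks like may help here. It uses the standard `java.net.URLClassLoader`; the class name `LoaderDelegationDemo` and its method are made up for this illustration. The extra loader has no URLs of its own, so every request is delegated to its parent. As soon as such loaders get their own URLs and sit in a hierarchy, which loader defines which class becomes much harder to follow.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

class LoaderDelegationDemo {

    /** Loads a class through an extra loader that has no URLs of its own. */
    static Class<?> loadViaChildLoader(String className) {
        ClassLoader parent = LoaderDelegationDemo.class.getClassLoader();
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            // The child asks its parent first, so the parent's class is returned.
            return child.loadClass(className);
        } catch (ClassNotFoundException | IOException e) {
            throw new IllegalStateException("Could not load " + className, e);
        }
    }

    public static void main(String[] args) {
        Class<?> viaChild = loadViaChildLoader("java.lang.String");
        // Prints whether the child loader returned the very same Class object.
        System.out.println(viaChild == String.class);
    }
}
```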
Weak Encapsulation Across Packages
Java’s visibility modifiers are great for implementing encapsulation between classes in the same package. But across package boundaries there is only one visibility: public.
Since a classloader folds all loaded packages into one big ball of mud, all public classes are visible to all other classes; there is no way to create functionality that is visible, for example, throughout a whole JAR but not outside of it.
An immediate consequence of weak encapsulation across package boundaries is that security relevant functionality will be exposed to all code running in the same environment. This means that malicious code can access critical functionality that may allow it to circumvent security measures.
Since Java 1.1 this was prevented by a hack: The SecurityManager is invoked on every code path into security relevant code and checks whether the access is allowed. Or more precisely: it should be invoked on every such path. The omission of these calls in some places led to some of the vulnerabilities that plagued Java in the past.
Finally, it currently takes a while before the Java runtime has loaded and JIT compiled all required classes. One reason is that class loading executes a linear scan of all JARs on the classpath. Similarly, identifying all occurrences of a specific annotation requires the inspection of all classes on the classpath.
Project Jigsaw aims to solve the problems discussed above by introducing a language level mechanism to modularize large systems. This mechanism will be used on the JDK itself and is also available to developers to use on their own projects.
It is important to note that not all goals are equally important to the JDK and to us developers. Many are more relevant for the JDK and most will not have a huge impact on day-to-day coding (in contrast to recent language modifications like lambda expressions and default methods). Nonetheless, they will still change the way big projects are developed and deployed.
With the JDK being modularized, users will have the possibility to cherry pick the functionality they need and create their own JRE consisting of only the modules they require. This will help maintain Java’s position as a key player for small devices as well as for containers.
The proposed specification will allow the Java SE Platform, and its implementations, to be decomposed into a set of components which can be assembled by developers into custom configurations that contain only the functionality actually required by an application. - JSR 376
The specification will endow individual modules with the ability to declare their dependencies on other modules. The runtime will be able to analyze these dependencies at compile-time, build-time and launch-time and can thus fail fast for missing or conflicting dependencies.
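The idea of failing fast on declared dependencies can be sketched in a few lines. This is a toy model under stated assumptions, not the module system's algorithm: `ModuleGraph`, its methods and the module names are hypothetical. Each module declares the modules it requires, and resolution walks those declarations transitively, rejecting the configuration as soon as a required module is not declared anywhere.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ModuleGraph {

    private final Map<String, Set<String>> requires = new HashMap<>();

    /** Declares a module together with the modules it depends on. */
    ModuleGraph module(String name, String... dependencies) {
        requires.put(name, new LinkedHashSet<>(Arrays.asList(dependencies)));
        return this;
    }

    /** Returns root plus all transitive dependencies, or fails on a missing module. */
    Set<String> resolve(String root) {
        Set<String> resolved = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!resolved.add(current)) {
                continue; // already visited, this also guards against cycles
            }
            Set<String> dependencies = requires.get(current);
            if (dependencies == null) {
                throw new IllegalStateException("Missing module: " + current);
            }
            pending.addAll(dependencies);
        }
        return resolved;
    }

    public static void main(String[] args) {
        ModuleGraph graph = new ModuleGraph()
                .module("base")
                .module("lib", "base")
                .module("app", "lib");
        System.out.println(graph.resolve("app"));
    }
}
```

The important difference from the classpath is when the error appears: before the application runs, rather than at the first use of a missing class.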
One of the key goals of Project Jigsaw is to enable modules to only export specific packages. All other packages are then private to the module.
A class that is private to a module should be private in exactly the same way that a private field is private to a class. In other words, module boundaries should determine not just the visibility of classes and interfaces but also their accessibility. - Mark Reinhold – Project Jigsaw: Bringing the big picture into focus
Improved Security And Maintainability
The strong encapsulation of module internal APIs will greatly improve security because critical code is now effectively hidden from code that does not need to use it. It also makes maintenance easier as a module’s public API can more easily be kept small.
Casual use of APIs that are internal to Java SE Platform implementations is both a security risk and a maintenance burden. The strong encapsulation provided by the proposed specification will allow components that implement the Java SE Platform to prevent access to their internal APIs. - JSR 376
With clearer bounds of where code is used, existing optimization techniques can be used more effectively.
Many ahead-of-time, whole-program optimization techniques can be more effective when it is known that a class can refer only to classes in a few other specific components rather than to any class loaded at run time. - JSR 376
Since modularization is the goal, Project Jigsaw will introduce the concept of modules, which are:
named, self-describing program components consisting of code and data. A module must be able to contain Java classes and interfaces, as organized into packages, and also native code, in the form of dynamically-loadable libraries. A module’s data must be able to contain static resource files and user-editable configuration files. - Java Platform Module System: Requirements (DRAFT 2)
To give modules some context, think of well-known libraries such as Google Guava or the ones in Apache Commons (e.g. Collections or IO) as modules. Depending on how granular their authors want to split them, each of those might themselves be divided into several modules.
The same is true of an application. It might be a single monolithic module but it might also be split up. A project’s size and cohesion will be important factors for deciding on how to split it into modules.
The plan is that modules will become a regular tool in a developer’s box to organize her code.
Developers already think about standard kinds of program components such as classes and interfaces in terms of the language. Modules should be just another kind of program component, and like classes and interfaces they should have meaning in all phases of a program’s development. - Mark Reinhold – Project Jigsaw: Bringing the big picture into focus
Modules can then be combined into a variety of configurations in all phases of development, i.e. at compile-time, build-time, install-time or run-time. They will be available to Java users like us (in that case sometimes called developer modules) but they will also be used to dissect the Java runtime itself (then often called platform modules).
In fact, this is the current plan for how the JDK will be modularized.
In order to solve “JAR/Classpath hell” one of the core features of Project Jigsaw is dependency management. Let’s look into the components.
Declaration And Resolution
It will also be possible to depend not on specific modules but on a set of interfaces. The module system will then try to identify modules that implement these interfaces and thus satisfy the dependency, binding them appropriately to the interface.
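Depending on an interface rather than on a concrete implementation already exists on the classpath in the form of `java.util.ServiceLoader`. The sketch below assumes that the module system's binding will resemble this mechanism, which the sources quoted here do not spell out. The `Greeter` interface and the class `ServiceLookupDemo` are hypothetical. Since no implementation is registered in this example, the lookup simply finds nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ServiceLookupDemo {

    /** Hypothetical service interface that a consumer depends on. */
    public interface Greeter {
        String greet(String name);
    }

    /** Collects whatever implementations of the given interface are registered. */
    static <S> List<S> providersOf(Class<S> service) {
        List<S> found = new ArrayList<>();
        for (S provider : ServiceLoader.load(service)) {
            found.add(provider);
        }
        return found;
    }

    public static void main(String[] args) {
        // No provider is registered for Greeter in this sketch.
        System.out.println("Providers found: " + providersOf(Greeter.class).size());
    }
}
```

The consumer names only the interface; which implementation satisfies it is decided by whatever is present in the configuration.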
Modules will be versioned. They will be able to indicate their own version (in pretty much any format as long as it is totally ordered) as well as constraints for their dependencies. It will be possible to override both of these pieces of information in any phase. The module system will enforce during each phase that a configuration satisfies all constraints.
Project Jigsaw will not necessarily support multiple versions of a module within a single configuration. But wait, then how does this solve JAR Hell? Good question!
Version selection - the act of selecting the appropriate version from a set of different versions of the same module - is not mandated by the specification. So when I wrote above that the module system will identify the modules required to compile or run another module, this was based on the assumption that there is only one version of each. In case there are several, an upstream step (e.g. the developer or, more likely, the build tool he uses) must make a selection, and the system will only validate that it satisfies all constraints.
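The split between selecting versions and validating them can be illustrated as follows. This is only a sketch under simplifying assumptions: versions are plain integers (one possible total order), constraints are minimum versions, and `VersionCheck` and all module names are hypothetical. The selection is passed in from outside, for example by a build tool, and the check merely reports which constraints it violates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VersionCheck {

    /**
     * selected:        module name -> the one version chosen upstream
     * minimumRequired: module name -> (dependency name -> minimum version)
     * Returns one message per constraint that the selection does not satisfy.
     */
    static List<String> violations(Map<String, Integer> selected,
                                   Map<String, Map<String, Integer>> minimumRequired) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> module : minimumRequired.entrySet()) {
            for (Map.Entry<String, Integer> requirement : module.getValue().entrySet()) {
                Integer actual = selected.get(requirement.getKey());
                if (actual == null || actual < requirement.getValue()) {
                    problems.add(module.getKey() + " needs " + requirement.getKey()
                            + " >= " + requirement.getValue() + " but found " + actual);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Integer> selected = new HashMap<>();
        selected.put("util", 1); // chosen by the developer or the build tool

        Map<String, Map<String, Integer>> required = new HashMap<>();
        required.put("app", Collections.singletonMap("util", 2));

        System.out.println(violations(selected, required));
    }
}
```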
The module system will enforce strong encapsulation in all phases. This centers around an export mechanism where only a module’s exported packages are accessible. Encapsulation is imposed independently of the security verification tasks performed by any SecurityManager that may be present.
The exact syntax for the proposal is not yet defined, but JEP 200 provides some XML renditions of the main semantics. As an example, the following is the declaration of the java.sql module:
<module>
    <!-- The name of this module -->
    <name>java.sql</name>
    <!-- Every module depends upon java.base -->
    <depend>java.base</depend>
    <!-- This module depends upon the java.logging and java.xml modules,
         and re-exports their exported API packages -->
    <depend re-exports="true">java.logging</depend>
    <depend re-exports="true">java.xml</depend>
    <!-- This module exports the java.sql, javax.sql, and
         javax.transaction.xa packages to any other module -->
    <export><name>java.sql</name></export>
    <export><name>javax.sql</name></export>
    <export><name>javax.transaction.xa</name></export>
</module>
We can see from this snippet that java.sql depends on java.base, java.logging, and java.xml. After covering the different export mechanisms we will understand the rest of the declaration.
A module will declare specific packages for export, and only the types contained in them will be exported. This means that only they will be visible and accessible to other modules. Even stricter, the types will only be exported to those modules which explicitly depend on the module containing them.
In the example above, java.sql exports the packages java.sql, javax.sql, and javax.transaction.xa.
It will also be possible for one module to re-export the API (or parts thereof) of any other module it depends upon. This will support refactoring by providing the ability to split and merge modules without breaking dependencies because the original ones can continue to exist. They will export the exact same packages as before even though they might not contain all the code. In the extreme case so-called aggregator modules could contain no code at all and act as a single abstraction of a set of modules. In fact, the compact profiles from Java 8 will be exactly that.
We can see from the example that java.sql re-exports the APIs of its dependencies java.logging and java.xml.
To help developers (especially those modularizing the JDK) with keeping exported API surfaces small, an optional qualified export mechanism will allow a module to specify additional packages to be exported exclusively to a declared set of modules. So while with the “standard” mechanism the exporting module won’t know (nor care) who accesses the packages, using qualified exports will allow a module to limit the set of possible dependents.
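In Java source form, the three mechanisms (plain export, re-export, and qualified export) might be declared as in the following sketch. The keywords match the `module-info.java` syntax that Java 9 eventually shipped with, which goes beyond the XML rendition given in JEP 200, so treat the notation as an assumption relative to the sources discussed here. All `com.example` module and package names are hypothetical.

```java
// module-info.java (sketch; module and package names are hypothetical)
module com.example.orders {
    // plain dependency on another module
    requires java.logging;

    // dependency whose exported API is re-exported to this module's dependents
    requires transitive java.sql;

    // exported to any module that depends on com.example.orders
    exports com.example.orders.api;

    // qualified export: accessible only to the named module
    exports com.example.orders.spi to com.example.orders.plugin;

    // com.example.orders.internal is not exported and stays private to the module
}
```

A package that appears in no `exports` clause needs no declaration at all; being hidden is the default.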
Configuration, Phases, And Fidelity
As mentioned earlier, a goal of JEP 200 is that modules can be combined into a variety of configurations in all phases of development. This is true for the platform modules, which can be used to create images identical to the full JRE or JDK, the compact profiles introduced in Java 8, or any custom configuration which contains only a specified set of modules (and their transitive dependencies). Similarly, developers can use the mechanism to compose different variants of their own modularized applications.
At compile-time, the code being compiled will only see types that are exported by a configured set of modules. At build-time, a new tool (presumably to be called JLink) will allow the creation of binary run-time images that contain specific modules and their dependencies. At launch-time, an image can be made to appear as if it only contains a subset of its modules.
It will also be possible to replace modules that implement an endorsed standard or a standalone technology with a newer version in each of the phases. This will replace the deprecated endorsed standards override mechanism and the extension mechanism (see below).
All module-specific information (like versions, dependencies and package export) will be expressed in code files, independent of IDEs and build tools.
Whole-Program Optimization Techniques
Within a module system with strong encapsulation it is much easier to automatically reason about all the places where a specific piece of code will be used. This makes certain program analysis and optimization techniques more feasible:
Fast lookup of both JDK and application classes; early bytecode verification; aggressive inlining of, e.g., lambda expressions, and other standard compiler optimizations; construction of JVM-specific memory images that can be loaded more efficiently than class files; ahead-of-time compilation of method bodies to native code; and the removal of unused fields, methods, and classes. - Project Jigsaw: Goals & Requirements (DRAFT 3)
These are labeled whole-program optimization techniques and at least two such techniques will be implemented in Java 9. It will also contain a tool which analyzes a given set of modules and applies these optimizations to create a more performant binary image.
Auto discovery of annotated classes (like for example Spring annotated configuration classes) currently requires scanning all classes in some specified packages. This is usually done during a program’s startup, and can slow it down considerably.
Modules will have an API allowing callers to identify all classes with a given annotation. One envisioned approach is to create an index of such classes that will be created when the module is compiled.
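The difference between scanning and an index can be sketched like this. The example is a simplification under stated assumptions: the annotation `Configuration`, the nested classes and `AnnotationIndexDemo` are hypothetical, and the "index" is built at run time here, whereas the approach described above would produce it when the module is compiled. The point is that the inspection of all candidate classes happens once, and later lookups only read the result.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

class AnnotationIndexDemo {

    /** Hypothetical marker annotation, kept at run time so it can be inspected. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Configuration {}

    @Configuration
    static class DatabaseConfig {}

    static class PlainHelper {}

    @Configuration
    static class WebConfig {}

    /** Inspects every candidate once and returns the simple names of annotated classes. */
    static List<String> index(Class<? extends Annotation> annotation, Class<?>... candidates) {
        List<String> annotated = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            if (candidate.isAnnotationPresent(annotation)) {
                annotated.add(candidate.getSimpleName());
            }
        }
        return annotated;
    }

    public static void main(String[] args) {
        List<String> configurations = index(Configuration.class,
                DatabaseConfig.class, PlainHelper.class, WebConfig.class);
        System.out.println(configurations);
    }
}
```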
Integration With Existing Concepts And Tools
Diagnostic tools (e.g. stack traces) will be upgraded to convey information about modules. Furthermore, modules will be fully integrated into the reflection API, which can be used to manipulate them in the same manner as classes. This will include version information, which can be reflected on and overridden at runtime.
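As an illustration of modules in the reflection API, the following sketch uses `Class.getModule()` and `java.lang.Module` as they exist in Java 9 and later. It therefore assumes such a JDK and will not compile on Java 8 or on early access builds without the module system. The class name `ModuleReflectionDemo` is made up for this example.

```java
class ModuleReflectionDemo {

    /** Describes the module a class belongs to; requires Java 9 or later. */
    static String describeModule(Class<?> type) {
        Module owner = type.getModule();
        // Code loaded from the classpath lives in an unnamed module.
        return owner.isNamed() ? owner.getName() : "<unnamed module>";
    }

    public static void main(String[] args) {
        System.out.println(describeModule(String.class));
        System.out.println(describeModule(ModuleReflectionDemo.class));
    }
}
```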
The module’s design will allow build tools to be used for them “with a minimum of fuss”. The compiled form of a module will be usable on the classpath or as a module so that library developers are not forced to create multiple artifacts for class-path and module-based applications.
Interoperability with other module systems, most notably OSGi, is also planned.
The module system is designed with package manager file formats “such as RPM, Debian, and Solaris IPS” in mind. Not only will developers be able to use existing tools to create OS-specific packages from a set of modules, but such modules will also be able to call other modules that were installed with the same mechanism.
Developers will also be able to package a set of modules which make up an application into an OS-specific package, “which can be installed and invoked by an end user in the manner that is customary for the target system”. Building on the above, only those modules that are not present on the target system must be packaged.
Running applications will have the possibility to create, run, and release multiple isolated module configurations. These configurations can contain developer and platform modules. This will be useful for container architectures like IDEs, application servers, or the Java EE platform.
As usual for Java these changes are implemented with a strong focus on backward compatibility; all standardized and non-deprecated APIs and mechanisms will continue to function. But projects might depend on other, undocumented constructs in which case their switch to Java 9 will require some work.
Internal APIs Become Unavailable
With strong encapsulation every module will be able to explicitly declare which types are made available as part of its API. The JDK will use this feature to properly encapsulate all internal APIs which will hence become unavailable.
This may turn out to be the biggest source of incompatibilities with Java 9. It surely is the least subtle one as it causes compile errors.
So what are internal APIs? Definitely everything that lives in a sun.* package. If it’s in com.sun.* and annotated with @jdk.Exported, it will still be available on Oracle JDKs; if it has no annotation it will be unavailable.
One example that might prove especially problematic is sun.misc.Unsafe. It is used in quite a number of projects for mission and performance critical code and its pending inaccessibility has stirred up quite a discussion. During one such exchange it was pointed out, though, that it will still be available via a dedicated command-line flag. This might be a necessary evil, considering that not all functionality will find its way into a public API.
Another example is everything in com.sun.javafx.*. Those classes are a crucial ingredient to properly building JavaFX controls and are also needed to work around a number of bugs. Most functionality from these classes is targeted for publication.
Merge Of JDK And JRE
With a scalable Java runtime, which allows the flexible creation of runtime images, the JDK and JRE lose their distinct character and become just two possible points in a spectrum of module combinations.
This implies that both artifacts will have the same structure, including the folder structure. Any code that relies on the old layout (e.g. by utilizing the fact that a JDK folder contains a subfolder jre) will stop working correctly.
Internal JARs Become Unavailable
Internal JARs like lib/tools.jar will no longer be accessible. Their content will be stored in implementation-specific files with a deliberately unspecified and possibly changing format.
Any code that assumes the existence of these files will stop working correctly. This might also lead to some transitional pains in IDEs or similar tools that heavily rely on these files.
New URL Schema For Runtime Image Content
Some APIs return URLs to class and resource files in the runtime (e.g. ClassLoader.getSystemResource). Before Java 9 these are jar URLs and they have the following form:
Project Jigsaw will use modules as a container for code files and the individual JARs will no longer be available. This requires a new format so such APIs will instead return jrt URLs:
Code that uses the instances returned by such APIs to access the file (e.g. with URL.getContent) will continue to work as today. But if it depends on the structure of jar URLs (e.g. by constructing them manually or parsing them), it will fail.
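A structure-agnostic way to work with such URLs might look like the sketch below. It only uses standard calls (`ClassLoader.getSystemResource`, `URL.getProtocol`, `URL.openStream`); the class name `RuntimeResourceDemo` and its methods are hypothetical. Which scheme is reported depends on the JDK the code runs on, so the example reads the resource through the URL itself instead of taking the URL apart.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;

class RuntimeResourceDemo {

    private static URL locate(String resourcePath) {
        URL url = ClassLoader.getSystemResource(resourcePath);
        if (url == null) {
            throw new IllegalArgumentException("Resource not found: " + resourcePath);
        }
        return url;
    }

    /** Returns the URL scheme under which the runtime exposes the resource. */
    static String schemeOf(String resourcePath) {
        return locate(resourcePath).getProtocol();
    }

    /** Reads the resource via the URL itself, without parsing the URL's structure. */
    static boolean isClassFile(String resourcePath) {
        try (InputStream in = locate(resourcePath).openStream()) {
            // Every class file starts with the magic number 0xCAFEBABE.
            return in.read() == 0xCA && in.read() == 0xFE
                    && in.read() == 0xBA && in.read() == 0xBE;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String resource = "java/lang/Object.class";
        System.out.println("Scheme: " + schemeOf(resource));
        System.out.println("Is class file: " + isClassFile(resource));
    }
}
```

Code written this way keeps working regardless of which URL scheme the runtime hands out.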
Removal of the Endorsed Standards Override Mechanism
Some parts of the Java API are labeled “Standalone Technologies” and created outside of the Java Community Process (e.g. JAXB). It might be desirable to update those independently of the JDK or use alternative implementations. The endorsed standards override mechanism allows the installation of alternative versions of these standards into a JDK.
This mechanism is deprecated in Java 8 and will be removed in Java 9, to be replaced by the upgradeable modules mentioned above.
Removal of the Extension Mechanism
With the extension mechanism custom APIs can be made available to all applications running on the JDK without having to name them on the classpath.
This mechanism is deprecated in Java 8 and will be removed in Java 9. Some features that are useful on their own will be retained.
We have glanced at the history of Project Jigsaw, seen what motivated it, and discussed its goals as well as how they are going to be implemented by specific features. What else can we do besides wait for Java 9?
We should prepare our projects and examine whether they rely on anything that will be unavailable or removed in Java 9.
At least dependencies on internal APIs don’t have to be searched manually. Since Java 8 the JDK contains the Java Dependency Analysis Tool JDeps (introduction with some internal packages, official documentation for Windows and Unix), which can list all packages upon which a project depends. If run with the parameter -jdkinternals, it will output almost all internal APIs a project uses.
“Almost all” because it does not yet recognize all packages that will be unavailable in Java 9. This affects at least those which belong to JavaFX, as can be seen in JDK-8077349. (Using this search I could not find other issues regarding missing functionality.)
There are also at least three JDeps plugins for Maven: by Apache, Philippe Marschall and myself. The last is currently the only one which fails the build if JDeps, run with -jdkinternals, reports dependencies on internal APIs.
If you are concerned about some specific API that will be unavailable in Java 9, you could check the mailing list of the corresponding OpenJDK project as these will be responsible for developing public versions of them.
Java 9 early access builds are already available. While JSR 376 is still under development and the actual module system is not yet available in those builds, many preparatory changes are. In fact, anything besides strong encapsulation is already in place.
Information gathered this way can be returned to the project by posting it on the Jigsaw-Dev mailing list. To quote the (almost) final words of JEP 220:
It is impossible to determine the full impact of these changes in the abstract. We must therefore rely upon extensive internal and—especially—external testing. […] If some of these changes prove to be insurmountable hurdles for developers, deployers, or end users then we will investigate ways to mitigate their impact.
There is also the global Java User Group AdoptOpenJDK, which is a great contact for early adopters.
About the Author
Nicolai Parlog is a software developer and Java enthusiast. He constantly reads, thinks and writes about Java, and codes for a living as well as for fun. He's a long tail contributor to several open source projects and blogs about software development on CodeFX. You can follow Nicolai on Twitter.