Java 7 Module System Concerns
The new Java module system has received a lot of attention lately. After watching a presentation at Devoxx about project Jigsaw, I was excited, thinking that it could be a solution to complicated classpath and versioning issues, JAR hell, etc. Developers would finally be able to use whatever version of Xalan they wanted without being forced to use the endorsed mechanism. Unfortunately, the road to a more efficient module system is not that clear.
Let's take a look at some basic concepts before examining the actual issues:
Modularization is a great tool for tackling complexity. It is useful to divide an application into parts (modules, libraries, bundles, sub-projects, components) and conquer them separately. The ultimate goal of modularization is to have a defined set of APIs that are used for communication between modules.
If all the inter-module communication is realized only by using these APIs, the modules are loosely coupled, so:
- it is easy to change the implementation of a module and
- it is easy to develop and test the modules separately.
It is analogous to the object-oriented paradigm. In OOP the ideal situation is to have a lot of small, reusable, simple and well-separated objects. In a module system it is ideal to have small, reusable, simple and well-separated modules. The idea and the motivation are exactly the same; only the scale is different.
Traditionally there are two approaches to achieving modularity in Java. Logical separation is the most natural way. It consists of splitting an application into logical modules (sub-projects) but deploying them as one application. It is possible to accomplish logical separation just by defining a correct package structure, but it is more common to split an application into several archives (JARs). Logical separation facilitates reuse of modules and helps achieve lower coupling between them. It is even possible to define an API and declare that all communication between modules has to go through that API.
This concept has one big fault: it is hard to impose this restriction. There is no mechanism to enforce the API usage. There is no way to distinguish classes that should be used only by a given module from classes that are part of the public API. If a class is "public", it can be used by every other class no matter which module it is part of. On the other hand, protected or package visibility is too constraining for use inside the module. Usually a module consists of several packages, and classes in those packages need to be able to invoke each other. So even though an application consists of several logical modules, the modules are usually so tightly coupled that the separation is almost useless.
Another traditional approach is physical separation. It is possible to enforce the separation by splitting an application into components and deploying each component into a separate JVM. Components then communicate using remoting facilities like RMI, CORBA or web services. This way the separation and loose coupling are enforced. The downside is a big overhead. Using remoting just to enforce separation is overkill. It makes development and deployment unnecessarily complicated. The performance impact is also not negligible.
A module system stands somewhere in between logical and physical separation. It enforces module separation, but the modules are deployed into the same JVM and communication between them consists of plain old method calls. Therefore there is no runtime overhead. The most popular module framework in the Java ecosystem is OSGi. It is a mature specification with several implementations. In OSGi, modules are called bundles, and every bundle is equivalent to one JAR. Every bundle also contains a META-INF/MANIFEST.MF file that declares which packages are exported and which packages are imported. Only classes from exported packages can be used by other bundles; every other package in the bundle is internal and its classes can be used only within the bundle.
For example, consider the following declaration:
Manifest-Version: 1.0
Import-Package: net.krecan.spring.osgi.common
Export-Package: net.krecan.spring.osgi.dao
Bundle-Version: 1.0.0
Bundle-Name: demo-spring-osgi-dao
Bundle-SymbolicName: net.krecan.spring-osgi.demo-spring-osgi-dao
It specifies a bundle demo-spring-osgi-dao that requires classes from the net.krecan.spring.osgi.common package and exports classes from the net.krecan.spring.osgi.dao package. In other words, the declaration says that other modules can use only the net.krecan.spring.osgi.dao package. Conversely, this module only needs to use the net.krecan.spring.osgi.common package, and it is up to OSGi to provide a module that exports that package. Of course, it is possible to declare more than one package in both the import and the export declaration.
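To make the import/export declaration more tangible, here is a minimal sketch that reads the manifest above with the standard java.util.jar.Manifest class and lists the packages in each header. The class and method names (BundleManifestDemo, packagesIn) are made up for this illustration, and the comma splitting is a simplification: this is not how an OSGi container resolves bundles, it only shows that the declaration is plain, machine-readable metadata.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

// Illustrative helper (hypothetical name), not part of OSGi.
class BundleManifestDemo {

    // The manifest from the example above, one header per line.
    static final String MANIFEST =
            "Manifest-Version: 1.0\n"
          + "Import-Package: net.krecan.spring.osgi.common\n"
          + "Export-Package: net.krecan.spring.osgi.dao\n"
          + "Bundle-Version: 1.0.0\n"
          + "Bundle-Name: demo-spring-osgi-dao\n"
          + "Bundle-SymbolicName: net.krecan.spring-osgi.demo-spring-osgi-dao\n";

    // Returns the package names listed in the given header, or an empty list
    // if the header is absent. Simplification: entries are split on commas and
    // anything after ';' is dropped; real OSGi headers may contain quoted
    // commas (for example in version ranges), which this does not handle.
    static List<String> packagesIn(String manifestText, String header) {
        Manifest manifest;
        try {
            manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = manifest.getMainAttributes().getValue(header);
        List<String> packages = new ArrayList<>();
        if (value != null) {
            for (String entry : value.split(",")) {
                int semicolon = entry.indexOf(';');
                String name = (semicolon >= 0 ? entry.substring(0, semicolon) : entry).trim();
                if (!name.isEmpty()) {
                    packages.add(name);
                }
            }
        }
        return packages;
    }

    public static void main(String[] args) {
        System.out.println("Imports: " + packagesIn(MANIFEST, "Import-Package"));
        System.out.println("Exports: " + packagesIn(MANIFEST, "Export-Package"));
    }
}
```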
The important thing to notice is that the modularity of OSGi is built on top of Java. It is not part of the language! Module separation is not enforced by the compiler, although it can be enforced by an IDE. An OSGi container is needed to run an OSGi-based application. The container can be part of the runtime environment, as in Spring DM Server, or it can be embedded in the application. The container not only enforces the separation but also provides other services like security, module management and life-cycle management. OSGi also provides lots of other interesting features, but they are out of the scope of this article.
There has been a lot of controversy regarding the proposal of JSR-277, which partially duplicated OSGi. For many months experts from both sides argued about which one was better, until it was announced that JSR-277 was abandoned and a new module system, which should be part of Java 7, was introduced.
The first part of the new module system is JSR-294, aka superpackages. This specification makes the concept of modules part of the Java language.
JSR-294 introduces a new visibility keyword, "module". If a member has this visibility, it is visible only to members of the same module. It enables the creation of an internal API that is meant to be used only by the module itself. As I see it, the "public" keyword should be used only when declaring a public API. In all other cases "module" or a more constraining visibility should be used. Of course, once there is a "module" keyword in the language, visibility constraints between modules will be checked by the compiler.
JSR-294 will also allow dependency definition. It will be possible to define that one module depends on another module in a given version. For example:
//org/netbeans/core/module-info.java
@Version("7.0")
@ImportModule(name="java.se.core", version="1.7+")
module org.netbeans.core;
This means that the module "org.netbeans.core" depends on "java.se.core" version 1.7 or higher. It is equivalent to Maven dependencies or OSGi imports. You should not pay too much attention to the syntax, since it will probably change. The important thing here is that module dependencies are defined in the module-info.java file and will be compiled into a class file. In OSGi, the dependencies are defined in a plain text file.
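To illustrate what a constraint like "1.7+" expresses, here is a small sketch of a version check. It is only my reading of "version 1.7 or higher": the class name VersionConstraint, the restriction to dotted numeric versions, and the rule that a constraint without "+" means an exact match are all assumptions made for the example, not something defined by the JSR.

```java
// Hypothetical sketch of "1.7+" style matching; not an API from JSR-294.
class VersionConstraint {

    // Parses "1.7" or "7.0" into numeric components.
    // Assumption: versions are dotted non-negative integers.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Compares component by component; missing components count as 0,
    // so "1.7" and "1.7.0" are treated as equal.
    static int compare(String left, String right) {
        int[] a = parse(left);
        int[] b = parse(right);
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // "1.7+" accepts 1.7 and anything higher. Assumption: a constraint
    // without a trailing '+' requires an exact match.
    static boolean satisfies(String available, String constraint) {
        if (constraint.endsWith("+")) {
            String minimum = constraint.substring(0, constraint.length() - 1);
            return compare(available, minimum) >= 0;
        }
        return compare(available, constraint) == 0;
    }

    public static void main(String[] args) {
        System.out.println("1.7  satisfies 1.7+: " + satisfies("1.7", "1.7+"));
        System.out.println("1.10 satisfies 1.7+: " + satisfies("1.10", "1.7+"));
        System.out.println("1.6  satisfies 1.7+: " + satisfies("1.6", "1.7+"));
    }
}
```

Note that the comparison is numeric, not textual, so 1.10 counts as higher than 1.7.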
Project Jigsaw is the second part of the proposed module system. I assume that it will be a Sun-specific implementation of JSR-294. But it will also be a modularization of the Sun JDK. Since there is a need to make the monolithic JDK modular, Sun wants to split the standard libraries into modules. This will make it possible to define profiles directly in the JRE. It would be possible to have a full JRE on a mobile phone containing everything except Swing. It would also be possible to introduce new standard APIs into the language without having to wait for a new release of the whole platform. It sure looks promising!
But here also comes my first concern. The line between the proprietary Jigsaw and the JSR standard is not clear, as Mark Reinhold notes:
This effort [Jigsaw] will, of necessity, create a simple, low-level module system whose design will be focused narrowly upon the goal of modularizing the JDK. This module system will be available for developers to use in their own code, and will be fully supported by Sun, but it will not be an official part of the Java SE 7 Platform Specification and might not be supported by other SE 7 implementations.
This statement is unclear and leaves room for interpretation. Does it mean that it will be possible to create modules but use them only on the Sun JRE? Does it mean that if a developer writes '@ImportModule(name="java.se.core", version="1.7+")', it will work on the Sun JRE but might not be supported by the IBM JRE? Does it mean that Sun will split its JRE one way and Oracle another way? Let's hope not, for the sake of the "write once, run anywhere" principle.
The issue seems to go even deeper. It is not clear what the main objective of project Jigsaw is. It is mentioned that the main goal is the modularization of the Sun JRE, but in that case no language changes are needed. Sun can modularize its JRE without changing Java as a language.
Could these language changes be just a byproduct of the Sun JRE modularization? If this is true, it's wrong! Language change has to be a first class citizen, and not a byproduct of some proprietary effort.
My other concern is about dependencies. If the module system manages dependencies, the classpath is not needed any more. On one hand, that's great. The classpath often leads to so-called JAR hell. On the other hand, the classpath is extremely flexible. I am afraid that it is not possible to replace the classpath with static module dependencies. Let's see why:
In Java there are two classpaths. There is a build path, which is used at build time, and there is a classpath, which is used at runtime. They are almost identical, but not completely. The classic example is a JDBC driver. It is not necessary to specify the JDBC driver at build time, because the JDBC interfaces are part of the core Java library. But it is necessary to have a JDBC driver on the classpath at runtime. Nowadays, when a programmer wants to change the database, he just changes the driver class name in a configuration file, adds the driver JAR file to the classpath, and that's all. He can't do this if all the dependencies have to be specified at compile time! Of course, in Java EE he can use a JNDI data source, but there is nothing similar in Java SE, and it is not a viable solution to recompile the application every time the JDBC driver has to be changed.
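The mechanism behind this is simply loading a class by a name taken from configuration. A minimal sketch follows; the property key "jdbc.driver" and the classes RuntimeDriverLoader and DemoDriver are hypothetical, and DemoDriver is only a stand-in so the example runs without a real database driver on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical loader: the code never names a concrete driver class.
class RuntimeDriverLoader {

    // Reads the driver class name from configuration text and loads it by name.
    // Returns null when the key is missing or the class is not on the
    // runtime classpath.
    static Class<?> loadConfiguredDriver(String configText) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String driverClassName = config.getProperty("jdbc.driver"); // hypothetical key
        if (driverClassName == null) {
            return null;
        }
        try {
            return Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String config = "jdbc.driver=" + DemoDriver.class.getName() + "\n";
        System.out.println("Configured driver: " + loadConfiguredDriver(config));
        System.out.println("Missing driver: "
                + loadConfiguredDriver("jdbc.driver=org.example.NoSuchDriver\n"));
    }
}

// Hypothetical stand-in for a vendor's driver class; a real one would
// implement java.sql.Driver and live in the vendor's JAR.
class DemoDriver {
}
```

Nothing in RuntimeDriverLoader refers to the driver at compile time, which is exactly the flexibility that a purely static dependency declaration would take away.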
Often recompilation is not even possible. In some organizations the final application is assembled from modules by someone called the Application Assembler. He does not have the source code; he just puts JARs together, changes configuration files and creates the final package. The Application Assembler role is even defined in the Java EE specification.
Optional dependencies are a similar problem. Let's pretend that we are working on a logging framework like log4j. This library is able to log over JMS, therefore the JMS packages have to be on the build path. But 99% of the users are not using JMS logging, so they do not need the dependency on their classpath. There has to be some mechanism that deals with such situations: a library is needed in order to build the module, but the dependency is optional for the end users. Of course, in a perfect world the JMS functionality would be in a separate module, but we do not live in a perfect world and sometimes it is not practical to split a project in such a way.
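One common way libraries cope with this today is to probe for the optional classes at runtime and fall back when they are missing. The sketch below shows the idea; the class OptionalFeature and the appender names "jms" and "console" are invented for the example and are not taken from log4j.

```java
// Hypothetical sketch of runtime feature detection for an optional dependency.
class OptionalFeature {

    // True if the named class can be found on the runtime classpath.
    // The class is located but not initialized.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalFeature.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Picks JMS logging only when the JMS API is actually available;
    // the returned names are placeholders for real appender objects.
    static String chooseAppender() {
        return isPresent("javax.jms.ConnectionFactory") ? "jms" : "console";
    }

    public static void main(String[] args) {
        System.out.println("Selected appender: " + chooseAppender());
    }
}
```

The library still needs the JMS API to compile its JMS-specific classes, but users who never enable JMS logging can run without it.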
Another big issue is dependency conflicts. If you work with Maven, you know what I am talking about. An average enterprise application consists of dozens of third-party libraries, all of them have dependencies, and sometimes those dependencies are in conflict. For example, a developer wants to use Hibernate, which depends on commons-collections 2.1.1. He also wants to use commons-dbcp, which depends on commons-collections 2.1. The developer or application assembler has to decide what to do in such a situation. He can either decide that he wants to use only one specific version of the library everywhere in the application, or he can decide that it is acceptable to use different versions in different parts of the application. The important thing is that this situation cannot be resolved automatically. It has to be decided by someone who knows how the module is used in a given application and recognizes possible incompatibilities between versions.
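Detecting such a conflict is the easy, mechanical part, as the following sketch shows for the Hibernate and commons-dbcp example. The class DependencyConflicts and its data layout are assumptions made for this illustration; it deliberately stops at reporting the conflict, because choosing a resolution is the part that needs a human.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: reports conflicts, does not resolve them.
class DependencyConflicts {

    // Input: module name -> (library name -> required version).
    // Output: every library requested in more than one version,
    // together with the versions that were requested.
    static Map<String, Set<String>> findConflicts(Map<String, Map<String, String>> requirements) {
        Map<String, Set<String>> versionsByLibrary = new LinkedHashMap<>();
        for (Map<String, String> required : requirements.values()) {
            for (Map.Entry<String, String> entry : required.entrySet()) {
                versionsByLibrary
                        .computeIfAbsent(entry.getKey(), key -> new TreeSet<>())
                        .add(entry.getValue());
            }
        }
        // Keep only libraries that were requested in at least two versions.
        versionsByLibrary.values().removeIf(versions -> versions.size() < 2);
        return versionsByLibrary;
    }

    // The situation described in the text above.
    static Map<String, Map<String, String>> example() {
        Map<String, Map<String, String>> requirements = new LinkedHashMap<>();
        requirements.put("hibernate",
                Collections.singletonMap("commons-collections", "2.1.1"));
        requirements.put("commons-dbcp",
                Collections.singletonMap("commons-collections", "2.1"));
        return requirements;
    }

    public static void main(String[] args) {
        System.out.println("Conflicts: " + findConflicts(example()));
    }
}
```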
There are a lot of things that can be said about Java dependencies that are beyond the scope of this article, but the main point to keep in mind is that they are not static! An application can be built with one set of libraries and run with a completely different set. Every module system has to deal with this situation one way or another. Maven has a lot of options for configuring dependencies, dealing with dependency conflicts, etc. But it is still just a build system. In the worst case it is still possible to configure the classpath manually. OSGi is in the opposite situation. It deals only with runtime (deploy-time) dependencies, not with build-time ones. The new Java module system will be used for both build time and runtime (I assume). That brings even more complexity to an already complex problem.
Of course, I do not think that Sun engineers want to break Java. I know that they want to make Java better and easier to use, but I am afraid that political and marketing reasons will be stronger than technical ones. Once more, it is not just an API change or a Sun-specific change. It will be a language change! And once the language is changed, once the "module" keyword is added, there is no way back. There will be a module system in Java and we will have to use it whether we like it or not. It is hard to imagine a situation where there is a modular JVM and a "module" keyword in the language, and we are still using OSGi on top of it.
Lukas Krecan is a freelance Java EE developer. He is working for real world corporations but in his free time he is blogging about programming in an imaginary perfect world.
Mark Reinhold's Devoxx presentation
log4j / Service Provider
Also, unrelated to the log4j point, have you ever used/considered the Service Provider which has been part of the JAR Specification since 1.3?
At compile time the program is only dependent on an interface, which is obviously good practice. You only have to add a JAR that declares that it contains a class which implements the interface, and Java will automatically make it available to you when you do a lookup.
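For readers who have not used it, here is a minimal sketch of such a lookup using java.util.ServiceLoader, the public API for this mechanism since Java 6. The MessageSink interface and the ServiceLookupDemo class are hypothetical. A provider JAR would register itself with a text file under META-INF/services named after the interface and listing the implementing class; since no such JAR is present when this sketch is run on its own, it falls back to a default.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical example of a service provider lookup.
class ServiceLookupDemo {

    // Hypothetical service interface; the only type the caller depends on
    // at compile time.
    interface MessageSink {
        String describe();
    }

    // Returns the first provider registered on the classpath, or a console
    // fallback when no JAR registers an implementation.
    static MessageSink lookup() {
        Iterator<MessageSink> providers = ServiceLoader.load(MessageSink.class).iterator();
        if (providers.hasNext()) {
            return providers.next();
        }
        return () -> "console (no provider registered)";
    }

    public static void main(String[] args) {
        System.out.println("Using sink: " + lookup().describe());
    }
}
```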
OSGi solves your concerns about "static" dependency systems
Firstly, OSGi is NOT a "static" dependency system. Remember it is the "Dynamic Module System for Java" (that's the title of JSR 291). What you refer to as deploy time dependencies are solved in OSGi through the use of Services: by writing your code to depend on interfaces rather than concrete implementations, you can substitute implementations at runtime without recompiling. This problem has been solved outside of OSGi also with DI frameworks like Spring, but OSGi does it in a more dynamic way, allowing for hotswapping of implementations without taking down the JVM.
Optional dependencies are indeed a problem, especially with legacy libraries like log4j that are not well-factored. This is why OSGi explicitly supports optional dependencies. Simply mark your dependency with the attribute "resolution:=optional".
Dependency conflicts... well, this is a strong argument for using a module system like OSGi! All modules in OSGi are versioned, and dependencies can be versioned too. So you can have both versions of commons-collections in your JVM at the same time, and some modules will resolve against version 2.1.1 and others against 2.1.0. This is a scenario that absolutely cannot be handled with the old classpath approach. With the classpath, you MUST pick just one version of each library and hope that all the other libraries you use will be compatible with that version.
As you said, modularity is not simple when legacy libraries are around. But people in the OSGi community have been thinking about and solving these problems for ten years now, and it really is by far the best runtime solution. True, we need better tools for building new OSGi modules [bundles, in OSGi terminology], and JSR 294 will be very welcome if it helps make the standard Java compiler more aware of modularity. Maven is also making great strides towards OSGi compatibility. So it would be great if Sun would work with what has already been built.
Modularization is a must - OSGi is a good try, but has major drawbacks ...
I've worked with OSGi for a couple of years and in different areas and posted an article about benefits, pitfalls and problems: peterrietzler.blogspot.com/2008/12/is-osgi-goin...