Catching up with Nuxeo: Switching from Python to Java

Founded in 2000, Nuxeo is an open source Enterprise Content Management (ECM) specialist company. In 2006 they announced that they were changing their core technology platform from Python to Java. Four years on, InfoQ caught up with Eric Barroca, CEO at Nuxeo, to find out how that conversion went, and to explore their new technology stack and position in the ECM industry. We also spent some time discussing the relative merits of dynamically and statically typed languages.

The ECM market has followed a familiar pattern. Originally it was dominated by specialist proprietary vendors such as Documentum, Interwoven, Vignette, and Stellent. Once the market was more established, the big general software vendors - IBM with FileNet, Microsoft with SharePoint and so on - entered. Open source companies including Nuxeo and Alfresco have also become significant players. In more general terms, according to Barroca, ECM tools are increasingly becoming a commodity:

Everyone is challenged to manage an increasingly diverse range of electronically stored information, recognizing that it might be subject to legal disclosure one day. So, it's not surprising that ECM has become a commoditized technology space with Microsoft Sharepoint putting basic content management features and concepts into the hands of mainstream business users. We believe that the maturity and growth of open source and a recently ratified industry standard are also driving changes into what customers will invest their time and financial resources in.

At the same time, businesses are looking to implement new systems and refresh their technology. Products built in the 1990s are now reaching the natural end of their useful product lifecycle. Content is now created in the world of mobile, social, open and interoperable, which is the world the Nuxeo platform has been architected to meet.

That core architecture is built using Java EE as the main technology stack with OSGi providing the module system. Nuxeo has built its own runtime as a component model which is used, for example, to allow services and clients to switch the use of EJB in or out.
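To make the idea of switching an implementation in or out more concrete, here is a minimal sketch of a lookup-by-interface registry. It is not Nuxeo's runtime: `DocumentService`, `LocalDocumentService`, `RemoteDocumentService` and `ServiceRegistry` are hypothetical names invented for this illustration, and the "remote" class merely stands in for something that would delegate to an EJB.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical service contract; not taken from the Nuxeo code base.
interface DocumentService {
    String fetchTitle(String id);
}

// Plain in-process implementation.
class LocalDocumentService implements DocumentService {
    public String fetchTitle(String id) { return "local:" + id; }
}

// Stand-in for an implementation that would delegate to a remote EJB.
class RemoteDocumentService implements DocumentService {
    public String fetchTitle(String id) { return "remote:" + id; }
}

// Clients look services up by interface, so the binding can be
// swapped without touching client code.
class ServiceRegistry {
    private final Map<Class<?>, Object> bindings = new HashMap<>();

    <T> void bind(Class<T> type, T impl) {
        bindings.put(type, impl);
    }

    <T> T lookup(Class<T> type) {
        Object impl = bindings.get(type);
        if (impl == null) {
            throw new IllegalStateException("No binding for " + type.getName());
        }
        return type.cast(impl);
    }
}

class RuntimeSketch {
    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();

        registry.bind(DocumentService.class, new LocalDocumentService());
        System.out.println(registry.lookup(DocumentService.class).fetchTitle("42")); // local:42

        // Rebinding changes what every client gets from the same lookup.
        registry.bind(DocumentService.class, new RemoteDocumentService());
        System.out.println(registry.lookup(DocumentService.class).fetchTitle("42")); // remote:42
    }
}
```

The client code only ever names the interface, which is what lets a deployment choose between a local and a remote binding.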

The product makes strong use of Java EE standards.

"These standards are well completed by non-standardized but great components to solve pretty much any problem," Barroca told us. "Lucene, Hibernate, Eclipse, and many more lesser known components make it easier to build apps today. We do think Python is great for many use cases; it just wasn't the right platform for us for the vision and the market we were targeting."

The Nuxeo product uses around 100 libraries in total. As well as those cited by Barroca above, other key ones include jBPM as the process engine, OpenSocial and OAuth for widget inclusion and social features (Apache Shindig), and Apache Chemistry for CMIS.

The build system is based on Maven and Hudson with JUnit as the unit test framework. The product currently supports JBoss, Tomcat and Jetty, though most of the platform also runs on bare OSGi. According to Nuxeo, adding support for a new application server is a relatively quick job, on the order of a few days.

The range and depth of Java libraries was one of the main drivers for switching to Java, but Barroca also provided some other reasons:

Market: the market for Java applications is huge. All companies know Java. Most already have Java applications running. A lot of companies require Java for their IT. All system integrators have extensive Java knowledge.
Community: again, huge. The Java Apache community for example is tremendous. There are big annual conventions around Java (JavaOne, ApacheCon, Devoxx, Jazoon, etc.) and hundreds of small ones.
Specifications: many technologies in the Java world are based on published specifications that have many implementations and reference implementations. This promotes clean specs, interoperable code, and pushes implementors to do their best.

We are also benefiting from the high-level of tooling available. From the VM itself, to the debugging tools, IDEs, monitoring, performance benches, etc.

Barroca argued that the re-architected platform and open source development model are opening new markets within the ECM space, with Nuxeo well placed to take advantage:

A new category of ISVs are using an ECM platform approach to package business knowledge into software to create and sell those content applications. The examples cut across multiple industries and functions, from construction project management and clinical trials management for biotech and life sciences, to software for control and command centers in state and local government.

Our platform’s flexibility and feature scope combined with the open source aspect of the software ease the life of content app architects and developers. The development model enables a new way of building content apps - one that is easier, cleaner, faster.

Open source ECM is also driving entirely new sources of demand within the content management market. Organizations that previously didn't deploy ECM because of high up-front costs or lack of control over their application building and customizations now have new options.

One alternative that would have let Nuxeo move to the JVM platform was Jython, an implementation of Python written in Java that targets the JVM. However, Nuxeo did not consider this option:

We wanted to take the time to actually use our six years of experience building an ECM solution to design and build a complete platform from scratch implementing our vision. It was not only a language move, it was a platform move. We wanted to create a complete ECM platform, designed by developers for developers, to implement the vision of ECM all vendors were talking about but none were actually making a technical reality. When you try to build your software through acquisitions, which is the growth strategy for many proprietary vendors in our space, you don't have a platform, you have a software suite. In the past four years, we've been able to build a complete technical platform to create content applications.

Given that the project was a re-design and re-write of the original Nuxeo system, we asked how the two systems compare:

In terms of performance, I would say we scale 10 to 100 times more, depending on the metric you take. In terms of features, it's around 4 times the scope of what we had. In terms of developer compatibility and ease-of-use, it's about the same, but with a lot more tooling and experience. We've been able to double, if not triple, the capabilities we had on the older platform.

The initial conversion work took more than a year for the 55-person company; Barroca estimated roughly 10-20 man years of developer effort. From there:

...we are at around 150-200 man years if we include the contributions and the continued development. But I don't really like to measure software development in man years because there is no such thing as a constant value for a "developer man year" among developers, let alone companies. And we have a pretty serious dev team.

We were also interested in how Nuxeo's existing customers reacted to the change. Barroca told us:

About 80% of the install base were fine with the change. For those organizations that didn't want to move, the power of the community stepped forward to ensure ongoing support. This is the beauty of the open source development model - the technology can survive when there is a committed group of users and developers, what we see as the 'future-proofing' advantage.

Since Nuxeo have used both Python and Java extensively, we discussed the relative merits of the two languages. Advocates of dynamic languages generally argue that they are faster to code in, since the languages are inherently more expressive. Advocates of static languages counter that the time saved in development is lost in testing, where the lack of type information makes testing harder, and that since dynamic languages are generally slower than statically typed languages they tend to be more expensive to scale. Barroca told us:

At Nuxeo we've practiced both dynamically and statically typed languages (Python and Java) to write large-scale applications with thousands of classes and hundreds of thousands of lines of code, and we can say without a doubt that statically typed languages are better for such applications.

In a statically typed language you can perform an efficient static analysis of your code, which gives you:

  • an IDE with proper autocompletion
  • an IDE that knows all the uses of a given method or field in all of your classes, thus vastly improving searches and refactorings
  • many compile-time checks
  • tools that find patterns and bugs in your code
  • tools that extract documentation based on the actual types used in your code

In the vast majority of cases in a dynamically typed language your methods actually always receive the same type of arguments, so dynamism doesn't really bring you anything except for less typing of the argument's types - but having the types explicit in a statically typed language is rarely a hindrance and always good documentation.

When it's not the case and you have real polymorphism, modern statically typed languages give you generics to get some (not all) of the benefits of dynamically typed languages.
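As a small illustration of the generics point, the sketch below uses one container class for any element type while the compiler still tracks which type each instance holds. `Document`, `Folder` and `Repository` are hypothetical names made up for this example, not Nuxeo classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical content types, invented for this illustration only.
class Document {
    final String title;
    Document(String title) { this.title = title; }
}

class Folder {
    final String path;
    Folder(String path) { this.path = path; }
}

// One generic container serves every element type, yet each instance
// is still checked against its declared type parameter at compile time.
class Repository<T> {
    private final List<T> items = new ArrayList<>();

    void add(T item) { items.add(item); }

    Optional<T> first() {
        return items.isEmpty() ? Optional.empty() : Optional.of(items.get(0));
    }

    int size() { return items.size(); }
}

class GenericsSketch {
    public static void main(String[] args) {
        Repository<Document> docs = new Repository<>();
        docs.add(new Document("Spec"));
        // docs.add(new Folder("/tmp")); // would not compile: a Folder is not a Document

        // No cast needed: the compiler knows first() yields a Document.
        String title = docs.first().map(d -> d.title).orElse("none");
        System.out.println(title); // Spec
    }
}
```

The commented-out line is the kind of mistake a dynamically typed language would only surface at runtime.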

Another advantage sometimes seen in dynamically typed languages like Python, JavaScript, Ruby and others is that you can at runtime patch an instance of an object (or even a class) to add behavior to it (in Python this is often called "monkey patching"). This may seem nice, but it is horrible in terms of understanding of the code, static analysis, debugging, etc. While helpful for hacks, it's rarely a good idea in the long term.

Finally, dynamically typed languages give you reflection tools to find out about the fields and methods of an object whose type you do not know, but Java has these features too since Java 5, so dynamic is not an advantage over static here either.
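For readers who have not used it, Java reflection looks roughly like the sketch below, which inspects and calls into an object through `java.lang.reflect` without relying on its static type. The `Invoice` class and the helper names are hypothetical, chosen only for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class to inspect; any object would do.
class Invoice {
    public String number = "INV-1";
    public int total = 42;

    public String describe() { return number + ":" + total; }
}

class ReflectionSketch {
    // Collects the names of the public fields of the object's runtime class.
    // The order of the returned names is not guaranteed by the JDK.
    static List<String> publicFieldNames(Object target) {
        List<String> names = new ArrayList<>();
        for (Field field : target.getClass().getFields()) {
            names.add(field.getName());
        }
        return names;
    }

    // Invokes a public no-argument method chosen by name at runtime.
    static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Object unknown = new Invoice(); // static type deliberately hidden
        System.out.println(publicFieldNames(unknown));
        System.out.println(call(unknown, "describe")); // INV-1:42
    }
}
```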

Barroca is looking forward to exploring some of the recently released and upcoming technologies from Java EE 6 and Java SE 7 including the module system, JSF 2, Bean Validation, media components, and the new NIO libraries. He remains convinced that Java is the best technology for their product.

We are also very proud of the technology we've created, from our Runtime to the high-level layers we made highly modular, including the UI. It's not very exposed because we are completely positioned as an ECM player, but there are some gems in the platform that could benefit many Java apps. The extension system, modular UI based on Seam/JSF, distribution assembly engine, and a lot more components are all serious technologies that are highly useful when creating modular Java applications. When customizing with Nuxeo-based apps, you don't fork the UI to customize it, you contribute to it using plugins; it's a very open and extensible model.
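The "contribute, don't fork" idea can be sketched in a few lines of Java. This is only an illustration of the general extension-point pattern, not Nuxeo's extension system: `ToolbarAction` and `ToolbarExtensionPoint` are hypothetical names, and a real platform would typically discover contributions from plugin descriptors rather than direct method calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract that a plugin implements to add a UI action.
interface ToolbarAction {
    String label();
}

// The host application owns the extension point; plugins only register
// contributions and never modify the host's own code.
class ToolbarExtensionPoint {
    private final List<ToolbarAction> actions = new ArrayList<>();

    void contribute(ToolbarAction action) {
        actions.add(action);
    }

    // Returns the labels in contribution order.
    List<String> render() {
        List<String> labels = new ArrayList<>();
        for (ToolbarAction action : actions) {
            labels.add(action.label());
        }
        return labels;
    }
}

class ExtensionSketch {
    public static void main(String[] args) {
        ToolbarExtensionPoint toolbar = new ToolbarExtensionPoint();
        toolbar.contribute(() -> "Save");   // shipped with the host
        toolbar.contribute(() -> "Export"); // added by a plugin
        System.out.println(toolbar.render()); // [Save, Export]
    }
}
```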
