
Catching up with Nuxeo: Switching from Python to Java

Posted by Charles Humble on Aug 09, 2010 |

Founded in 2000, Nuxeo is an open source Enterprise Content Management (ECM) specialist company. In 2006 they announced that they were changing their core technology platform from Python to Java. Four years on, InfoQ caught up with Eric Barroca, CEO at Nuxeo, to find out how that conversion went, and to explore their new technology stack and position in the ECM industry. We also spent some time discussing the relative merits of dynamically and statically typed languages.

The ECM market has followed a familiar pattern. Originally it was dominated by specialist proprietary vendors such as Documentum, Interwoven, Vignette, and Stellent. Once the market was more established, the big general software vendors (IBM with FileNet, Microsoft with SharePoint, and so on) entered. Open source companies including Nuxeo and Alfresco have also become significant players. More generally, according to Barroca, ECM tools are increasingly becoming a commodity:

Everyone is challenged to manage an increasingly diverse range of electronically stored information, recognizing that it might be subject to legal disclosure one day. So, it's not surprising that ECM has become a commoditized technology space with Microsoft Sharepoint putting basic content management features and concepts into the hands of mainstream business users. We believe that the maturity and growth of open source and a recently ratified industry standard are also driving changes into what customers will invest their time and financial resources in.

At the same time, businesses are looking to implement new systems and refresh their technology. Products built in the 1990s are now reaching the natural end of their useful product lifecycle. Content is now created in the world of mobile, social, open and interoperable, which is the world the Nuxeo platform has been architected to meet.

That core architecture is built using Java EE as the main technology stack with OSGi providing the module system. Nuxeo has built its own runtime as a component model which is used, for example, to allow services and clients to switch the use of EJB in or out.
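
To illustrate the idea of switching an implementation in or out behind a stable service interface, here is a minimal Java sketch. It is not Nuxeo's runtime API: `ServiceLookupSketch`, `Registry`, `DocumentService`, and both implementations are hypothetical names invented for this example, and the "remote" class merely stands in for something like an EJB-backed proxy. It assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ServiceLookupSketch {

    // The contract that client code depends on (hypothetical).
    public interface DocumentService {
        String fetchTitle(String id);
    }

    // In-process implementation (hypothetical).
    static final class LocalDocumentService implements DocumentService {
        @Override
        public String fetchTitle(String id) {
            return "local:" + id;
        }
    }

    // Stand-in for a remote, EJB-style implementation (hypothetical).
    static final class RemoteDocumentService implements DocumentService {
        @Override
        public String fetchTitle(String id) {
            return "remote:" + id;
        }
    }

    // A tiny registry mapping a service interface to a factory for it.
    public static final class Registry {
        private final Map<Class<?>, Supplier<?>> bindings = new HashMap<>();

        public <T> void bind(Class<T> type, Supplier<? extends T> factory) {
            bindings.put(type, factory);
        }

        public <T> T lookup(Class<T> type) {
            Supplier<?> factory = bindings.get(type);
            if (factory == null) {
                throw new IllegalStateException("No binding for " + type.getName());
            }
            return type.cast(factory.get());
        }
    }

    // The choice of implementation is made once, at configuration time.
    public static Registry configure(boolean useRemote) {
        Registry registry = new Registry();
        if (useRemote) {
            registry.bind(DocumentService.class, RemoteDocumentService::new);
        } else {
            registry.bind(DocumentService.class, LocalDocumentService::new);
        }
        return registry;
    }

    public static void main(String[] args) {
        DocumentService service = configure(false).lookup(DocumentService.class);
        System.out.println(service.fetchTitle("42"));
    }
}
```

Client code only ever sees `DocumentService`, so changing the flag passed to `configure` swaps the implementation without touching any caller.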

The product makes strong use of Java EE standards including:

"These standards are well completed by non-standardized but great components to solve pretty much any problem," Barroca told us. "Lucene, Hibernate, Eclipse, and many more lesser known components make it easier to build apps today. We do think Python is great for many use cases; it just wasn't the right platform for us for the vision and the market we were targeting."

The Nuxeo product uses around 100 libraries in total. In addition to those cited by Barroca above, other key ones include jBPM as the process engine, OpenSocial and OAuth for widget inclusion and social features (Apache Shindig), and Apache Chemistry for CMIS.

The build system is based on Maven and Hudson with JUnit as the unit test framework. The product currently supports JBoss, Tomcat and Jetty, though most of the platform also runs on bare OSGi. According to Nuxeo, adding support for a new application server is a relatively quick job, on the order of a few days.

The range and depth of Java libraries was one of the main drivers for switching to Java, but Barroca also provided some other reasons:

Market: the market for Java applications is huge. All companies know Java. Most already have Java applications running. A lot of companies require Java for their IT. All system integrators have extensive Java knowledge.
Community: again, huge. The Java Apache community for example is tremendous. There are big annual conventions around Java (JavaOne, ApacheCon, Devoxx, Jazoon, etc.) and hundreds of small ones.
Specifications: many technologies in the Java world are based on published specifications that have many implementations and reference implementations. This promotes clean specs, interoperable code, and pushes implementors to do their best.

We are also benefiting from the high-level of tooling available. From the VM itself, to the debugging tools, IDEs, monitoring, performance benches, etc.

Barroca argued that the re-architected platform and open source development model are opening new markets within the ECM space, with Nuxeo well placed to take advantage:

A new category of ISVs are using an ECM platform approach to package business knowledge into software to create and sell those content applications. The examples cut across multiple industries and functions, from construction project management and clinical trials management for biotech and life sciences, to software for control and command centers in state and local government.

Our platform’s flexibility and feature scope combined with the open source aspect of the software ease the life of content app architects and developers. The development model enables a new way of building content apps - one that is easier, cleaner, faster.

Open source ECM is also driving entirely new sources of demand within the content management market. Organizations that previously didn't deploy ECM because of high up-front costs or lack of control over their application building and customizations now have new options.

One alternative approach that would have let Nuxeo use the JVM platform was Jython, an implementation of Python written in Java that targets the JVM. However, Nuxeo did not consider this:

We wanted to take the time to actually use our six years of experience building an ECM solution to design and build a complete platform from scratch implementing our vision. It was not only a language move, it was a platform move. We wanted to create a complete ECM platform, designed by developers for developers, to implement the vision of ECM all vendors were talking about but none were actually making a technical reality. When you try to build your software through acquisitions, which is the growth strategy for many proprietary vendors in our space, you don't have a platform, you have a software suite. In the past four years, we've been able to build a complete technical platform to create content applications.

Given that the project was a re-design and re-write of the original Nuxeo system, we asked how the two systems compare:

In terms of performance, I would say we scale 10 to 100 times more, depending on the metric you take. In terms of features, it's around 4 times the scope of what we had. In terms of developer compatibility and ease-of-use, it's about the same, but with a lot more tooling and experience. We've been able to double, if not triple, the capabilities we had on the older platform.

The initial conversion work took more than a year for the 55-person company; Barroca estimated roughly 10-20 man years of developer effort. Since then:

...we are at around 150-200 man years if we include the contributions and the continued development. But I don't really like to measure software development in man years because there is no such thing as a constant value for a "developer man year" among developers, let alone companies. And we have a pretty serious dev team.

We were also interested in how Nuxeo's existing customers reacted to the change. Barroca told us:

About 80% of the install base were fine with the change. For those organizations that didn't want to move, the power of the community stepped forward to ensure ongoing support. This is the beauty of the open source development model - the technology can survive when there is a committed group of users and developers, what we see as the 'future-proofing' advantage.

Since Nuxeo have used both Python and Java extensively, we discussed the relative merits of the two languages. Generally, advocates of dynamic languages argue that they are faster to code in, since the languages are inherently more expressive. Advocates of static languages counter that the time saved in development is lost in testing, where the lack of type information makes testing harder, and that since dynamic languages are generally slower than statically typed languages they tend to be more expensive to scale. Barroca told us:

At Nuxeo we've practiced both dynamically and statically typed languages (Python and Java) to write large-scale applications with thousands of classes and hundreds of thousands of lines of code, and we can say without a doubt that statically typed languages are better for such applications.

In a statically typed language you can perform an efficient static analysis of your code, which gives you:

  • an IDE with proper autocompletion
  • an IDE that knows all the uses of a given method or field in all of your classes, thus vastly improving searches and refactorings
  • many compile-time checks
  • tools that find patterns and bugs in your code
  • tools that extract documentation based on the actual types used in your code

In the vast majority of cases in a dynamically typed language your methods actually always receive the same type of arguments, so dynamism doesn't really bring you anything except for less typing of the argument's types - but having the types explicit in a statically typed language is rarely a hindrance and always good documentation.

When it's not the case and you have real polymorphism, modern statically typed languages give you generics to get some (not all) of the benefits of dynamically typed languages.
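
To make the generics point concrete, here is a minimal Java sketch. The names `GenericsSketch`, `Repository`, and `max` are invented for illustration and do not come from the Nuxeo code base. It shows one container and one method that work for many types while the compiler still checks every use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GenericsSketch {

    // A container for any element type T; the compiler enforces T at each use.
    public static final class Repository<T> {
        private final List<T> items = new ArrayList<T>();

        public void add(T item) {
            items.add(item);
        }

        public T first() {
            return items.get(0);
        }

        public int size() {
            return items.size();
        }
    }

    // A generic method: accepts any T that can be compared with itself.
    // Assumes a non-empty list.
    public static <T extends Comparable<T>> T max(List<T> values) {
        T best = values.get(0);
        for (T candidate : values) {
            if (candidate.compareTo(best) > 0) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Repository<String> titles = new Repository<String>();
        titles.add("Quarterly report");
        // titles.add(42);  // would not compile: an int is not a String

        String title = titles.first();  // no cast needed
        System.out.println(title);

        // The same method serves integers and strings.
        System.out.println(max(Arrays.asList(3, 9, 4)));
        System.out.println(max(Arrays.asList("pear", "apple")));
    }
}
```

The commented-out line is the trade-off in miniature: the code stays reusable across types, but a mismatched argument is rejected at compile time instead of surfacing at runtime.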

Another advantage sometimes seen in dynamically typed languages like Python, JavaScript, Ruby and others is that you can at runtime patch an instance of an object (or even a class) to add behavior to it (in Python this is often called "monkey patching"). This may seem nice, but it is horrible in terms of understanding of the code, static analysis, debugging, etc. While helpful for hacks, it's rarely a good idea in the long term.

Finally dynamically typed languages give you reflection tools to find out about the field and methods of an object whose type you do not know, but Java has these features too since Java 5, so dynamic is not an advantage over static here either.
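
As a rough illustration of the reflection point, the following Java sketch inspects and invokes members of an object through the standard `java.lang.reflect` package without naming its type at the call site. `ReflectionSketch`, `Document`, `publicFieldNames`, and `call` are hypothetical names chosen for this example, and the sketch assumes Java 7 or later.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReflectionSketch {

    // A sample class to inspect (hypothetical).
    public static class Document {
        public String title = "Untitled";
        public int version = 1;

        public String describe() {
            return title + " v" + version;
        }
    }

    // List the public field names of any object, sorted for stable output
    // because getFields() does not guarantee an order.
    public static List<String> publicFieldNames(Object target) {
        List<String> names = new ArrayList<String>();
        for (Field field : target.getClass().getFields()) {
            names.add(field.getName());
        }
        Collections.sort(names);
        return names;
    }

    // Invoke a public no-argument method by name.
    public static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Object unknown = new Document();  // static type is just Object
        System.out.println(publicFieldNames(unknown));
        System.out.println(call(unknown, "describe"));
    }
}
```

Note that reflective calls give up the compile-time checks discussed above: a misspelled method name here fails only at runtime.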

Barroca is looking forward to exploring some of the recently released and upcoming technologies from Java EE 6 and Java SE 7 including the module system, JSF 2, Bean Validation, media components, and the new NIO libraries. He remains convinced that Java is the best technology for their product.

We are also very proud of the technology we've created, from our Runtime to the high-level layers we made highly modular, including the UI. It's not very exposed because we are completely positioned as an ECM player, but there are some gems in the platform that could benefit many Java apps. The extension system, modular UI based on Seam/JSF, distribution assembly engine, and a lot more components are all serious technologies that are highly useful when creating modular Java applications. When customizing with Nuxeo-based apps, you don't fork the UI to customize it, you contribute to it using plugins; it's a very open and extensible model.


Another drain on business by Dan Tines

Everyone is challenged to manage an increasingly diverse range of electronically stored information, recognizing that it might be subject to legal disclosure one day


More regulatory hoops that businesses have to jump through and hurt the economy.

Tooling by Dan Tines

In a statically typed language you can perform an efficient static analysis of your code, which gives you:

* an IDE with proper autocompletion
* an IDE that knows all the uses of a given method or field in all of your classes, thus vastly improving searches and refactorings
* many compile-time checks
* tools that find patterns and bugs in your code
* tools that extract documentation based on the actual types used in your code


Those bullet points are really the crux of the matter for large code bases. Obviously there have been large systems written in Smalltalk, Common Lisp, Python, and other dynamic languages, but IMHO it's a maintenance nightmare. It's just too bad that Java's type system is rather primitive compared to something like Scala.

Re: Another drain on business by Robert Sullivan

Besides being irrelevant to the article, the "regulation bad, unfettered business good" argument doesn't work so well in the wake of the Wall Street bailout, and the BP fiasco.

Re: Another drain on business by Dan Tines

"regulation bad, unfettered business good"


That wasn't my argument, but there's no doubt that it is a drain on business.

Re: Another drain on business by James Watson

Actually, I doubt very much that it is universally true. For example, making it enforcing laws that make it illegal for financial institutions to piss away people's money on speculative gambling builds confidence in those institutions. Without that confidence, many people would just keep their money in their mattresses.

The thing that is frustrating about a lot of popular pro-business rhetoric is that it has no basis in actual fact or logic. Anti-business sentiment is also largely derived from ignorance but that's no excuse.

Game theory and complexity theory can help us to understand some of these unintuitive realities.

Re: Another drain on business by Dan Tines

Actually, I doubt very much that it is universally true. For example, making it enforcing laws that make it illegal for financial institutions to piss away people's money on speculative gambling builds confidence in those institutions. Without that confidence, many people would just keep their money in their mattresses.


You're confused and trying to extrapolate something out that I never claimed. I agree that without confidence you can't have effective markets, but what I did claim in the particular about setting up document management systems for regulatory purposes is a drain on business.



The thing that is frustrating about a lot of popular pro-business rhetoric is that it has no basis in actual fact or logic. Anti-business sentiment is also largely derived from ignorance but that's no excuse.


Hacker news had a post of an interesting article about the anti-business position of California over recent decades - www.newgeography.com/content/001712-the-golden-...

The bottom line is that there's way too much anti-business rhetoric going on these days. That's not surprising though considering the current administration and the leaders in congress.


Game theory and complexity theory can help us to understand some of these unintuitive realities.


Heh, I think most of these "realities" are intuitive. Many on the left would have us believe otherwise.

Well done by Sean Radford

Just a congratulations to the Nuxeo team on a great product suite and platform. If you haven't given them a test-run, then you should - and you'll find plenty of community and corporate support to help you.

Sean
www.tacola.com
www.tacolaecm.com

Next switch to Scala by Christian Helmbold

I'm looking forward to the next interview in a few years, when they have switched from Java to Scala ;-) But seriously, I think Scala would be the perfect fit for a shop with Java and Python know-how. Scala brings a lot of what one could miss in Java, when coming from Python. Even something like "static duck typing" (often called structural typing) is possible with Scala!

Re: Another drain on business by James Watson


You're confused and trying to extrapolate something out that I never claimed. I agree that without confidence you can't have effective markets, but what I did claim in the particular about setting up document management systems for regulatory purposes is a drain on business.


You wrote (in response):

"regulation bad, unfettered business good"

That wasn't my argument, but there's no doubt that it is a drain on business.


While it's clear you did not argue that unfettered regulation is good initially, you seem to confirm that is your belief here. If that's not your intention, then my response can be ignored.

If you do believe that, then if you 'intuitively' understand the implications of modern economics (increasingly influenced by game theory and complexity theory) then you should agree that regulation is not only not always bad but that in fact regulation is required in order to have effective markets.

Regulate vs. not regulate is a false dichotomy. The real question is how and how much to regulate.

Re: Another drain on business by Dan Tines


You're confused and trying to extrapolate something out that I never claimed. I agree that without confidence you can't have effective markets, but what I did claim in the particular about setting up document management systems for regulatory purposes is a drain on business.


You wrote (in response):

"regulation bad, unfettered business good"

That wasn't my argument, but there's no doubt that it is a drain on business.


While it's clear you did not argue that unfettered regulation is good initially, you seem to confirm that is your belief here. If that's not your intention, then my response can be ignored.

If you do believe that, then if you 'intuitively' understand the implications of modern economics (increasingly influenced by game theory and complexity theory) then you should agree that regulation is not only not always bad but that in fact regulation is required in order to have effective markets.

Regulate vs. not regulate is a false dichotomy. The real question is how and how much to regulate.


As I stated, you have to have confidence in institutions in order for free markets to work. So yes, of course I agree it's a false dichotomy. It's a matter of degree.

The thing that is frustrating about a lot of popular pro-business rhetoric is that it has no basis in actual fact or logic.



Putting aside rhetoric, who isn't pro-business besides hardcore leftists? California? Did you read the article I linked to? www.newgeography.com/content/001712-the-golden-...

It's fun to debate this stuff, but we're way off-topic at this point.

Awesome ECM Tool by ravi chandran

We have been using the ECM tool for a few months and have found it promising, with a whole lot of useful features. However, there are a couple of issues:
1. When integrated with other applications via CAS SSO, document links always land on the dashboard page after a successful login instead of taking the user to the specific document page. It looks like the web engine has a hardcoded URL handling mechanism.
2. Group permissions do not work.
3. UI layout changes vanish after a restart.
A download tracker could also be a useful feature, but it is missing.

Great Work Nuxeo by Antonio de las Nieves

Just like Sean said, we from Yerbabuena are very happy with the product and community Nuxeo has developed.
In fact, Nuxeo has become a role model, a pure open source player with great business success (although we want to differentiate with modules on top of these services that Nuxeo already brings).

So yeah, Nuxeo is becoming really popular as a product and company :-)

nice article by Leandro Coutinho

It shows some points why Java is widely used.

Re: Tooling by sasamat sasamat

In the fast moving world of languages and technology I think this may now be a moot point.

PyDev for Eclipse and JetBrains' new PyCharm IDE, to name just two, both provide the kind of support (code completion, refactoring, debugging etc) for Python that olde worlde IDEs have long offered C++, Java and C# developers. And even Visual Studio now does the same for IronPython.

The bigger question is whether a tool-based approach to development (the entire point of both Microsoft and IBM's respective ecosystems) or a language-based one (TDD, dynamic languages and very rapid development methodologies) is superior.

Many if not all of the new large scale Web 2 projects such as Facebook, Twitter, Digg, Delicious, Friendfeed and Reddit are language-based. Maybe the Web 2 space is special and the hitherto accepted mantras of development don't apply there, even if they still do everywhere else: or maybe, as William Gibson tells us: "the future is already here, just unevenly distributed".

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.