
Guardian.co.uk Switching from Java to Scala

Posted by Charles Humble on Apr 04, 2011

The team behind guardian.co.uk which, according to its editor, has the second highest readership of any on-line news site after the New York Times, is gradually switching from Java to Scala, starting with the Content API, which provides a mechanism for selecting and collecting Guardian content.

The guardian.co.uk website comprises about 100,000 lines of code. It uses a fairly typical open-source Java stack of Spring, Apache Velocity and Hibernate, with Oracle providing the database. Like the website, the Content API was initially developed in Java, but the team decided to switch to another JVM-based language, Scala. Web Platform Development Team Lead Graham Tackley told us:

We've been a primarily Java development shop for a number of years now, and this has largely served us well. However, as a news website we want to be able to respond to events very quickly. The core Java platform that delivers www.guardian.co.uk has a full release every two weeks. Compared with many enterprise Java applications, this is excellent. Compared with other websites, it's very poor.

So we've been looking for a while at tools, approaches and languages that enable us to deliver functionality faster. This includes using lighter weight Java frameworks like Google Guice, radically different approaches to Java development like the Play framework, and using other platforms such as Python with Django. As part of this exercise we'd been playing with Scala for a while, but unlike the others we hadn't yet used it for any production code.

We were very keen that the first non-beta release of the Content API (API, Open Platform) should be the first iteration of an ongoing evolving API, which could quickly evolve as we discovered all the interesting use cases that we hadn't initially thought of. To do this safely without breaking API clients, we needed a comprehensive set of integration tests. After some experimentation with writing these in Java, we decided instead to write just the integration tests in Scala, for three main reasons:

  1. The flexibility of the testing DSL provided by ScalaTest.
  2. We wanted to be excited about writing the integration tests, rather than them being a chore.
  3. Using Scala just for the tests meant we got to use it in anger without impacting production code directly.

After about four weeks of writing just the tests in Scala, we got fed up of having to write the main code in Java, and decided to convert the whole lot to Scala.

InfoQ: In general terms, how did you go about the migration? Did you re-write all the Java code in Scala for instance, or did you combine the two for a while?

The beta version of the Content API was based on a proprietary search engine. The current API uses the excellent Apache Solr (a talk on guardian.co.uk's use of Solr can be found here), and is also quite different in style to the beta one - the beta did a great job of showing us what we didn't want the API to look like. Therefore, before Scala came into the picture, we'd decided to re-implement the API rather than reuse the beta codebase.

We'd spent around six weeks with three people implementing in Java before we introduced Scala, so there wasn't a massive codebase to migrate. However, we weren't prepared to stop the project for a couple of weeks while we converted to Scala, so we migrated the existing integration tests gradually. As we'd used Maven as a build tool, introducing Scala was a matter of following the instructions to use the maven-scala-plugin to build mixed Java/Scala projects. This allows Java and Scala code to co-exist in the same project, and bi-directionally depend on each other. So we could convert on a class-by-class basis from Java to Scala, which worked far better than we ever imagined: it really did just work.

We took the same approach when converting the main code: over a number of weeks, as we touched a bit of code, we converted it. We then had a couple of days mop up at the end.

InfoQ: What are the libraries/frameworks that you have used for development?

Since we were using a language new to us all, we decided to limit the amount of new stuff that we needed to learn. We chose to stick with plain servlets wired with Google Guice, which is how we build our Java apps now. We use SolrJ, the Java Solr library, to talk to Apache Solr, Joda-Time for date time manipulation and Mockito for unit test mocking (this worked fine with Scala code too).

Sometimes we consciously chose to stick with what we knew to ensure timely delivery: the XML formatted endpoints are generated not using Scala's excellent XML support, but using javax.xml.stream.XMLStreamWriter just as we would in Java code. We'd already written this before moving to Scala; it worked, it was readable, so we left it. However, we did switch to use the excellent JSON library from Lift - lift-json - to generate the JSON formatted endpoints as the code was far clearer than with the Java JSON library we were using.
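
As a rough illustration of what "just as we would in Java code" can look like, the sketch below drives javax.xml.stream.XMLStreamWriter from Scala. The object name, the render method and the element names ("content", "title") are hypothetical and are not taken from the Content API; only the JDK's StAX classes are real.

```scala
import java.io.StringWriter
import javax.xml.stream.XMLOutputFactory

// Hypothetical sketch: the element names below are illustrative,
// not the actual Content API response format.
object XmlEndpointSketch {
  def render(title: String): String = {
    val out = new StringWriter
    val xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out)
    xml.writeStartElement("content")
    xml.writeStartElement("title")
    xml.writeCharacters(title) // the writer escapes characters such as & and <
    xml.writeEndElement()      // closes title
    xml.writeEndElement()      // closes content
    xml.flush()
    xml.close()
    out.toString
  }
}
```

The calls are the same ones a Java class would make, which is presumably why code like this could be carried over to Scala with little change.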

InfoQ: What IDEs do you use for development? What is Scala IDE support like?

We use Jetbrains IntelliJ IDEA 10, some of us use the community edition and some use the ultimate edition. The Scala plugin is pretty good but not perfect. Code completion, find usages, and similar navigation nearly always works just fine. It's not as good as Java at red highlighting code that isn't valid, and we had some problems with it finding ScalaTest test methods, but other than that we were in our familiar environment working as we always had, just in a much more powerful language.

InfoQ: I'm assuming the majority of the developers on the project were Java programmers? How easy did the developers on the project find learning Scala?

Yes, all of us were quite experienced Java programmers. The initial team of four had huge fun learning Scala: often one of us would come in raving about this new Scala feature we'd discovered and sharing it with the rest of the team. We had a buzz that had long been missing in our Java development. Because we were all learning together, this worked really well. In the first couple of weeks, though, there were occasions when we'd be battling to implement something in a good Scala way and couldn't figure it out. Knowing you could just churn out the Java code made this particularly frustrating. There were a few days where we went home in frustration saying, "We're going back to Java tomorrow". Each time, a fresh look in the morning was all it needed to move on.

Since then, we've had around ten other Java devs move to pick up Scala. As always, people learn at different speeds and in different ways, but all have come through that and nearly all now get frustrated when they have to write Java code.

One of the things we compare learning Scala against is moving to a different platform like Python/Django or Ruby on Rails. With Scala, at least 75% of what you're working with is the same as in Java. You can use the same libraries and IDE, the way you package jars and wars is the same, your runtime environment and runtime characteristics are the same. A good Java developer can learn to write Java-style code in Scala in a day, then they learn the power of closures and implicit conversions and very soon they're more productive than they were in Java.

InfoQ: One of the common criticisms of Scala as a language boils down to it being too complex. A lot of the time I think this is really about readability - the idea being that it is easier to pick up someone else's code if it is written in a more rigid language like Java. Do you think the criticism is fair? How do you counter it?

I agree, readability is by far the most important characteristic of a codebase. I don't care whether code is imperative or functional, or is idiomatic Scala or Java-without-semicolons, I only care whether it's readable. When we were learning new Scala features, we chose whether to use them based on whether the intent of the resulting code was more obvious. In one example, we tried using the Scala Either class to eliminate a few If statements: the team collectively concluded that the If statements were more readable, so we dropped the use of Either in that case.

It's true that due to the rigidity of Java individual lines of code are always easily understood. But that's rarely the problem in understanding any non-trivial codebase: I don't want to understand the detail, I want to understand the intent. Good class design and OO techniques help address this in Java, but I still often find when reading Java code that I cannot see the wood for the trees. In Scala I have the power to express the intent in a way I rarely can in Java.

For example, the Content API needs to decide whether to return results in XML, JSON or redirect to the HTML explorer. We support a format=query string, adding a .xml or .json extension, and specification in an http Accept header. Here's the code that does that, which I think is a good example of how Scala's power aids expression of intent (it's just chaining calls to Scala's Option class):

def negotiateFormatParameter =
    getParam("format").
        orElse(getExtension).
        orElse(getExtensionFromAcceptHeader).
        getOrElse("html")
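
The snippet relies on helpers that are not shown in the interview. A minimal self-contained sketch of the same Option chaining might look like the following. The Request case class and the bodies of getParam, getExtension and getExtensionFromAcceptHeader are assumptions made for illustration; the real implementations work against the servlet request and may differ.

```scala
object FormatNegotiation {
  // Hypothetical stand-in for the servlet request.
  case class Request(params: Map[String, String], path: String, accept: Option[String])

  // Some(value) when the query-string parameter is present, otherwise None.
  def getParam(req: Request, name: String): Option[String] = req.params.get(name)

  // Some("xml") for a path such as "/search.xml"; None without a known extension.
  def getExtension(req: Request): Option[String] =
    Seq("xml", "json").find(ext => req.path.endsWith("." + ext))

  // A deliberately naive reading of the Accept header (assumption: no q-values).
  def getExtensionFromAcceptHeader(req: Request): Option[String] =
    req.accept.flatMap { header =>
      if (header.contains("application/json")) Some("json")
      else if (header.contains("application/xml")) Some("xml")
      else None
    }

  // Each orElse is consulted only when everything before it produced None;
  // getOrElse supplies the final default.
  def negotiateFormat(req: Request): String =
    getParam(req, "format")
      .orElse(getExtension(req))
      .orElse(getExtensionFromAcceptHeader(req))
      .getOrElse("html")
}
```

With this sketch, a request for "/search.xml" with no format parameter resolves to "xml" even if the Accept header asks for JSON, because the extension is checked before the header.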

There's also a good case that readability is at least partially a function of how much code you have to read. My Java code tends to end up with lots of lines of code unrelated to the problem I am trying to solve, whether this be null checks, getters & setters, constructors for dependency injection or manipulating collections. All of these problems have much more concise expressions in Scala. Of course, much of this can be autogenerated by your IDE, but when reading your codebase I still have to read your constructor and your getters and setters to see if you've customised them.

A classic example is to compare a simple class in Java and Scala:

Java:

public class WelcomeClass {
    private String name;
    
    public WelcomeClass(String name) {
        this.name = name;
    }
    
    public String sayHello() {
        return "Hello " + name;
    }
}

and in Scala:

class WelcomeClass(name: String) {
    def sayHello = "Hello " + name
}

The Java version has to tell me that name is a String three times and it mentions "name" five times. The Scala version mentions "name" only twice, and only says it's a string once. Of course this is a trivial example, but it's symptomatic of what we've found in Scala: less boilerplate code and less needless repetition means fewer trees and more wood, i.e. it's easier to see the intent, not just the detail.

I tend to find that when reading an individual line of code in Scala it sometimes takes a little longer to understand how it's working, but this is more than made up for by the drastic reduction in the number of lines of code.

InfoQ: Sticking with complexity for a moment, do certain aspects of the language - I'm thinking here mainly about things like symbolic names and implicits - cause problems in real-world usage?

Actually implicit conversions really helped us. As I mentioned, we use the SolrJ Java library to talk to Solr, which is an excellent library but, as a Java library, it loves returning nulls. To avoid null checks littering our codebase, we implicitly convert key classes to ones which have more Scala-like methods. So far from causing problems in real-world usage, it actively solves them. In addition, the IntelliJ Scala plugin now understands implicits in nearly all cases, so if you're not sure what's happening control+click will take you to what's actually being called.
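
To make the idea concrete, here is a small sketch of wrapping a null-returning Java API behind an implicit conversion. It uses java.util.Map as a stand-in for a SolrJ class, and the names RichJavaMap, getOption and titleOf are hypothetical. It also uses the implicit class syntax from later Scala releases; code written for Scala 2.8 would have needed an implicit def plus a separate wrapper class to the same effect.

```scala
object NullSafeSketch {
  // Adds an Option-returning lookup to any java.util.Map.
  // Option(x) yields None when x is null and Some(x) otherwise.
  implicit class RichJavaMap[K, V](val underlying: java.util.Map[K, V]) {
    def getOption(key: K): Option[V] = Option(underlying.get(key))
  }

  // Caller code needs no explicit null check.
  def titleOf(doc: java.util.Map[String, String]): String =
    doc.getOption("title").getOrElse("(untitled)")
}
```

The null check lives in one place, the wrapper, and call sites read as ordinary Option code.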

We've tended to steer clear of heavily symbolic libraries and using symbols for method names, but I think this is an important feature of the language, which like any feature is possible to over-use. Sometimes it makes sense: our method to extract query strings from the http request is called "?", which reads really well in the code. Much more than in Java, the great power of Scala brings great responsibility to focus on whether you're making the intent of your code easier to read. Just because that power can be misused doesn't mean I don't want the power.

InfoQ: Another concern for using Scala in enterprise applications is that each new version seems to break backwards compatibility - so programs compiled against Scala 2.8 are not compatible with binaries compiled earlier and so on. How significant do you think this is? How do you manage incompatibilities within Scala projects?

We started off writing to Scala 2.7.7, and migrated to both 2.8.0 and 2.8.1 soon after they were released. It was pretty painless; the 2.8.0 migration took less than a day (and that was only because I wanted to eliminate the deprecation warnings) and 2.8.1 was drop in. All of the libraries we were using already published versions for multiple Scala versions so that all just worked.

The only time it was painful was when, on my personal projects, I was using 2.8 pre-releases. But then, it was my choice to use clearly labeled pre-release code.

We tend now to use simple-build-tool for Scala projects instead of Maven, which makes it easy to release internal libraries for multiple Scala versions.

I'd much rather face some incompatibility when I choose to upgrade than face the situation in Java where some things are never changed and can never be changed. The commonly used HttpServletRequest.getHeaders still returns a java.util.Enumeration, which was effectively deprecated in Java 1.2.
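
For readers who have not met java.util.Enumeration, the sketch below shows one way Scala code can adapt it to an Iterator so that the usual collection methods apply. The object and method names are hypothetical. The Scala standard library also ships Java collection converters, but their import path has changed between Scala versions, so a hand-written adapter is used here to keep the example version-neutral.

```scala
object EnumerationSketch {
  // Adapts a legacy java.util.Enumeration to a Scala Iterator.
  def asIterator[A](e: java.util.Enumeration[A]): Iterator[A] = new Iterator[A] {
    def hasNext: Boolean = e.hasMoreElements
    def next(): A = e.nextElement()
  }

  // Hypothetical helper: trims header values and drops empty ones.
  def headerValues(headers: java.util.Enumeration[String]): List[String] =
    asIterator(headers).map(_.trim).filter(_.nonEmpty).toList
}
```

In a servlet, the argument would be the Enumeration returned by getHeaders; in a test, java.util.Vector's elements() method is a convenient source of one.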

InfoQ: What is the situation with regards finding programmers to continue supporting the Scala code - are good Scala programmers as easy to find as good Java programmers for instance?

We primarily recruit good software developers, rather than looking specifically for Scala developers. We tend to find that good web software developers are polyglot in outlook and have often at least played with Groovy, Scala, Clojure, Ruby or Python. Such people have usually relished the opportunity to work with Scala.

InfoQ: About how much Scala production code is guardian.co.uk running today?

The core codebase behind guardian.co.uk - the one on a two week release cycle - as of two weeks ago has one single Scala class within it. We've maintained a self-imposed no-Scala rule on that codebase until recently, just to be sure we're fully ready as a team to embrace Scala (and to stop me rewriting it).

However, many different parts of the site are now driven by Scala microapps including Search (which is written in Scala with Lift), Most Viewed, Punctuated Equilibrium Mystery Bird and the related content component on every article page.

Furthermore, our new identity platform, under development but with its first iteration in production, is written in Scala.

InfoQ: Will you be using more Scala code in the future?

We've found that Scala has enabled us to deliver things faster with less code. It's reinvigorated the team. We'll continue to use the right tool for the job whether that be Scala, Python, .NET, PHP or Bash.

In the last six months, all of the new JVM-based projects have used Scala and none have selected Java. I can't see us starting any new project in Java now, especially given the disappointing feature set and timeframe of Java 7.

For developers interested in learning the language, Tackley recommended "Programming in Scala" by Martin Odersky et al. The 2nd edition covers Scala 2.8. He also told us:

We found using the Scala REPL (command line) was a good way to experiment with writing Scala code. And, despite what others may say, don't fear just using Scala as a better Java-without-semicolons in the first few days, weeks or months. You'll be missing out if you end your journey here, but it's not a bad phase to go through. Keep learning and embrace features of the language incrementally. The ability to do this is what I think makes Scala unique as a next step for Java developers.

Besides the technical aspects, guardian.co.uk's Content API, and the broader Open Platform suite of services of which it forms a part, is interesting from a business point of view, since it represents a very different approach from that favoured by a growing number of quality newspapers in the UK and elsewhere, i.e. to place their content behind paywalls. To date the Financial Times and News International's The Times and The Sunday Times have all decided to go down this path, and more recently The New York Times has started to roll out a paid content model. Veteran BBC journalist John Humphrys argued in The Sun, a tabloid News International paper, that "Good journalism has to be paid for, just as we have to pay for the plumber who fixes a leak, or it will not survive". Tackley, however, takes a different view:

We firmly believe that the future of digital publishing is to engage and integrate with the rest of the web, not to retreat from it.

The Content API is able to extend the reach and brand of the Guardian into significant areas that we would otherwise not be able to reach. This is helped by third parties and partners who use the API, as they can invest in an area and use the relevant Guardian content.

We have a number of different tiers of access to the Content API: without registration, you can access content metadata but not the actual content, with a limited QPS (queries per second) rate. After registration, you can access the article bodies, which include embedded adverts, again with a limited QPS. At the top tier, you become a partner of the Guardian and we agree an appropriate commercial agreement.

A good example partner is WhatCouldICook.com, built by a small independent developer. This site is a good example of tight integration with Guardian content: it uses the API to extract and parse all the recipes we publish and present them in a great way for people wanting to cook. Furthermore, our readers benefit because we include functionality from whatcouldicook.com on our site, see for example the recipe search on the right hand side of the Guardian website.

Our wordpress plugin, built on top of the API, enables any wordpress user to include related Guardian content on their blog.

This level of innovation in specific areas is something that simply would not happen without the Content API. Furthermore, we extensively use the Content API for Guardian-driven projects to aid our rate of innovation: site features like search and zeitgeist, our mobile site and our iPhone app are all driven by the Content API.

 


Java is on the way out by Garry T

Java seems to be on the way out in so many Enterprises. Java developers are a dime a dozen where I live.

Re: Java is on the way out by Neil Bartlett

>Java seems to be on the way out in so many Enterprises.
>Java developers are a dime a dozen where I live.

At the very least this is a non sequitur, and at worst two directly contradictory statements...

Re: Java is on the way out by Travis Calder

It's either a non sequitur, or he means to say that there are more Java devs than there are Java jobs. High supply, low demand.

This is my previous feed item before this article. by Sake .

Re: Java is on the way out by Mark N

This post is not about enterprise development. It is about a very specific use case. It works good for this case. I don't see types of systems I build going 100% to Scala (or anything like it)- I could be wrong, though. We are using groovy for scripting and probably, eventually, to replace configuration.

Having used things in the past that resemble things like Scala (etc), I just don't see them as useful for the mainstream developer. You need a good team of highly responsible developers. And talented ones too (at least enough to help the not so talented ones). Most developers are not this. Sure, you can teach them to code in [Scala|Groovy|etc], but can you teach them to build maintainable solutions? My experience says no. Sadly.

I don't see many postings for Scala, Groovy, JRuby, etc. So I doubt Java is on its way out. I am not saying they are bad. I am not saying no one is using them.

Re: Java is on the way out by Dan Tines


>Having used things in the past that resemble things like Scala (etc), I just don't see them as useful for the mainstream developer. You need a good team of highly responsible developers. And talented ones too (at least enough to help the not so talented ones). Most developers are not this. Sure, you can teach them to code in [Scala|Groovy|etc], but can you teach them to build maintainable solutions? My experience says no. Sadly.


Well, then why did Java become mainstream when there are much simpler languages out there to be had? C# and VB.NET have turned out to be more complex languages than Java, but I really don't see this attitude that I constantly see in the Java community.

Anyway, I don't agree with your premise, which seems to be that Scala is too complex for your average developer. But maybe the perception makes it a reality.

Re: Java is on the way out by Sasha O

@GarryT:

>>> Java developers are a dime a dozen where I live. <<<

where is it? I want to know where to turn when I need Java engineers next time.

Re: Java is on the way out by Chris Webster

It might indeed be that Scala etc are not ready for the enterprise, or it might just be early days for Scala. My view is that it might also be simply the volume of legacy JEE code and the easy availability of Java developers that keeps JEE in the mainstream: Java today is roughly where COBOL was in the early 1990s. But I get the impression there is increasing fragmentation within and beyond the JVM, and many organisations are losing patience with the time-consuming complexities (=costs) of JEE development and are looking for alternatives, especially when budgets are tight. For example, there are around 232 Python jobs posted currently in the UK on www.jobserve.com, and many of these are working on polyglot systems e.g. with Java/Perl/C etc on financial systems and other non-trivial applications. Of course, Python has been around for many years, but it's an indication that people are starting to think outside the pure Java box - and combining the power of the JVM and JEE libraries with the productivity of the new generation of JVM-based languages may be a promising alternative.

Scala is compelling and will win significant share over time by Faisal Waris

I don't believe that Java is going away anytime soon however the greater expressiveness of Scala will win Java developers over in time. Not everyone will make the jump, though.

It takes a good few months of substantial coding to grok the programming idioms of a significantly different language. Not everyone has time/patience to see it through. Initially, one has to undertake the effort almost on faith.

Having recently learnt F# (roughly a Scala-equivalent on .Net) I can say that it was a significant investment but well worth it in the end. Compared to C# (say), the amount of code I write is much less; certain programming tasks such as asynchronous programming, language oriented programming, etc. are significantly easier; and it is easier to see the design intent in code.

Re: Java is on the way out by Garry T

I didn't mean to offend anyone or start a flame war. I'm in western Canada and there's a move to .Net and Ruby. Head hunters all say there is much more demand for .Net and rates have fallen for Java contractors. The Java user group also stopped meeting because of no interest. There are lots of Java developers with years of experience.

Re: Java is on the way out by Garry T

Yes, high supply and low demand.

yes, Java is out by Hao Deng

There are more and more .NET jobs in the market. Large companies prefer to choose .NET
Most of the Java positions are offered by startup companies.


InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.