Yammer Moving from Scala to Java

An e-mail, sent from Yammer employee Coda Hale to Scala's commercial management at Typesafe, ended up being leaked via YCombinator and a gist at GitHub. The e-mail confirms that Yammer is moving its basic infrastructure stack from Scala back to Java, owing to issues with complexity and performance.

Yammer PR Shelley Risk confirmed to InfoQ that the e-mail represented the personal opinions of Coda Hale, rather than an official statement from Yammer itself; a follow-up from the original author has been published at http://codahale.com/the-rest-of-the-story/. In it, Coda clarified that the message was a result of a request for feedback from Donald Fischer (CEO of Typesafe) following an earlier tweet indicating the move.

Update: Coda has published Yammer's official position on the subject, which confirms the above points. It also points out that any language has flaws (not just Scala) and that the e-mail was an attempt at offering advice for how to improve Scala's performance and other concerns. Finally, it concluded that when rolling out any high performance project (for which Scala is their production environment) there are rough edges which need to be filed down; the e-mail was an attempt at helping Scala improve.

Although the e-mail was not meant to be publicly shared, Coda put it on GitHub via a Gist (since deleted) to get feedback from other friends; however, the content was then subsequently shared and then reported more widely.

Back in August 2010, Coda said on the Yammer Engineering blog that they were moving to Scala for their realtime future. The goal was to continue running on the JVM (for performance reasons), and he noted that the conversion had resulted in approximately a 50% code reduction:

Our initial prototype of Artie was in Java, but as a weekend experiment I tried reimplementing it in Scala 2.8. After a day, I had dropped about half the lines of code and added several tricky features. I was sold. It might be easier to hire Java developers, but a Scala team will be able to get a lot more done

Fast forward a year and a quarter later, and the decision is being reversed:

Right now at Yammer we're moving our basic infrastructure stack over to Java, and keeping Scala support around in the form of façades and legacy libraries. It's not a hurried process and we're just starting out on it, but it's been a long time coming. The essence of it is that the friction and complexity that comes with using Scala instead of Java isn't offset by enough productivity benefit or reduction of maintenance burden for it to make sense as our default language. We'll still have Scala in production, probably in perpetuity, but going forward our main development target will be Java.

Stephen Colebourne, who recently posted the thread "Is Scala the new EJB2?", has annotated the mail with a number of bullet points, summarising the issues involved:

  • Scala, as a language, has some profoundly interesting ideas in it. But it's also a very complex language.
  • In addition to the concepts and specific implementations that Scala introduces, there is also a cultural layer of what it means to write idiomatic Scala … at some point a best practice emerged: ignore the community entirely.
  • In hindsight, I definitely underestimated both the difficulty and importance of learning (and teaching) Scala. Because it's effectively impossible to hire people with prior Scala experience, this matters much more than it might otherwise.
  • Adding to the unease in development were issues with the build toolchain. … This emphasis on SBT being the one true way has meant the marginalization of Maven and Ant -- the two main build tools in the Java ecosystem.
  • Each major Scala release being incompatible with the previous one biases Scala developers towards newer libraries and promotes wheel-reinventing in the general ecosystem.
  • Via profiling and examining the bytecode we managed to get a 100x improvement by adopting some simple rules:
    • Don't ever use a for-loop
    • Don't ever use scala.collection.mutable
    • Don't ever use scala.collection.immutable
    • Always use private[this]
    • Avoid closures
  • I broached this issue [moving back to Java] with the team, demo'd the two codebases, and was actually surprised by the rather immediate consensus on switching. There's definitely aspects of Scala we'll miss, but it's not enough to keep us around.

Although some of these issues are likely to be circumstantial (for example, the ease of hiring a developer with existing experience increases the longer a language is popular), others can be empirically tested. For example, one of the pieces of advice is to avoid for loops. This can be tested with the following piece of code:

scala>
  var start = System.currentTimeMillis();
  var total = 0;for(i <- 0 until 100000) { total += i };
  var end = System.currentTimeMillis();
  println(end-start);
  println(total);
114
scala>
  var start = System.currentTimeMillis();
  var total = 0;var i=0;while(i < 100000) { i=i+1;total += i };
  var end = System.currentTimeMillis();
  println(end-start);
  println(total);
8

The for loop with an 'until' range (which many Scala programmers would consider idiomatic) is significantly slower here than the corresponding while loop, even though the while loop is less readable. The corresponding Java implementation of the same loop shows a result of 2ms for both the for and while loops.
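
One reason a gap like this can appear is that a Scala for over a range is not a primitive loop: the compiler rewrites it into a foreach call that takes a closure. The following is a minimal sketch of the same kind of comparison as a compiled program rather than a REPL session. The object and method names are illustrative assumptions, both loops sum 0 until n, the accumulators are Long so that the total does not overflow an Int, and the warm-up count of 20 is arbitrary. Timings depend heavily on the JVM and Scala version, so no expected figures are given.

```scala
object LoopBench {
  // Sum 0 until n with a for-comprehension. The compiler rewrites this to
  // (0 until n).foreach(i => total += i), i.e. a closure call per element.
  def sumFor(n: Int): Long = {
    var total = 0L
    for (i <- 0 until n) { total += i }
    total
  }

  // The same sum written as a hand-rolled while loop.
  def sumWhile(n: Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) { total += i; i += 1 }
    total
  }

  // Run the body a few times first so the JIT has a chance to compile it,
  // then time one further run. Returns (result, elapsed nanoseconds).
  def timeNanos(warmups: Int)(body: => Long): (Long, Long) = {
    var k = 0
    while (k < warmups) { body; k += 1 }
    val start = System.nanoTime
    val result = body
    (result, System.nanoTime - start)
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    val (forSum, forNs) = timeNanos(20)(sumFor(n))
    val (whileSum, whileNs) = timeNanos(20)(sumWhile(n))
    println("for:   " + forSum + " in " + forNs + " ns")
    println("while: " + whileSum + " in " + whileNs + " ns")
  }
}
```

Returning the sum from each timed body keeps the loop result in use, which makes it harder for the JIT to discard the work being measured.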

Another test we can perform is the performance of the mutable map by loading in a data set consisting of Integer objects. (This can be compared in Java and Scala, and the cost of boxing should be equivalent.)

scala>
  val m = new scala.collection.mutable.HashMap[Int,Int]; 
  var i = 0;
  var start = System.currentTimeMillis();
  while(i<100000) { i=i+1;m.put(i,i);};
  var end = System.currentTimeMillis();
  println(end-start);
  println(m.size)
101
scala>
  val m = new java.util.HashMap[Int,Int]; 
  var i = 0;
  var start = System.currentTimeMillis();
  while(i<100000) { i=i+1;m.put(i,i);};
  var end = System.currentTimeMillis();
  println(end-start);
  println(m.size)
28
scala>
  val m = new java.util.concurrent.ConcurrentHashMap[Int,Int]; 
  var i = 0;
  var start = System.currentTimeMillis();
  while(i<100000) { i=i+1;m.put(i,i);};
  var end = System.currentTimeMillis();
  println(end-start);
  println(m.size)
55

Running the same tests as vanilla Java code gives identical performance for java.util.HashMap, while java.util.concurrent.ConcurrentHashMap is twice as fast in Java as when driven from Scala. Either way, both of the Java collection classes outperform scala.collection.mutable.HashMap. (Timings taken on OSX JVM 1.6.0_29 and Scala 2.9.1, the latest at the time of writing.)
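
For readers who want to repeat the map comparison outside the REPL, a minimal sketch follows. It assumes a compiled program, and the names MapBench and timeFill are hypothetical. Each map is filled through a small closure, which adds the same overhead to all three measurements, and the measurement is repeated so that the early rounds absorb JIT warm-up. No expected timings are given, since they vary by machine and version.

```scala
object MapBench {
  // Insert the keys 1..n through the supplied put function and
  // return the elapsed time in milliseconds.
  def timeFill(n: Int)(put: (Int, Int) => Unit): Long = {
    val start = System.nanoTime
    var i = 0
    while (i < n) { i += 1; put(i, i) }
    (System.nanoTime - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    // Repeat each measurement; the first rounds include JIT warm-up.
    for (round <- 1 to 5) {
      val s = new scala.collection.mutable.HashMap[Int, Int]
      val j = new java.util.HashMap[Int, Int]
      val c = new java.util.concurrent.ConcurrentHashMap[Int, Int]
      val ts = timeFill(n)((k, v) => s.put(k, v))
      val tj = timeFill(n)((k, v) => j.put(k, v))
      val tc = timeFill(n)((k, v) => c.put(k, v))
      println("round " + round + ": scala=" + ts + "ms java=" + tj +
        "ms concurrent=" + tc + "ms")
    }
  }
}
```

Fresh maps are created in every round so that each fill starts from an empty table and pays the same resizing cost.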

Unfortunately, the Scala collections are pervasive in the Scala library APIs, and as such, they may be promoted from the Java object types to the Scala object types through implicits in the code. According to the migration mail, this resulted in significant re-writing for performance reasons.
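
How a Java collection ends up behind a Scala collection type can be sketched as follows. This is an illustration under stated assumptions: it uses the explicit scala.jdk.CollectionConverters API from Scala 2.13 and later, whereas the Scala 2.9 code discussed in the mail would have relied on implicit conversions instead, and the object name ConversionSketch is hypothetical.

```scala
import scala.jdk.CollectionConverters._

object ConversionSketch {
  // Returns (size of the Java map after a write through the wrapper,
  //          size of the immutable copy after a later write to the Java map).
  def demo(): (Int, Int) = {
    val javaMap = new java.util.HashMap[String, Int]
    javaMap.put("a", 1)

    // asScala wraps the Java map in a Scala mutable.Map view; no copy is
    // made, so writes through the wrapper land in the underlying Java map.
    val wrapped: scala.collection.mutable.Map[String, Int] = javaMap.asScala
    wrapped.put("b", 2)
    val javaSizeAfterWrapperWrite = javaMap.size

    // toMap copies the entries into a new immutable Scala Map, which costs
    // time and memory proportional to the size of the map.
    val copied: Map[String, Int] = wrapped.toMap
    javaMap.put("c", 3)

    (javaSizeAfterWrapperWrite, copied.size)
  }

  def main(args: Array[String]): Unit = {
    println(demo()) // prints (2,2): the wrapper wrote through, the copy did not follow
  }
}
```

With explicit converters the wrapping is visible at the call site; with implicit conversions the same step can happen without any trace in the source, which is the situation the paragraph above describes.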

Performance of closures (lambdas) may be improved if the Scala compiler generates code with invokedynamic, something that might happen in future versions of the Scala compiler. In addition, JDK 8 (which will bring native lambdas to Java, building on the method handles introduced in JDK 7) has a number of performance advantages which a future Scala version may be able to take advantage of.
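
To make the cost of a closure more concrete, the sketch below shows a function literal next to a hand-written equivalent. It assumes a compiler that does not use invokedynamic, as was the case for the Scala 2.9 series, where each function literal becomes an object implementing a FunctionN trait. The object and method names are illustrative only.

```scala
object ClosureSketch {
  // A function literal that captures `offset` from the enclosing scope.
  def viaLiteral(offset: Int): Int => Int = x => x + offset

  // Roughly what that literal stands for without invokedynamic: an object
  // implementing Function1 whose apply method holds the body and whose
  // state holds the captured value.
  def viaAnonymousClass(offset: Int): Int => Int = new Function1[Int, Int] {
    def apply(x: Int): Int = x + offset
  }

  def main(args: Array[String]): Unit = {
    println(viaLiteral(10)(5))        // prints 15
    println(viaAnonymousClass(10)(5)) // prints 15
  }
}
```

Each call to either method allocates a new function object, and every use goes through its apply method, which is the overhead the advice to avoid closures in hot paths is aimed at.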

Finally, there is increasing pressure for Scala to fix its backward compatibility between releases (rather than just in the minor releases between 2.9.2 and 2.9.3). There has been no official announcement from Typesafe regarding the future roadmap on Scala, or when a stable compiled binary format will permit code to be backwardly (or forwardly) compatible between releases. Having a backward compatible format would enable more stable libraries to be released and help build a community repository, which would help anyone interested in building upon Scala for the future.

Community comments

    • Macro vs Micro Optimization

      by Thomas Santana,

      J. Suereth posts some replies to the original post. It includes some very interesting counter arguments: Macro vs. Micro Optimization.

    • Measurings...

      by Daniel Sobral,

      The latest Scala version is 2.9.1. I have no idea where "2.3.9" came from -- I don't think such a version has even existed.

      At any rate, I could not reproduce your results. Scala's mutable.HashMap, either 2.9.1 or trunk, performs pretty much on par with Java when executing the same code snippets you provided.

    • Re: Measurings...

      by Robin St,

      Just tried with 2.9.1 here (using two scripts, executed via "scala script.scala"), multiple runs with times in ms:

      java.util.HashMap:                47, 44, 54, 51, 43
      scala.collection.mutable.HashMap: 81, 81, 82, 81, 84


      Scala version:

      Welcome to Scala version 2.9.1.final (OpenJDK 64-Bit Server VM, Java 1.6.0_23).

    • Re: Measurings...

      by Joost de Vries,

      Are the measurements the same if you use the Scala compiler instead of the Scala REPL interpreter?

    • Re: Measurings...

      by Robin St,

      They are pretty much the same, both are about 4-6 ms faster.

    • Re: Measurings...

      by Daniel Sobral,

      Ah, ok. I have now produced a "proper" micro-benchmark and I get the same proportion.

    • Without proper profiling tools you are lost - regardless

      by Faisal Waris,

      Note that in the Google benchmark Scala was faster than Java but - here is the key - both implementations were optimized by respective experts.

      I can attest to the importance of profiling. I actually ported the Google benchmark to F#. There was a big difference in performance in the initial F# implementation and the final one. I used the Visual Studio 2010 profiler (an excellent tool) to quickly zero-in on the "hotspots" and optimize the code.

      The final version of F# turned out to be even faster than the original C++ implementation as independently verified by someone.

      Note that the final C++ implementation was fastest of all and is more directly comparable to final Scala and F# implementations. The original C++ implementation did too many allocations which were removed in the final version and caching was used instead. (you have to dig through the links to find all of that)

      Bottom line: Write your code idiomatically and then profile and tune. The code does not have to be super optimized in all locations. Good profiling tools are a must.

    • Re: Without proper profiling tools you are lost - regardless

      by Michael Campbell,

      > Bottom line: Write your code idiomatically and then profile and tune. The code does not have to be super optimized in all locations. Good profiling tools are a must.

      1000 times, this. For what I'm guessing to be a large portion of us, the absolute fastest is not what we need. Our requirements are unlike Yammer's. For a lot of us, it doesn't matter if something is "slower", if we get it out the door 3 to 6 months quicker.

      But there are those of us for whom it DOES matter. And for these cases, you absolutely have got to profile. Performance isn't a problem until performance is a problem.

    • Please don't use REPL for benchmarking

      by Andriy Plokhotnyuk,

      Here is a simple script and its output on Core 2, 3GHz (Scala 2.9.1.final, JDK 1.6.0_29, Windows 7 32-bit), which show that Scala's for and while are quite competitive now:


      ::#!
      @echo off
      set JAVA_OPTS=-server
      call scala %0 %*
      goto :eof
      ::!#

      val n = 1000000000

      def timed[T](name: String)(f: => T) = {
        printf("%s :\n", name)
        val t = System.nanoTime
        val r = f
        val d = System.nanoTime - t
        printf("%,d ns\n", d)
        printf("%,d ops/s\n", (n * 1000000000L) / d)
        printf("%,f ns/op\n", d.toFloat / n)
      }

      timed("while") {
        var sum = 0L;
        var i= 1;
        while (i <= n) {
          sum += i
          i += 1
        }
        sum
      }

      timed("for") {
        var sum = 0L;
        for (i <- 1 to n) {
          sum += i
        }
        sum
      }

      while :
      1,172,919,768 ns
      852,573,234 ops/s
      1.172920 ns/op
      for :
      2,059,442,013 ns
      485,568,417 ops/s
      2.059442 ns/op

    • Re: Measurings...

      by Charles Humble,

      Just to confirm version tested was 2.9.1. I've amended the text.

    • Re: Measurings...

      by Peter Thomas,

      As others mentioned, benchmarking in the REPL is anything but reliable. Putting the code in a file changes the result:


      object Test {
        def scala() {
          val s = new collection.mutable.HashMap[Int,Int]();
          var i = 0;
          val start = System.currentTimeMillis();
          while(i<100000) { i=i+1;s.put(i,i);};
          val end = System.currentTimeMillis();
          println(end-start);
          println(s.size)
        }
        def javaHashMap() {
          val m = new java.util.HashMap[Int,Int];
          var i = 0;
          val start = System.currentTimeMillis();
          while(i<100000) { i=i+1;m.put(i,i);};
          val end = System.currentTimeMillis();
          println(end-start);
          println(m.size)
        }
        def javaConcurrentHashMap() {
          val c = new java.util.concurrent.ConcurrentHashMap[Int,Int];
          var i = 0;
          val start = System.currentTimeMillis();
          while(i<100000) { i=i+1;c.put(i,i);};
          val end = System.currentTimeMillis();
          println(end-start);
          println(c.size)
        }

        def main(args: Array[String]) {
          println("--")
          println("-- JAVA CONCURRENTHASHMAP:")
          println("--")
          javaConcurrentHashMap()
          println("--")
          println("-- JAVA HASHMAP:")
          println("--")
          javaHashMap()
          println("--")
          println("-- SCALA:")
          println("--")
          scala()
        }
      }


      my test result after a few cycles (scala version: 2.9.1.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_29)):

      --
      -- JAVA CONCURRENTHASHMAP:
      --
      61
      100000
      --
      -- JAVA HASHMAP:
      --
      56
      100000
      --
      -- SCALA:
      --
      68
      100000



      scala is still slower but not that much. In most scenarios (where the collections are relatively small) this should be a non-issue.

    • Re: Measurings...

      by James Watson,

      One thing to be careful of is that the order in which you run the tests on the JVM can make a big difference in the results. It's pretty unintuitive but I've seen it many times.

    • I won't call this FUD, but something's fishy

      by Ed Lover,

      I won't call this FUD, since I don't know your intentions, but you should know that I can't reproduce your results.

      The for and while loop tests were only different by a couple of milliseconds. The even more idiomatic

      (0 until 100000).sum

      was a bit slower, but all these microbenchmarks are fraught.

      the mutable.HashMap and java.util.HashMap tests came out about the same, give or take a millisecond. Something doesn't add up with this article.

      (I put the snippets in files and ran them that way. My setup: Ubuntu 64bit, 3-year-old mid-range Intel c2d, Java 1.6.0_26, Scala 2.9.1)

    • Re: I won't call this FUD, but something's fishy

      by Steve McJones,

      Just a note: The patch making stuff like (0 until 100000).sum an O(1) operation instead of an O(n) one is already in a GitHub pull request.
      So I would recommend that you stop using it in benchmarks ... it won't loop anymore in the future. :-)

    • Int-overrun, immature code

      by Stefan Wagner,

      The second summation is faster, but both are wrong! :) There is an int-overrun happening, so it has to be:

      var total = 0L; for (i <- 0 until 100000) { total += i };
      and
      var total = 0L; var i=0; while (i < 100000) { i=i+1; total += i };

      to avoid that, but there is a far faster solution:

      val n = 100000L
      val result = n*(n-1)/2

      But calculating the result isn't the aim of the example? Not? So what is it? In far more complex computations, the costs of looping often vanish.

    • Yammer responds.

      by Richard Hightower,

      eng.yammer.com/blog/2011/11/30/scala-at-yammer....



      Scala is currently the main language for our high-performance backend services and in the past two years we've solved a number of hard problems using it. We built a real-time message delivery service which has scaled to hundreds of thousands of concurrent users. We built a distributed data store for message feeds which serves tens of thousands of requests per second over terabytes of data. We built a new search system, including both a distributed data store for denormalizing business objects into indexable documents and a low-latency query system for auto-completing the millions of people and group names on Yammer. We built a specialized server for integrating with Active Directory which streams staff changes from companies with hundreds of thousands of employees. We built a service for handling Yammer notifications which handles thousands of notifications a second with extremely low latency. We built an OAuth token service which manages gigabytes of cached principal objects and returns results in single-digit milliseconds.




      All of these Scala projects are running in production right now, and we built them all with a team of seven people.




      Along the way, we've found some problems with Scala. And Java. And Ruby. And C. And Javascript. And Objective-C. And Erlang.


    • So benchmarks lie

      by Joao Pedrosa,

      As soon as someone has a benchmark, optimizing some code to it shifts the numbers somewhat artificially towards that benchmark. If the difference is not an order of magnitude it's not even worth it at times.

      For instance, everyone knows the Lua programming language that is one of the fastest dynamic languages. There's a LuaJIT2 that makes it much much faster. But if you are not careful with declaring closures when nesting them in hot loops, it can slow the code down after a little over 1000 iterations. So the rule is watch out for that.

      Whereas in Javascript, nesting functions and hence closures is a very common construction.

      Many times I wonder about screaming fast programming languages as they can be screaming fast but if we hardly use them they are screaming fast at doing nothing most of the time.

      As slow as Ruby can be, it's great to see a Ruby sample anywhere where there is a small algorithm challenge. Someone posts a challenge in his blog, soon there'll be a Ruby sample posted in the comments. I used to be one of the guys posting Ruby samples in people's blogs. Recently I've started posting Dart and Go samples instead. :-) Of the three of them, Go is the fastest which is great to show off. But even Go can get beaten by Ruby if the problem is simple enough that Ruby can offload the work to its C libraries.

      Scala is the topic, though. I have noticed people rushing to post Scala samples to challenges on the web. Scala has a REPL that makes getting something small working in a short time span very easy. Plus the convenient syntax. There's a whole lot more to Scala than just the aspects of it that are more powerful than in most languages. Unraveling some code so it runs faster is par for the course in many of the modern languages.

    • Re: I won't call this FUD, but something's fishy

      by Alex Blewitt,

      The only reason I was storing the results in a variable to be used outside was to ensure that the JIT compiler didn't optimise away the increment, rather than being a practical use of summation. Obviously all microbenchmarks are subject to interpretation. The reason for using the 'for' vs 'while' was because that's what Coda mentioned in his mail, and I wanted to attempt to reproduce that effect. I also showed the workings (including being in the REPL for both, and re-running both several times). Hopefully I gave enough to allow others to test.

    • Re: Yammer responds.

      by Alex Blewitt,

      Thanks Rick, I've added the link to the post.

    • Response by Typesafe

      by Joost de Vries,

      Here's a response by Typesafe that addresses all these concerns.
      Basically it comes down to that they'll fix these things.
      Which is very welcome.
      Only by addressing these concerns can Scala be a real world language bringing higher order programming and pattern matching to the JVM instead of an academic one.

    • Re: Without proper profiling tools you are lost - regardless

      by Daniel Sobral,

      Actually, Scala was faster than Java before optimization.

    • On Clojure

      by Alex Miller,

      In case anyone is interested, I wrote up a response to the Yammer emails and also shared some of our own experiences 2 years into using Clojure for production work.

    • Re: Without proper profiling tools you are lost - regardless

      by Matthew Tovbin,

      Quote from the paper (“Loop Recognition in C++/Java/Go/Scala”, Robert Hundt, Google): “All of the experiments are done on an older Pentium IV workstation”. But F# tests are performed on Core Duo and i5 workstations. It's just ridiculous!!!
