Yammer Moving from Scala to Java
An e-mail, sent from Yammer employee Coda Hale to Scala's commercial management at Typesafe, ended up being leaked via YCombinator and a gist at GitHub. The e-mail confirms that Yammer is moving its basic infrastructure stack from Scala back to Java, owing to issues with complexity and performance.
Yammer PR Shelley Risk confirmed to InfoQ that the e-mail represented the personal opinions of Coda Hale rather than an official statement from Yammer itself. A follow-up from the original author has been published at http://codahale.com/the-rest-of-the-story/. In it, Coda clarified that the message was written in response to a request for feedback from Donald Fischer (CEO of Typesafe), following an earlier tweet indicating the move.
Update: Coda has published Yammer's official position on the subject, which confirms the above points. It also points out that every language has flaws (not just Scala) and that the e-mail was an attempt to offer advice on improving Scala's performance and addressing other concerns. Finally, it concludes that rolling out any high-performance project (for which Scala is their production environment) exposes rough edges which need to be filed down; the e-mail was an attempt at helping Scala improve.
Although the e-mail was not meant to be shared publicly, Coda put it on GitHub as a Gist (since deleted) to get feedback from friends; the content was subsequently shared and then reported more widely.
Back in August 2010, Coda said on the Yammer Engineering blog that they were moving to Scala for their realtime future. The goal was to continue running on the JVM (for performance reasons), and the conversion had resulted in approximately a 50% code reduction:
Our initial prototype of Artie was in Java, but as a weekend experiment I tried reimplementing it in Scala 2.8. After a day, I had dropped about half the lines of code and added several tricky features. I was sold. It might be easier to hire Java developers, but a Scala team will be able to get a lot more done
Fast forward a year and a quarter, and the decision is being reversed:
Right now at Yammer we're moving our basic infrastructure stack over to Java, and keeping Scala support around in the form of façades and legacy libraries. It's not a hurried process and we're just starting out on it, but it's been a long time coming. The essence of it is that the friction and complexity that comes with using Scala instead of Java isn't offset by enough productivity benefit or reduction of maintenance burden for it to make sense as our default language. We'll still have Scala in production, probably in perpetuity, but going forward our main development target will be Java.
Stephen Colebourne, who recently posted the thread "Is Scala the new EJB2?", has annotated the mail with a number of bullet points summarising the issues involved:
- Scala, as a language, has some profoundly interesting ideas in it. But it's also a very complex language.
- In addition to the concepts and specific implementations that Scala introduces, there is also a cultural layer of what it means to write idiomatic Scala … at some point a best practice emerged: ignore the community entirely.
- In hindsight, I definitely underestimated both the difficulty and importance of learning (and teaching) Scala. Because it's effectively impossible to hire people with prior Scala experience, this matters much more than it might otherwise.
- Adding to the unease in development were issues with the build toolchain. … This emphasis on SBT being the one true way has meant the marginalization of Maven and Ant -- the two main build tools in the Java ecosystem.
- Each major Scala release being incompatible with the previous one biases Scala developers towards newer libraries and promotes wheel-reinventing in the general ecosystem.
- Via profiling and examining the bytecode we managed to get a 100x improvement by adopting some simple rules:
  - Don't ever use a for-loop
  - Don't ever use scala.collection.mutable
  - Don't ever use scala.collection.immutable
  - Always use private[this]
  - Avoid closures
- I broached this issue [moving back to Java] with the team, demo'd the two codebases, and was actually surprised by the rather immediate consensus on switching. There's definitely aspects of Scala we'll miss, but it's not enough to keep us around.
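The performance rules quoted in the list above can be illustrated with a small sketch. This is not code from Yammer: the `RunningTotal` class and its method names are hypothetical, and the comments assume Scala 2.x, where `private[this]` is generally understood to compile to a plain field with no accessor method.

```scala
// Hypothetical example; not taken from the Yammer codebase.
final class RunningTotal(values: Array[Int]) {
  // Object-private field: in Scala 2.x this is generally compiled to a plain
  // field that is read directly, without a generated accessor method.
  private[this] val data: Array[Int] = values

  // Idiomatic version: foldLeft takes a function value (a closure) and goes
  // through the collection-operations wrapper for arrays.
  def sumIdiomatic: Long = data.foldLeft(0L)((acc, x) => acc + x)

  // "Rules" version: a while loop over a primitive array, with no closure
  // and no collection wrapper involved.
  def sumTuned: Long = {
    var total = 0L
    var i = 0
    while (i < data.length) {
      total += data(i)
      i += 1
    }
    total
  }
}
```

Both methods return the same result; whether the tuned form is measurably faster depends on the compiler version and the JIT, so it is worth profiling before rewriting code in this style.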
While some of these issues are likely to be circumstantial (for example, the ease of hiring a developer with existing experience increases the longer a language is popular), others can be tested empirically. For example, one of the pieces of advice is to avoid for loops. This can be tested with the following piece of code:
scala>
var start = System.currentTimeMillis();
var total = 0;for(i <- 0 until 100000) { total += i };
var end = System.currentTimeMillis();
println(end-start);
println(total);
114
scala>
var start = System.currentTimeMillis();
var total = 0;var i=0;while(i < 100000) { i=i+1;total += i };
var end = System.currentTimeMillis();
println(end-start);
println(total);
8
Using the for loop with an 'until' pattern here (which many Scala programmers would consider idiomatic) can be seen to be significantly slower than the corresponding while loop, even though the while loop is less readable. The corresponding Java implementation of the same loop shows a result of 2ms for both the for and while loops.
Another test we can perform is the performance of the mutable map by loading in a data set consisting of Integer objects. (This can be compared in Java and Scala, and the cost of boxing should be equivalent.)
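One likely contributor to the gap is that a Scala `for` over a `Range` is not a primitive loop: the compiler rewrites it into a `foreach` call that takes a function value. The sketch below shows three equivalent forms of the summation. The `LoopForms` object and its method names are hypothetical, and the accumulator is a `Long` here so that the sum of 0 to 99999 (4,999,950,000) does not overflow.

```scala
object LoopForms {
  // The form used in the benchmark: a for-expression over a Range.
  def sumFor(n: Int): Long = {
    var total = 0L
    for (i <- 0 until n) { total += i }
    total
  }

  // Roughly what the compiler turns the for-expression into: a foreach
  // call on the Range, with the loop body passed as a function value.
  def sumForeach(n: Int): Long = {
    var total = 0L
    (0 until n).foreach(i => total += i)
    total
  }

  // The hand-written loop: no Range object and no function value.
  def sumWhile(n: Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) {
      total += i
      i += 1
    }
    total
  }
}
```

All three return the same value; only the generated bytecode differs. In the first two forms `total` is a `var` captured by the closure, so the compiler has to keep it in a heap-allocated reference cell rather than a local variable, which is part of what the "avoid closures" advice is getting at.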
scala>
val m = new scala.collection.mutable.HashMap[Int,Int];
var i = 0;
var start = System.currentTimeMillis();
while(i<100000) { i=i+1;m.put(i,i);};
var end = System.currentTimeMillis();
println(end-start);
println(m.size)
101
scala>
val m = new java.util.HashMap[Int,Int];
var i = 0;
var start = System.currentTimeMillis();
while(i<100000) { i=i+1;m.put(i,i);};
var end = System.currentTimeMillis();
println(end-start);
println(m.size)
28
scala>
val m = new java.util.concurrent.ConcurrentHashMap[Int,Int];
var i = 0;
var start = System.currentTimeMillis();
while(i<100000) { i=i+1;m.put(i,i);};
var end = System.currentTimeMillis();
println(end-start);
println(m.size)
55
Compared against the vanilla Java code, java.util.HashMap performs identically when driven from Scala, while the Java implementation using java.util.concurrent.ConcurrentHashMap is twice as fast as the same class driven from Scala. In both cases, however, the Java collection classes outperform scala.collection.mutable.HashMap. (Timings taken on OS X with JVM 1.6.0_29 and Scala 2.9.1, the latest at the time of writing.)
Unfortunately, the Scala collections are pervasive in the Scala library APIs, and as such, they may be promoted from the Java object types to the Scala object types through implicits in the code. According to the migration mail, this resulted in significant re-writing for performance reasons.
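As a rough illustration of how a Java collection ends up behind a Scala collection type, the sketch below wraps a `java.util.HashMap` explicitly. It assumes Scala 2.13 or later, where the explicit converters live in `scala.jdk.CollectionConverters`; the 2.9-era code discussed here would more likely have relied on the implicit conversions in `scala.collection.JavaConversions`, which performed the same wrapping without any visible call. The `WrapDemo` name is hypothetical.

```scala
import scala.jdk.CollectionConverters._

object WrapDemo {
  // Returns the underlying Java map and the Scala view of it.
  def build(): (java.util.HashMap[Int, Int], scala.collection.mutable.Map[Int, Int]) = {
    val javaMap = new java.util.HashMap[Int, Int]()
    javaMap.put(1, 10)

    // asScala returns a wrapper around javaMap, not a copy.
    val wrapped: scala.collection.mutable.Map[Int, Int] = javaMap.asScala

    // Writes through the Scala view land in the Java map, but every call
    // now passes through the wrapper's extra layer of indirection.
    wrapped.put(2, 20)

    (javaMap, wrapped)
  }
}
```

With implicit conversions in scope, the same wrapping can happen silently wherever a Scala collection type is expected, which is how code that starts out on `java.util` classes can drift onto the Scala collection API.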
Performance of closures (lambdas) may be improved if the Scala compiler generates code with invokedynamic, something that might happen in future versions of the Scala compiler. In addition, JDK 8 (which will bring both native lambdas and method handles to Java) has a number of performance advantages which a future Scala version may be able to take advantage of.
Finally, there is increasing pressure for Scala to fix its backward compatibility between releases (rather than just in the minor releases between 2.9.2 and 2.9.3). There has been no official announcement from Typesafe regarding the future roadmap on Scala, or when a stable compiled binary format will permit code to be backwardly (or forwardly) compatible between releases. Having a backward compatible format would enable more stable libraries to be released and a community repository to be built, which would help anyone interested in building upon Scala for the future.
Macro vs Micro Optimization
by
Thomas Santana
Measurings...
by
Daniel Sobral
At any rate, I could not reproduce your results. Scala's mutable.HashMap, either 2.9.1 or trunk, performs pretty much on par with Java when executing the same code snippets you provided.
Re: Measurings...
by
Robin St
java.util.HashMap: 47, 44, 54, 51, 43
scala.collection.mutable.HashMap: 81, 81, 82, 81, 84
Scala version:
Welcome to Scala version 2.9.1.final (OpenJDK 64-Bit Server VM, Java 1.6.0_23).
Re: Measurings...
by
Joost de Vries
Re: Measurings...
by
Daniel Sobral
Without proper profiling tools you are lost - regardless
by
Faisal Waris
I can attest to the importance of profiling. I actually ported the Google benchmark to F#. There was a big difference in performance in the initial F# implementation and the final one. I used the Visual Studio 2010 profiler (an excellent tool) to quickly zero-in on the "hotspots" and optimize the code.
The final version of F# turned out to be even faster than the original C++ implementation as independently verified by someone.
Note that the final C++ implementation was fastest of all and is more directly comparable to final Scala and F# implementations. The original C++ implementation did too many allocations which were removed in the final version and caching was used instead. (you have to dig through the links to find all of that)
Bottom line: Write your code idiomatically and then profile and tune. The code does not have to be super optimized in all locations. Good profiling tools are a must.
Re: Without proper profiling tools you are lost - regardless
by
Michael Campbell
1000 times, this. For what I'm guessing to be a large portion of us, the absolute fastest is not what we need. Our requirements are unlike Yammer's. For a lot of us, it doesn't matter if something is "slower", if we get it out the door 3 to 6 months quicker.
But there are those of us for whom it DOES matter. And for these cases, you absolutely have got to profile. Performance isn't a problem until performance is a problem.
Please don't use REPL for benchmarking
by
Andriy Plokhotnyuk
::#!
@echo off
set JAVA_OPTS=-server
call scala %0 %*
goto :eof
::!#
val n = 1000000000
def timed[T](name: String)(f: => T) = {
  printf("%s :\n", name)
  val t = System.nanoTime
  val r = f
  val d = System.nanoTime - t
  printf("%,d ns\n", d)
  printf("%,d ops/s\n", (n * 1000000000L) / d)
  printf("%,f ns/op\n", d.toFloat / n)
}
timed("while") {
  var sum = 0L;
  var i = 1;
  while (i <= n) {
    sum += i
    i += 1
  }
  sum
}
timed("for") {
  var sum = 0L;
  for (i <- 1 to n) {
    sum += i
  }
  sum
}
while :
1,172,919,768 ns
852,573,234 ops/s
1.172920 ns/op
for :
2,059,442,013 ns
485,568,417 ops/s
2.059442 ns/op
Re: Measurings...
by
Charles Humble
Re: Measurings...
by
Peter Thomas
object Test {
  def scala() {
    val s = new collection.mutable.HashMap[Int,Int]();
    var i = 0;
    val start = System.currentTimeMillis();
    while(i<100000) { i=i+1;s.put(i,i);};
    val end = System.currentTimeMillis();
    println(end-start);
    println(s.size)
  }
  def javaHashMap() {
    val m = new java.util.HashMap[Int,Int];
    var i = 0;
    val start = System.currentTimeMillis();
    while(i<100000) { i=i+1;m.put(i,i);};
    val end = System.currentTimeMillis();
    println(end-start);
    println(m.size)
  }
  def javaConcurrentHashMap() {
    val c = new java.util.concurrent.ConcurrentHashMap[Int,Int];
    var i = 0;
    val start = System.currentTimeMillis();
    while(i<100000) { i=i+1;c.put(i,i);};
    val end = System.currentTimeMillis();
    println(end-start);
    println(c.size)
  }
  def main(args: Array[String]) {
    println("--")
    println("-- JAVA CONCURRENTHASHMAP:")
    println("--")
    javaConcurrentHashMap()
    println("--")
    println("-- JAVA HASHMAP:")
    println("--")
    javaHashMap()
    println("--")
    println("-- SCALA:")
    println("--")
    scala()
  }
}
my test result after a few cycles (scala version: 2.9.1.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_29)):
--
-- JAVA CONCURRENTHASHMAP:
--
61
100000
--
-- JAVA HASHMAP:
--
56
100000
--
-- SCALA:
--
68
100000
scala is still slower but not that much. In most scenarios (where the collections are relatively small) this should be a non-issue.
Re: Measurings...
by
James Watson
I won't call this FUD, but something's fishy
by
Ed Lover
The for and while loop tests were only different by a couple of milliseconds. The even more idiomatic
(0 until 100000).sum
was a bit slower, but all these microbenchmarks are fraught.
the mutable.HashMap and java.util.HashMap tests came out about the same, give or take a millisecond. Something doesn't add up with this article.
(I put the snippets in files and ran them that way. My setup: Ubuntu 64bit, 3-year-old mid-range Intel c2d, Java 1.6.0_26, Scala 2.9.1)
Re: I won't call this FUD, but something's fishy
by
Steve McJones
So I would recommend stop using it in benchmarks ... it won't loop anymore in the future. :-)
Int-overrun, immature code
by
Stefan Wagner
var total = 0L; for (i <- 0 until 100000) { total += i };
and
var total = 0L; var i=0; while (i < 100000) { i=i+1; total += i };
to avoid that, but there is a far faster solution:
val n = 100000L
val result = n * (n - 1) / 2
But calculating the result isn't the aim of the example? Not? So what is it? In far more complex computations, the costs of looping often vanish.
Yammer responds.
by
Richard Hightower
Scala is currently the main language for our high-performance backend services and in the past two years we've solved a number of hard problems using it. We built a real-time message delivery service which has scaled to hundreds of thousands of concurrent users. We built a distributed data store for message feeds which serves tens of thousands of requests per second over terabytes of data. We built a new search system, including both a distributed data store for denormalizing business objects into indexable documents and a low-latency query system for auto-completing the millions of people and group names on Yammer. We built a specialized server for integrating with Active Directory which streams staff changes from companies with hundreds of thousands of employees. We built a service for handling Yammer notifications which handles thousands of notifications a second with extremely low latency. We built an OAuth token service which manages gigabytes of cached principal objects and returns results in single-digit milliseconds.
All of these Scala projects are running in production right now, and we built them all with a team of seven people.
Along the way, we've found some problems with Scala. And Java. And Ruby. And C. And Javascript. And Objective-C. And Erlang.
So benchmarks lie
by
Joao Pedrosa
For instance, everyone knows the Lua programming language that is one of the fastest dynamic languages. There's a LuaJIT2 that makes it much much faster. But if you are not careful with declaring closures when nesting them in hot loops, it can slow the code down after a little over 1000 iterations. So the rule is watch out for that.
Whereas in Javascript, nesting functions and hence closures is a very common construction.
Many times I wonder about screaming fast programming languages as they can be screaming fast but if we hardly use them they are screaming fast at doing nothing most of the time.
As slow as Ruby can be, it's great to see a Ruby sample anywhere where there is a small algorithm challenge. Someone posts a challenge in his blog, soon there'll be a Ruby sample posted in the comments. I used to be one of the guys posting Ruby samples in people's blogs. Recently I've started posting Dart and Go samples instead. :-) Of the three of them, Go is the fastest which is great to show off. But even Go can get beaten by Ruby if the problem is simple enough that Ruby can offload the work to its C libraries.
Scala is the topic, though. I have noticed people rushing to post Scala samples to challenges on the web. Scala has a REPL that makes getting something small working in a short time span very easy. Plus the convenient syntax. There's a whole lot more to Scala than just the more-powerful-than-most-languages aspects of it. Unraveling some code so it runs faster is par for the course in many of the modern languages.
Re: I won't call this FUD, but something's fishy
by
Alex Blewitt
Response by Typesafe
by
Joost de Vries
Basically it comes down to that they'll fix these things.
Which is very welcome.
Only by addressing these concerns can Scala be a real world language bringing higher order programming and pattern matching to the JVM instead of an academic one.
Re: Without proper profiling tools you are lost - regardless
by
Daniel Sobral
On Clojure
by
Alex Miller
Re: Without proper profiling tools you are lost - regardless
by
Matthew Tovbin