Bio: Nick Kallen is a Systems Engineer at Twitter. He is the author of Arel, NamedScope, Cache Money, and Screw.Unit, and a co-creator of FlockDB, Twitter's distributed graph database.
QCon is a conference that is organized by the community, for the community. The result is a high-quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics presented by the leaders in our community. QCon is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers.
1. We’re talking to Nick Kallen from Twitter about Scala. Nick, you say that you are using Scala more and more at Twitter. Tell me a little bit how that’s happened, what Scala is good for at Twitter, and so on.
We use Scala for a lot of our newer network services. As you may know, Twitter was originally written in Ruby using the Rails web framework, and even some of our first home-grown network services, like the first message queue we built, which was called Starling, were originally written in Ruby. At a certain point in Twitter’s history, engineers discovered Scala and wanted to experiment with it. The first project that was taken on was to rewrite our Ruby message queue in Scala. That became what’s now called Kestrel, a message queue that uses the memcache protocol as its RPC protocol. That project was very much a success, and as Twitter gradually grew and we needed to build more sophisticated systems, we started to build more and more of them in Scala.
For example, our Firehose, which is an HTTP push service that streams all of the tweets to some of the organizations that we partner with as well as to some internal services, is written in Scala. Some of the projects that I worked on over the last year or two are written in Scala too. One is Flock, a middleware layer that effectively partitions our MySQL databases to represent a distributed graph. It implements a Thrift RPC protocol and does transparent partitioning, replication, and sophisticated query evaluation on top of a large cluster of MySQL backends.
Now more and more projects are starting, and by default they start in Scala.
That’s a great question, but it’s very subjective. I’ll give you my personal opinion and I wouldn’t necessarily say it’s everyone’s opinion at Twitter. I think Scala’s sweet spot is network services. One way of describing how network services work is that they have a very strict or constrained interface: usually an RPC layer is a very tight protocol with very few procedures or methods to invoke. But the implementation of those methods is very sophisticated. Often it will involve partitioning strategies or a sophisticated use of the file system or something like that. A system like that has an interesting property: a small protocol surface area, but a sophisticated implementation. A rich type system or a rich statically typed language, in my judgment, is a real boon for those kinds of problems.
I’ll contrast that with building a website. When rendering web pages, often you have very many components interacting on a web page. You have buttons over here and little widgets over there and there are dozens of them on a webpage, as well as possibly dozens or hundreds of web pages on your website that are all dynamic. With a system with a really large surface area like that, using a statically typed language is actually quite inflexible. I would find it painful probably to program in Scala and render a web page with it, when I want to interactively push around buttons and what-not. If the whole system has to be coherent, like the whole system has to type check just to be able to move a button around, I think that can be really inflexible.
Something that a dynamic language offers, in contrast, is that the system need not be coherent while you are experimenting. That’s a weakness of Scala, or at least it’s an area where it’s less applicable. But for network services, a statically typed language with a very rich and expressive type system like Scala gives you a lot of security. On balance, the language itself is very efficient: the way the compiler is implemented, how efficient the JVM is, and the availability of libraries on the JVM (threading and concurrency libraries, and good data structure implementations) make it really good for those types of problems.
With Scala I think you have all of the major benefits of Java: you have effortless access to almost all of the libraries that are available. That includes not just libraries but tools: for example VisualVM and commercial tools, as well as things that come with the JVM, such as jhat. As you run into performance problems or memory leaks or something like that, the tool chain that exists in the Java world just applies directly to Scala, which is wonderful, and the accessibility of the libraries from Java is also really wonderful.
I think a disadvantage of Java in relation to Scala is that in some ways Java is a very simple and inflexible language, and it has very few features that allow you to express your program at a level of abstraction that matches your domain. You have fewer tools for abstracting, for reusing code, and for expressing sophisticated ideas in the type system so that they can be guaranteed to be correct, as opposed to verified to be mostly correct as in a unit test. That kind of flexibility for code reuse, as well as the ability to program at a level of abstraction that is arguably more readable and more closely matches the domain, is a strength of Scala over Java. But you get all of the major benefits of Java, from efficient implementation to great tooling to great libraries.
One of the things that I think is most attractive about Scala is that you can program using idioms from object-oriented languages as well as functional languages. Even for object-oriented idioms, which you might suspect are fully supported by Java, Java is in some ways inflexible, and if you come from a dynamic language, as I originally come from Ruby, Java seems perhaps simplistic in some of the tools it offers for object-orientation. A concrete example might be the fact that in Java, an interface cannot have implementation. In Scala you have tools like traits, which are analogous to modules in Ruby and are called mixins in other languages. They are traditional object-oriented tools that Scala gives you. Scala is good because it’s not just object-oriented, but it’s actually great at being object-oriented.
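To make the trait point concrete, here is a minimal sketch of a trait that carries implementation and is mixed into classes. All names (`Greeter`, `Shouter`, `User`, `LoudUser`) are hypothetical and chosen only for illustration; nothing here is taken from Twitter's code.

```scala
// Hypothetical names for illustration only.
trait Greeter {
  def name: String                      // abstract: supplied by whoever mixes this in
  def greet: String = s"Hello, $name"   // concrete: the implementation lives in the trait
}

trait Shouter extends Greeter {
  // Refines the greet implementation found further up the linearization.
  override def greet: String = super.greet.toUpperCase + "!"
}

class User(val name: String) extends Greeter
class LoudUser(val name: String) extends Greeter with Shouter

object TraitExample {
  def main(args: Array[String]): Unit = {
    println(new User("nick").greet)
    println(new LoudUser("nick").greet)
  }
}
```

`LoudUser` gets both behaviors without inheriting from a concrete base class, which is roughly what a Ruby module `include` provides.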
At the extreme of functional programming techniques, you have something called "monads", which are inherited from Haskell. It’s kind of a scary word, although the idea is very simple: it is a design pattern that emerged in the functional programming community when they recognized that certain kinds of things, like the "maybe" type (which has an equivalent in Scala, the "option" type), lists or sequences, stateful IO, and asynchronous computation, share an interface in common. They actually share certain mathematical properties, but I like to use the word "interface" - I think it’s clearer.
A lot of the libraries in Scala are influenced by this design pattern or these mathematical properties. The language directly supports them in a way, because there is syntactic sugar for what’s called the "for comprehension". As beginner programmers are exposed to it, it’s like in Python, where you can tersely express iteration through a sequence, filtration, and nested iteration, but it turns out that it’s much more general and can be used for asynchronous computation as well as for very basic problems like avoiding null pointer exceptions.
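A small sketch of the sequence case may help: a for comprehension that combines nested iteration with a filter, next to the method calls the compiler roughly rewrites it into. The function names are hypothetical.

```scala
object ForComprehensionExample {
  // Nested iteration plus a filter: all pairs (x, y) with x < y whose sum is even.
  def evenSumPairs(xs: List[Int]): List[(Int, Int)] =
    for {
      x <- xs
      y <- xs
      if x < y && (x + y) % 2 == 0
    } yield (x, y)

  // Roughly what the expression above desugars to: flatMap, withFilter and map.
  def evenSumPairsDesugared(xs: List[Int]): List[(Int, Int)] =
    xs.flatMap { x =>
      xs.withFilter(y => x < y && (x + y) % 2 == 0).map(y => (x, y))
    }
}
```

Because the sugar only requires `map`, `flatMap`, and `withFilter`, any type that offers those methods can be used in the same syntax, which is the sense in which it is more general than list iteration.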
For example - this is an idiom that’s inherited from Haskell - there is what’s called the option type: a container that represents the presence or the absence of a value. It’s analogous to whether an object is null or not null in, say, Java. Because that’s reified directly in the standard libraries and is part of the style of idiomatic programming in Scala, it’s actually quite easy to avoid null pointer exceptions - arguably much easier than in Java. That’s really a functional idiom, although in some ways it’s just a certain kind of interpretation of the null object pattern, which is an object-oriented design pattern.
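As a sketch of how the option type replaces null checks, the following chains two lookups that may each come up empty. The maps and names are hypothetical stand-ins for real data sources.

```scala
object OptionExample {
  // Hypothetical lookup tables standing in for real data sources.
  val userIdsByName: Map[String, Int] = Map("nick" -> 1, "ann" -> 2)
  val emailsById: Map[Int, String] = Map(1 -> "nick@example.com")

  // Map#get returns an Option; the chain stops at the first None,
  // so no step ever dereferences a missing value.
  def emailFor(name: String): Option[String] =
    for {
      id    <- userIdsByName.get(name)
      email <- emailsById.get(id)
    } yield email

  // The caller decides explicitly what absence means.
  def describe(name: String): String =
    emailFor(name).getOrElse("no email on file")
}
```

The return type `Option[String]` tells the caller that the value may be absent, and the compiler will not let it be used as a plain `String` without handling that case.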
5. In Scala, as I understand it, a common pattern is the actor pattern. Could you contrast uses of the actor pattern in Scala versus maybe threaded patterns? Are they both supported? How does that work well?
That’s a great question and there are a lot of different opinions on this. Scala ships with a library that implements the actor pattern, and Scala is somewhat unique in that. Unlike Erlang, which is maybe the canonical example of the actor pattern and where the pattern is directly supported in the language, Scala, because of closures and some other features of the language, makes it quite easy to implement actors as a library. Something interesting about Scala is that many concurrency patterns don’t need to be implemented in the language itself but can be implemented as libraries. That’s very powerful and gives the programmer a lot of flexibility in choosing one concurrency model over another.
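To illustrate the "actors as a library" idea, here is a deliberately tiny, hand-rolled actor. It is not the actor library that ships with Scala. It assumes a single-threaded executor from `java.util.concurrent` as the mailbox, and the names `CounterActor` and `ActorExample` are hypothetical.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}

// A toy actor: its state is private and only ever touched by the one mailbox thread.
final class CounterActor {
  private val mailbox = Executors.newSingleThreadExecutor()
  private var total = 0

  // Fire-and-forget send: the message is queued and processed in order.
  def !(amount: Int): Unit =
    mailbox.execute(new Runnable { def run(): Unit = total += amount })

  // A request queued behind earlier messages, answered from the mailbox thread.
  def currentTotal(): Int =
    mailbox.submit(new Callable[Int] { def call(): Int = total }).get()

  def stop(): Unit = {
    mailbox.shutdown()
    mailbox.awaitTermination(5, TimeUnit.SECONDS)
  }
}

object ActorExample {
  // Four sender threads each send 1000 messages; no locks appear in user code.
  def sumFromFourSenders(): Int = {
    val counter = new CounterActor
    val senders = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to 1000).foreach(_ => counter ! 1)
      })
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    val result = counter.currentTotal()
    counter.stop()
    result
  }
}
```

Because every message is handled on the same thread, the mutable `total` needs no synchronization, even though the senders run concurrently.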
A second part of your question is maybe "What are the merits of the actor pattern as compared to, say, programming directly with threads?" The traditional argument in favor of the actor pattern is that, because there is very little shared state across agents that are interacting concurrently, it’s relatively more difficult to have synchronization problems. You don’t have a shared mutable data structure, for example, that can be concurrently mutated in incoherent ways. A typical pattern is to do a "get" and then a "set": to read the value of a mutable data structure before deciding what to set. If you have two parallel threads that are doing those sorts of operations, usually you need to synchronize that sort of behavior.
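The "get" then "set" hazard can be sketched with two hypothetical counters: one that reads and writes in separate steps, and one that wraps both steps in a lock. The `hammer` helper is likewise an assumption made for the example.

```scala
object GetThenSetExample {
  // Unsafe under concurrency: another thread can write between the get and the set,
  // and its update is then overwritten (a lost update).
  final class UnsafeCounter {
    @volatile private var value = 0
    def increment(): Unit = {
      val seen = value   // get
      value = seen + 1   // set, based on a possibly stale read
    }
    def get: Int = value
  }

  // Safe: the get and the set happen as one step while holding the object's lock.
  final class SynchronizedCounter {
    private var value = 0
    def increment(): Unit = synchronized { value += 1 }
    def get: Int = synchronized { value }
  }

  // Runs `increment` perThread times on each of `threads` threads and waits for all.
  def hammer(increment: () => Unit, threads: Int, perThread: Int): Unit = {
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(_ => increment())
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
  }
}
```

With `SynchronizedCounter`, four threads doing 10,000 increments each always end at 40,000. With `UnsafeCounter` the final value can be lower, and it varies from run to run.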
With actors, often there isn’t shared state at all; data is passed between interacting agents by sending messages, so data is copied rather than shared. Because of that copying, arguably it’s much easier to write concurrency-safe code. I think that argument is true, but I’d like to point out some weaknesses in it, and one of them is that you simply can’t model all problems as message passing. There is shared state. The most obvious example is the shared state of a database, as an external process: if it’s accessed by different actors without coordination, without transactions, without "select for update" or without locks, you can easily have two concurrent agents producing an incoherent, inconsistent result.
The same thing applies to in-process data structures, for example if you are building a network service that relies on an in-process cache, where there is an extensive in-process object graph that represents the domain you are dealing with. Those data structures are often mutable -- maybe they are changed as a result of user actions -- and you can run into the same concurrency problems with these in-process data structures that you run into with an external database. Often very simple caches that are stored in hash tables will be shared across concurrent units, and you do need to coordinate access to them.
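One common way to coordinate access to such a shared cache on the JVM is `java.util.concurrent.ConcurrentHashMap`, whose `computeIfAbsent` performs the check and the insert atomically. The sketch below is an assumption about how one might do it, not a description of Twitter's services; `ProfileCache` and its loader are hypothetical.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.{Function => JFunction}

// Hypothetical in-process cache shared by many threads.
final class ProfileCache(load: Int => String) {
  private val entries = new ConcurrentHashMap[Int, String]()
  private val loads = new AtomicInteger(0)

  // computeIfAbsent checks and inserts atomically, so the loader runs
  // at most once per key even when several threads miss at the same time.
  def get(id: Int): String =
    entries.computeIfAbsent(id, new JFunction[Int, String] {
      def apply(key: Int): String = {
        loads.incrementAndGet()
        load(key)
      }
    })

  // How many times the loader has actually been invoked.
  def loadCount: Int = loads.get()
}
```

Writing the same logic against a plain hash table as "if the key is missing, load and put" would be another instance of the unsynchronized get-then-set pattern.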
That said, not all problems are modeled that way, so some mix of the two seems to be the sweet spot, and Scala is perhaps unique in that it elegantly supports a mix of the two. But even then, you don’t have to be married to the actor pattern in Scala. I’m very fond of Futures and I use Futures more than I use actors. An argument in favor of Futures is that they are more like plain old Scala objects and it’s easier to unit test with them. With actors it’s actually maybe non-trivial to set up a unit test of interacting agents.
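As a sketch of why Futures are easy to test, the example below composes two asynchronous steps with a for comprehension and then simply waits on the result. It uses the standard library's `scala.concurrent.Future`, which is an assumption made for illustration and not necessarily the Futures implementation meant here. The `fetch...` functions are hypothetical stand-ins for remote calls.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureExample {
  // Hypothetical stand-ins for remote calls; each returns immediately with a Future.
  def fetchUserId(name: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(name.length)

  def fetchFollowerCount(id: Int)(implicit ec: ExecutionContext): Future[Int] =
    Future(id * 10)

  // Futures compose with the same for-comprehension syntax used for lists and options.
  def followerCount(name: String)(implicit ec: ExecutionContext): Future[Int] =
    for {
      id    <- fetchUserId(name)
      count <- fetchFollowerCount(id)
    } yield count

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    // A test can block on the result and compare it like any other value.
    println(Await.result(followerCount("nick"), 5.seconds))
  }
}
```

A unit test needs no mailbox or message protocol: it calls `followerCount`, awaits the returned value, and asserts on it.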
It’s my judgment that Scala hits this sweet spot of being the perfect language for network services and it’s not radically better. I think it would be a mistake to say "It’s better because it’s functional." Some people make the claim that functional programming techniques are simply better and more scalable than object-oriented programming techniques. On the other hand, there are object-oriented enthusiasts who make the opposite claim. My tastes are somewhat eclectic, I suppose, and I think it’s hard to argue against using the best tool for the job. I think Scala is a maximalist language in the sense that it supports many different things and it supports these different things very well.
On balance, I think it comes down to access to great tools from the Java libraries, great concurrency primitives, and great tools on the JVM for debugging performance problems, memory leaks, and what-not. On top of that, it’s just a great feature-rich language that allows you to express a lot of different ideas very clearly and very essentially. That makes for a great language that’s suited to this special class of problems substantially better than any of the alternatives.
I think it’s worth looking into.