Bio Dean Wampler is co-author of Programming Scala and owner/principal of Aspect Research Associates. His areas of expertise include polyglot programming, poly-paradigm programming, and software craftsmanship. He was formerly with Object Mentor and has worked in many industries, including Internet startups, wireless telecoms, medical electronics companies, and tools vendors.
Strange Loop is a developer-run software conference. Innovation, creativity, and the future happen in the magical nexus "between" established areas. Strange Loop eagerly promotes a mix of languages and technologies in this nexus, bringing together the worlds of bleeding edge technology, enterprise systems, and academic research. Of particular interest are new directions in data storage, alternative languages, concurrent and distributed systems, front-end web, semantic web, and mobile apps.
It’s probably at the level where I would say it’s a good replacement for anything you’d use Java for - mostly server-side, JVM-based work where static typing is perhaps more of a benefit than the flexibility of dynamic typing. So you wouldn’t necessarily use it for small scripts, although it does have that capability. You wouldn’t necessarily want to load it in a browser, the way they wanted to do with Java, but for that sweet spot of the server - talking to databases, networking, concurrency, APIs - it’s really an optimal tool.
The thesis behind Scala from its creator Martin Odersky is that these two paradigms, if you can use that big term - functional programming and object-oriented programming - are actually more complementary than opposed to each other. On the surface they seem to be opposed, because functional programming is about immutability and is sort of mathematical in the way variables and functions work, with no side effects, whereas objects are about mutating the state of an object, so mutability and side effects are the name of the game. But you can bring these two together by having immutable objects and using standard functional-style idioms like iteration, comprehensions and so forth.
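To make that blend concrete, here is a minimal sketch of an immutable object used with a functional idiom; the Account type and its fields are invented for illustration.

```scala
// A hypothetical immutable "Account" blending OO and FP styles.
case class Account(owner: String, balance: BigDecimal) {
  // Instead of mutating state, return a new object with the updated field.
  def deposit(amount: BigDecimal): Account = copy(balance = balance + amount)
}

val accounts = List(Account("alice", 100), Account("bob", 50))

// A for-comprehension (a functional idiom) over immutable objects:
val credited = for (a <- accounts) yield a.deposit(10)
```

The original objects are untouched; each "update" is a new value, so the OO surface (methods on objects) coexists with functional immutability.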
So what Scala actually does, because it runs on the JVM, is make everything an object, even the functions that you write. At the same time, there are ways you can make objects behave like functions when needed by creating an apply method, which the compiler will interpret as the thing to call if you take an object, put an argument list after it and try to invoke it as if it were a function. It does try to bridge the two together and treat them as somewhat uniform and unified, as opposed to diametrically opposed. There are some compromises it has to make - you don’t get full laziness like you’d have in a language like Haskell - but in general it does a pretty good job.
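A minimal sketch of the apply mechanism he describes; the Doubler class is an invented example.

```scala
// Any object with an `apply` method can be invoked with
// function-call syntax.
class Doubler {
  def apply(x: Int): Int = x * 2
}

val double = new Doubler

// The compiler rewrites `double(21)` to `double.apply(21)`.
val result = double(21)  // 42
```

This is exactly how Scala's own function values work: a function literal compiles to an object whose apply method holds the body.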
Probably the biggest thing is a reworked collection library that doesn’t change the public API in dramatic ways, but uses much more sophisticated techniques for constructing things. A classic example: suppose I have a particular kind of map subclass and I call the map method over it, but I would like to get back an instance of the same type. That was actually not possible before - the Scala API would return some supertype, like a sequence. But now, using some of these sophisticated tools in the language, it can actually return things of the same type.
So you have a consistent chain of calls to these things that are returning new instances in the functional way, but they are still instances of the same type, in the object-oriented way. That’s one of the big benefits. There are a couple of other nice little syntactic improvements, like the ability to have default values for arguments. A lot of times you have an argument in a function call that is going to have the same value most of the time, so you’d like to give the user the ability to call it without that argument and have it just default to that value. Those are a couple of the things that I really like that are new in the language.
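A short sketch of both features just mentioned; the names are illustrative.

```scala
// Default argument values: callers may omit `greeting`.
def greet(name: String, greeting: String = "Hello"): String =
  greeting + ", " + name

// The reworked collections return the most specific type they can:
// mapping over a Map yields another Map, not a generic supertype.
val ages = Map("alice" -> 30, "bob" -> 25)
val next = ages.map { case (k, v) => (k, v + 1) }
```

Calling `greet("world")` uses the default and returns "Hello, world", while `next` is still a Map, so further Map operations chain naturally.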
4. With Scala 2.8 there were some breaking changes from 2.7 which required recompilation and sometimes code changes. The frequency of breaking changes between the 2.x releases of Scala has been raised as a concern. What are your thoughts on that?
I think the problem the Scala team got into is that 2.8 was intended to be a usual point release, but it evolved into what was really a 3.0 release. Then they adopted the strategy that, since they were making big changes, they should really be sure to get them right, even if there was temporary pain as they made non-compatible changes in these point releases. It did cause a lot of real pain for people as they were coming up to the 2.8 final release. As a result, a lot of people just decided to stay on 2.7 until 2.8 was final before moving over, and at the same time they did break some APIs, partly in the ways I described with the collections API.
For example, one of the things I didn’t mention is that they reorganized the package structure to be more consistent. The general model was: we know this is painful, we are making changes that we think we need to make now while the language community is relatively small, but that will set the stage for further growth in the future and fewer cases of incompatible changes. So we’ve sort of bitten the bullet, had that pain now, gotten it out of the way, and our goal now is to maintain binary compatibility for all of the minor releases and even some of the major releases. Probably 3.0, whenever that comes out, will have some non-backwards-compatible changes, but at least we’ll know that they’re coming and we can prepare for them.
One of the things that Scala did on the JVM, where it was first implemented, is leverage as much as it could the underlying nature of the bytecode and the requirements of the JVM, as well as the standard Java libraries. They had to do a couple of things for the CLR port, and one issue for people who just want to naively try to bring code over is that the underlying libraries will be different. So there won’t really be source compatibility between the two implementations, although it will be kind of like porting SQL between databases: they are not quite the same languages, but they are pretty close, so hopefully there won’t be much of an issue.
One of the interesting decisions they had to make was what to do about type erasure. With generics in Java, the type parameter is actually erased at runtime; that was done in Java 5 for backwards compatibility. .NET doesn’t do that, but in order to maintain some compatibility, they are actually continuing with erasure in the .NET version.
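A quick illustration of what erasure means in practice on the JVM: the element type of a generic collection is gone at runtime.

```scala
// Because of erasure, a List[Int] and a List[String] have
// the same runtime class; the type parameter exists only at
// compile time.
val ints: List[Int]    = List(1, 2, 3)
val strs: List[String] = List("a", "b")

val sameClass = ints.getClass == strs.getClass  // true
```

This is why, for example, you cannot pattern-match on the element type of a collection at runtime without compiler warnings.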
I’m not really sure. I haven’t actually used the .NET version. I should also mention that the state of the implementation is a little bit behind the JVM implementation, although it recently got an infusion of support from Microsoft. But that is an interesting question - how that interaction is going to happen and what the issues are. Scala also ran into some problems even on the JVM, with things like the behavior of Java annotations, that they had to work through. At this stage Java is pretty much completely compatible where it needs to be, but I’m sure there are some other areas like this on the .NET platform that people should be aware of.
The interesting thing about that is that Scala doesn’t really adopt any specific concurrency model in the way that, say, Erlang does. Rather, it’s like Java in the sense that whatever library you have, you can use. In that sense it doesn’t do anything for you, even though you may be misled into thinking that it does, because it comes with an actor library and other frameworks are implementing actor libraries. But perhaps the most interesting thing it does to make concurrent programming easier is the general support for domain-specific languages - the features of the language that let you define abstractions.
In the domain of concurrency that could actually be very valuable for building concurrency models that are intuitive when you read the code, as opposed to "I’m reading this syntax in this language and I have to map it to constructs in my concurrency model." I think the most important thing is that it leverages the power of the JVM and CLR for letting you do concurrency, either at the multithreading level or at a higher level of abstraction, and it also gives you the ability to write domain-specific languages that easily and intuitively represent the concurrency model you’re working with.
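As one toy example of such an internal DSL, by-name parameters let a library-defined construct read like built-in syntax. The `inParallel` name and design below are invented for illustration; it simply runs two blocks on separate threads and waits for both.

```scala
// A toy internal DSL for concurrency, built from plain threads.
// By-name parameters (`=> Unit`) defer evaluation of each block
// until it runs on its own thread.
def inParallel(left: => Unit, right: => Unit): Unit = {
  val t1 = new Thread(new Runnable { def run(): Unit = left })
  val t2 = new Thread(new Runnable { def run(): Unit = right })
  t1.start(); t2.start()
  t1.join(); t2.join()
}

// The call site reads close to plain English:
val results = new java.util.concurrent.ConcurrentLinkedQueue[String]()
inParallel(
  results.add("first task"),
  results.add("second task")
)
```

The point is not this particular construct but that a Scala library can make a concurrency model look like part of the language.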
But what it does instead is give you a feature that uses an implicit mechanism: I can declare a function as implicit and it will be treated as a converter function. Say I have an object and I want to call a method on it that doesn’t actually exist on the type of that object, but I can coerce that object into another type that does have that method. I can use this implicit mechanism to tell the compiler, "Do this conversion when you see this kind of situation." Then it looks as if that method has actually been added to the class when in fact it hasn’t; we’ve just done a behind-the-scenes conversion to another type.
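A minimal sketch of that mechanism; the `Shouter` wrapper and `shout` method are invented examples.

```scala
import scala.language.implicitConversions

// String has no `shout` method, but a wrapper type can provide one.
class Shouter(s: String) {
  def shout: String = s.toUpperCase + "!"
}

// The implicit converter tells the compiler how to get from
// String to Shouter when a String lacks the requested method.
implicit def stringToShouter(s: String): Shouter = new Shouter(s)

// Looks as if `shout` were added to String; really the compiler
// inserts `stringToShouter("hello").shout`.
val loud = "hello".shout  // "HELLO!"
```

The class String itself is never modified; the illusion is created entirely by the compiler-inserted conversion.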
If you’ve heard of Haskell (most people have at least heard of it), it has a mechanism called type classes, which is actually a similar idea: you can define a behavior - a protocol, say - and then define how that is supported by different types without actually modifying those existing types. The object-oriented way to do this would be to subclass the types in some way and then add the method. But this implicit mechanism that I just described is another way to implement a style of type classes, where I can add a new method to, say, a group of classes without actually modifying them. Of course I’m not modifying those types in reality; I’m just creating the illusion that the new method is there. That’s the nice thing about the implicit mechanism: it gives you the ability to firewall off specific behaviors that I need only in a limited context, as opposed to shoving them into classes whether I want them there or not.
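A sketch of the type-class pattern in Scala; the `Show` protocol and its instances are invented names, but the shape is the standard idiom.

```scala
// A protocol defined separately from the types that support it.
trait Show[A] {
  def show(a: A): String
}

// Instances for existing types, without modifying those types:
implicit val intShow: Show[Int] = new Show[Int] {
  def show(a: Int): String = "Int(" + a + ")"
}
implicit val boolShow: Show[Boolean] = new Show[Boolean] {
  def show(a: Boolean): String = "Boolean(" + a + ")"
}

// Works for any type with a Show instance in scope:
def display[A](a: A)(implicit s: Show[A]): String = s.show(a)
```

Int and Boolean are untouched; the behavior lives in the instances, and it applies only where those instances are in implicit scope - the "firewalling" he describes.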
Akka is a framework that adds a lot of the enterprise-class features you would need to build distributed, highly scalable applications. In this sense it works the way Spring works in the Java world, filling in a lot of important gaps when you are building enterprise applications. Akka has a lot of features, but the most important in this context is perhaps a very scalable, very robust actor library that more fully realizes the model of Erlang’s OTP [Open Telecom Platform] framework. For example, you can easily create tens of thousands of Akka actors, because they tend to be lighter weight than the actors in the standard Scala actor library.
The other thing it brings that is worth mentioning is the notion of actor supervisors. If I have an actor running and it gets into trouble and has to die, what should happen in that case? I can declaratively specify that another actor will be its supervisor: if my worker actor dies, I can have it restart; I can have sibling actors restart if they are collaborating in some way. I can have all this transparently happen at runtime just by the way I declare things when I start up the system. So, for me, the Akka actor library fills in some missing features in the Scala standard library, and it also brings other things, like integration with a software transactional memory library called Multiverse and integration with a bunch of other third-party libraries, for REST and Comet and things like that.
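The supervision idea can be sketched without any framework at all. This is emphatically not the Akka API - just a self-contained toy showing the restart-on-failure semantics a supervisor provides; all the names are invented.

```scala
// Toy supervisor: run a worker, and if it throws, "restart" it
// (here: simply retry) up to `maxRestarts` times.
def supervise(maxRestarts: Int)(worker: () => Unit): Int = {
  var restarts = 0
  var done = false
  while (!done) {
    try {
      worker()
      done = true
    } catch {
      case _: RuntimeException if restarts < maxRestarts =>
        restarts += 1  // worker died; restart and try again
    }
  }
  restarts
}

// A worker that fails twice before succeeding:
var attempts = 0
val restarts = supervise(maxRestarts = 5) { () =>
  attempts += 1
  if (attempts < 3) throw new RuntimeException("worker died")
}
```

In Akka the same policy is declared on the supervisor rather than coded inline, and it can also cover sibling actors, but the "let it crash, then restart" semantics are the same.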
10. Software transactional memory is something which has been worked on quite a bit. It’s been mentioned many times over the last few years as one of the potential solutions for creating more intuitive parallel programming. What are your thoughts on that? What’s the state of development now? How usable is it?
I think it’s been proven very successful in Clojure, where it’s kind of a core concept. In effect, some of the Clojure data structures are implemented that way whether you like it or not - you don’t have any options - which is probably a good thing. For those who don’t know what it’s all about, it’s the idea of bringing database-style ACID transactions to in-memory updates, where we lose the durability part because it’s all in memory and not actually written to a file or anything. I think it’s a very important idea, because sometimes you really do have to mutate data structures - you just can’t avoid it, or it’s the most efficient or most intuitive thing to do - but you should do it in a very principled and controlled way, so that you don’t have this Wild West scenario where you are clobbering something that someone else is in the middle of using.
For example, if you are iterating through a Java collection and somebody deletes an element out of it, the collection suddenly becomes undefined and the iteration becomes undefined in its behavior. Stuff like that is just something we can’t afford to allow to happen. The STM model is that the guy doing the iteration can keep an earlier version of the collection while we construct a new version that will be used subsequently by other people. The other pretty powerful thing STM does: obviously, if you are making copies of big data structures, that’s going to be very inefficient, but the fact is that usually you only modify a small piece. So why not share what hasn’t changed between the old and the new version?
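The "keep the earlier version" point can be seen directly with Scala's immutable collections: an update produces a new version, and anyone holding the old one still sees consistent data.

```scala
// With immutable collections, an "update" yields a new version.
val v1 = Vector(1, 2, 3)
val v2 = v1 :+ 4          // new version with an extra element

// The old version is untouched, so iterating over it stays
// well-defined no matter what "updates" happen elsewhere.
val oldSum = v1.sum        // 6
val newSum = v2.sum        // 10
```

Nothing about `v1` changes when `v2` is created, which is exactly the property that makes concurrent readers safe.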
Typically, the way these things are implemented is with a tree structure where, if I say "insert a new element", I’ll just create a new root node for the tree and have it point to the things that didn’t change as well as the new node. So there are a lot of clever techniques that can be used to give you both good performance and these controlled transitions between versions.
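Structural sharing is observable even with the simplest persistent structure, Scala's immutable List: prepending reuses the existing list as the tail rather than copying it.

```scala
// Prepending to an immutable List shares the old list as the tail.
val shared   = List(2, 3, 4)
val extended = 1 :: shared

// `extended` physically shares `shared` - reference equality (`eq`),
// not just value equality:
val tailIsShared = extended.tail eq shared  // true
```

Only one new cell was allocated for `extended`; everything that didn’t change is shared, which is what makes persistent updates cheap.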
This is one of the exciting things to me - that the JVM, which was originally written just for Java, has become this really fruitful platform for a lot of languages. JRuby is a great example of something that seems totally unrelated to Java. How could it possibly work? And yet the geniuses on the JRuby team made it work very well. What is different about Scala vs. pretty much all the other major ones we talk about, like Clojure and JRuby, is that Scala is statically typed. It continues in that tradition, as opposed to being dynamically typed. I don’t know if there will ever be a clear winner, and I’m not sure it would be a good thing if there ever were, but I think you’ll see people using Scala for general-purpose programming, like they’ve done with Java.
I’ve noticed that people seem to be doing a lot of data-intensive work in Clojure, because it just focuses on that problem of manipulating data, which I think is great. You can certainly write services of any kind in Clojure if you like. Then I see a lot of people using JRuby as a bridge to the Ruby world and the kind of dynamic behavior that you come to appreciate when you are working in Ruby. Then there are other languages like Groovy and Jython that fit into the same model as JRuby, giving you that dynamic behavior, but also the performance of the JVM.
12. The languages in most common use nowadays are in the C, C++ family of languages. So you’ve got C and C++, you’ve got Java and you’ve got C#, and those tend to have - I don’t think it’s controversial to say - a majority of the development market. What can one learn from more modern, more recent languages which can then be transferred into those? What can you learn from more modern languages that allows you to enhance the way that you as a developer create code?
That’s a good question, and you remind me of early in my career - I don’t want to date myself too much, but maybe I’ll mention this anyway. I started out programming in C around the time that C++ was just coming out and not widely embraced yet. I was really interested in object-oriented programming and really wanted to use it in C, so I played with C++ at home, and what I realized is that I already had a lot of the facilities in C: I could declare structures, I could restrict functions to work on particular structures. Even if I didn’t have member variables or member methods, even if I didn’t have inheritance, I could still get a lot of the benefit of C++, or object-oriented programming in general, in my regular C code, and it tended to improve the quality.
What I think people can always do is learn radically new languages - a functional language, if you’ve never dealt with one before, or maybe a prototype-based language like Io - and then think about how that would inform the way you do your regular coding. You may find that some things you’ll just do the old-fashioned way because that’s the natural way to do them, and some things may be hard to do if they are not supported natively, like doing objects C++-style - and in particular doing that in C. But nevertheless, just having that in the back of your mind, a lot of the time you’ll find that you can make some general application of those ideas to your current code and improve the quality.
For me, the big example of that right now is making data immutable whenever it can be, avoiding side effects in functions whenever I can, and so forth. So I do think that people should learn a new language - but not a language like the one they already know. If you know Java, you don’t necessarily learn C#, but you do learn F#, perhaps.
13. With the knowledge that you’ve gained from these other languages that you’ve learned, is it possible that, when you go back to the language that you’re used to, you’ll create code which is not necessarily intuitive on that platform and may be difficult to understand for other developers who don’t have that same background?
This is a very realistic problem that you could have, like trying to do inheritance in C. You can actually do it - you can build your own virtual tables for function dispatch - but it’s messy; forget trying to maintain it. I think you do have to avoid that temptation. We all deal with the reality of what the team is going to be able to understand, what they can reasonably be expected to do, what makes sense. We have to avoid the temptation of treating every new shiny toy as a hammer, if I can mix my metaphors, that we apply to every problem. Sometimes there will be a case where you have to make the decision to go with what’s natural in the language, even if it’s less efficient in some other way.
But you can also be pretty clever sometimes and find ways to expose abstractions that people on the team understand, that follow the usual conventions they are used to, but that internally use some more sophisticated or newer ideas, maybe for performance reasons or whatever. I do think that, in the long run, you really have to, as part of this whole idea of being a craftsman or whatever we want to call ourselves, think seriously about "How am I delivering value by applying this new idea?" If it’s just because it’s clever and I want to look cool or I want to play with this toy, that’s a really bad reason. But if it does actually give you better performance or better quality or better understanding or less code - which I think is important as well - then maybe that’s a compelling case for it.
Even if you adopt the new language, you may still have all these same problems: "My team doesn’t know what it means to write functional programming in Scala," or whatever. There are always these tradeoffs you have to make, and you have to be a good evangelist for the idea, as well as someone who knows how to use it in the first place.
14. For someone who is currently working in a large Java only code-base, maybe it’s a code-base that’s been around for several years, how would you recommend getting Scala into that code-base? Where would be the best place to start and what would be the best entry path for this new language in this existing code-base?
My thinking on this has actually changed recently. I used to recommend that people adopt a testing tool in the new language they want to use and try to get their feet wet that way, in a way that doesn’t put production code at risk. But I’ve come to believe that doesn’t work very well unless you’re really dedicated to making it work. The reason is that unless you’re immersed in this new environment, if you are trying to learn it on the fly, you’re probably not going to really master it, and it will ultimately just be something you are using that is different but not adding real value. You may find yourself going back to what you were doing before. I’ve certainly seen examples of this recently.
The other strategy I’ve heard people recommend, which I’m actually thinking now is probably the right way, is to find a problem that just cannot be solved easily, or maybe at all, in the current technology, but that is a slam dunk (if I can use that word) in the new technology, and basically prove by example that this is the way to go. That’s usually the most effective way to get something in. Then, occasionally, you might also be able to convince your boss - because you’ve been a successful, responsible developer and all that good stuff - to let you use some new technology, even though you don’t necessarily have such a compelling use case for it. But for the most part, if you can make the sale by example, it’s probably going to be better.
I think when you see and understand actor code written in either the native Scala library or an alternative like Akka, it’s just a very compelling thing, because it doesn’t take much code. It very quickly becomes obvious what’s going on once you understand how the API works, and when you compare that to what you might write without Scala, it becomes clear that you are saving a lot of space - you are getting concise code with very high information density. For me, these kinds of Scala actor examples are really great examples of the language at its best. Other examples would be pattern matching, the functional idioms and the functional data structures, which tend to be more elegant and more robust than the Java equivalents, in part because they handle concurrency in a better way.
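The pattern-matching style he points to is the heart of actor-like message handling. Here is a small, framework-free sketch of that style; the message types and handler are invented for illustration.

```scala
// An algebraic data type of messages plus pattern matching -
// the shape of an actor's receive block, without any actor library.
sealed trait Message
case class Deposit(amount: Int) extends Message
case class Withdraw(amount: Int) extends Message
case object GetBalance extends Message

def handle(balance: Int, msg: Message): Int = msg match {
  case Deposit(n)  => balance + n
  case Withdraw(n) => balance - n
  case GetBalance  => balance
}

// Processing a stream of messages is a fold over the handler:
val finalBalance = List[Message](Deposit(100), Withdraw(30))
  .foldLeft(0)(handle)
```

The sealed trait lets the compiler check the match is exhaustive, which is part of why this style reads so clearly and densely compared to equivalent Java dispatch code.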
Those are some small examples. Certainly, companies like Twitter, Foursquare, LinkedIn and others have bet big on Scala and have been pretty successful with it. But, to be fair, I have to say these are teams of motivated, very good developers who were willing to live with the pain of "your tools aren’t quite as good" and who had the ability as developers to master the techniques of the language. So you always have to weigh those considerations, too.