00:36:59 video length
Bio Martin Odersky is a German computer scientist and professor of programming methods at the EPFL in Lausanne, Switzerland. He specializes in code analysis and programming languages. He designed the Scala programming language and Generic Java, and built the current generation of javac, the Java compiler. In 2007 he was inducted as a Fellow of the Association for Computing Machinery.
The target audience for GOTO conferences are software developers, IT architects and project managers. GOTO Aarhus is an annual event in Denmark. The idea for GOTO (formerly known as JAOO) came about as the management at Trifork was dissatisfied with the conferences that existed and wanted to create a forum that would give the development staff inspiration, energy and desire to learn coupled with the chance to network with the IT community.
I’ve been very busy this year. The first thing we did was organize the Scala Days conference, which was a big success; it was sold out in April. Then there was the release of 2.8, the big new step forward in Scala. And over the last month I’ve been very busy setting up a company, which I announced in my talk at GOTO (formerly JAOO) on Monday. The company is called Scala Solutions and it aims to help Scala developers.
It’s true that Scala 2.8 was a very big release. It took a long time coming; I think we had the 2.7 series out for about two years until we made this step. The main motivation for 2.8 was to have a more uniform and powerful collections library, and originally we thought that with higher-kinded types, which were one of the additions in 2.7, we now had the tooling to actually do that. In the end it turned out that the problem was even harder than I thought it would be, to really come up with something that’s very smooth at the user level, and in addition to higher-kinded types we needed a lot of other interesting mechanisms.
So that took much longer to get where we wanted, and in the meantime we found a couple of more pragmatic problems having to do with package scoping and things like that. We got feedback from the user community saying: "Look, here is a potential trouble spot; a name might resolve to a package I don’t want it to resolve to," and so on. So we decided that, since we were making this big step forward, we would essentially do an implementation where, if we felt that we couldn’t really live with something going forward because it just felt awkward, we would fix it in this release. Actually, everybody in industry, from Twitter to LinkedIn, encouraged us to do that.
They said, "Please, we want this fixed, and we know that you have to fix it now, because every release going forward will be harder as the code base, of course, grows even more." So I’m not saying that it’s a smooth migration for everyone, but for many people it actually is very smooth; it’s actually not very hard. It’s true that some of the big code bases out there, like Twitter’s, are still running on 2.7. I hope they will migrate to 2.8 soon, and if we can help them, then we would also love to do that, of course.
Absolutely not. The big challenge really was to have something very, very smooth at the user level, so that you would never see one of these mechanisms in an error message, or in a comment, or things like that. We wanted the user experience to be of something very solid that doesn’t need a lot of rocket science to understand. At the same time, the challenge was to avoid the code duplication that the previous framework had, because there are many different collection types and they all, at some level, share the same behavior; but technically it’s very difficult to achieve that with static types. Essentially, the higher-kinded types machinery and the other machinery is there so that we can get this seamless user interface and, at the same time, avoid code duplication.
So that turned out to be much, much harder than I imagined. But the end result is that the Scala collection libraries are actually unique; no other language has a library like that, where you have a rich set of collection types, a rich set of collection operations, in particular bulk operations that take function parameters, like map, filter and flatMap, and the uniform return type principle, which says that every operation can be applied to a collection and yields a collection of the same type as a result. So if you apply a map to a set, you get a set; if you apply a map to a sequence, you get a sequence.
So it turns out that no other language has the combination of these features, but I would argue it really makes for a smooth user experience. Why does no other language have it? Because it’s actually very, very hard to achieve technically.
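For illustration, a minimal sketch of the uniform return type principle in plain Scala (the object and value names here are hypothetical, not from the interview):

```scala
object UniformReturnType {
  def main(args: Array[String]): Unit = {
    // The same `map` operation, applied to different collection types,
    // yields the same kind of collection it started from.
    val s: Set[Int]  = Set(1, 2, 3).map(_ * 2)  // a Set in, a Set out
    val l: List[Int] = List(1, 2, 3).map(_ * 2) // a List in, a List out
    println(s) // Set(2, 4, 6)
    println(l) // List(2, 4, 6)
  }
}
```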
4. You talked about code duplication and why you redesigned the collection library, but there is also the user level: I can do a map of a list and get another kind of container. Can you tell us about this? It is really impressive; I’ve never found any other library that does this kind of thing, starting with one container type and ending with another.
The tricky bit was that, when I said that applying a map to some container type always yields the same container type, that was only 95% true, because sometimes you can’t do that. So let me give you an example. Let’s say you have a string. A string for us is a kind of collection, namely a special form of a sequence of characters. So you can apply a map from character to character to a string and you will get back another string. For instance, you can map the toUpperCase function, which takes a character to its upper-case version, over a word, and you will get the whole word in upper case.
So that’s OK. But what would happen if I applied a map with a function that does not go from character to character, but, let’s say, from character to integer? Then I can’t very well take a bunch of integers and put them back into a string. So what I would get is a sequence of integers, essentially the next level up. In summary, you get the best possible collection type that matches the type of your map function, more precisely the type of the elements it returns. That machinery uses something that we have in Scala called "implicits"; in particular, there are values that we inject into the collections which we call "CanBuildFrom", and they are just implicit values. It’s true that, if you wanted to do an operation and, at the same time, convert to something new, you could also use that mechanism, but for a beginning user I wouldn’t recommend it.
I’m not even sure I would recommend it for expert users, because I believe it generally leads to code that’s pretty difficult to understand. So, from my perspective, "CanBuildFrom" is better kept inside the implementation of the library. It’s true that, since the library can use it, you can use it too; I just wouldn’t recommend it unless you know very well what you are doing.
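The string example above can be sketched directly; note how the static result type changes with the element type of the mapped function:

```scala
object StringMapDemo {
  def main(args: Array[String]): Unit = {
    val word = "scala" // a String is a sequence of characters

    // Char => Char: the result can be a String again.
    val upper: String = word.map(_.toUpper)

    // Char => Int: the integers can't go back into a String, so the
    // result is the next best thing, a sequence of integers.
    val codes: IndexedSeq[Int] = word.map(_.toInt)

    println(upper) // SCALA
    println(codes)
  }
}
```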
It’s very difficult to measure complexity, and by a lot of objective measures Scala does not have a lot of features. For instance, it has fewer keywords than many other languages, including C#, C++ and even Java. As for grammar size: if you look at the size of the context-free grammar, it is about the same as Java 5’s, slightly smaller, and certainly much, much smaller than the grammar of C#, C++ or languages like that. So, by objective measures, it’s actually not that big. On the other hand, it’s true that there’s a lot in the language, in particular a lot of very fundamental concepts, which take time for some people to learn and absorb: things like traits, which are new in Scala, pattern matching, and the whole functional aspect of closures and functions as parameters, which is new for many people, and so on.
These things mean that you need to take some time to really absorb them and use them profitably. But the good news is that you can start right away and adopt them gradually: essentially, you can start programming in a very Java-like style and then, as you understand a feature and know how to use it, integrate it into your code base. So, there’s good news and bad news. The good news is you can do it gradually; the bad news is that, in the end, there’s quite a bit to learn. It’s not just Java with some trivial extensions.
Scala is, I think, a fusion of functional and object-oriented programming, and it goes further in that than any other language; that was what I wanted to do from the beginning: to say that object-oriented and functional programming are not at all two different things. They work very well together, because in object-oriented programming you essentially work on components, and functional programming means that you minimize state and use functions as components; the two work perfectly well together. But you have to throw away a couple of dogmas and common assumptions, for instance the common assumption that objects are defined by behavior, identity and state, where, in people’s heads, state is something mutable, something you can change.
And it’s true that you can write such objects, and such Scala objects exist, but it’s not the recommended way. What we really push much more is to have objects that are immutable, that have data and methods associated with them, but no mutation operations. That means that, instead of calling a method that does something to the object and just returns void or unit, you typically call a method of an object and construct something new: another object, or a bunch of objects, as the result. That turns out to be quite a new style of programming for many people, and it has many interesting aspects; it’s been an interesting journey and exploration into this style of programming, with objects returning objects as a fundamental paradigm. For me it has been very interesting, and I do not think we are quite at the end yet. We continue.
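The "objects returning objects" style might be sketched like this; `Account` is a hypothetical example class, not anything from the library:

```scala
// Each operation constructs and returns a new immutable object,
// instead of mutating state and returning unit.
case class Account(balance: Int) {
  def deposit(amount: Int): Account  = copy(balance = balance + amount)
  def withdraw(amount: Int): Account = copy(balance = balance - amount)
}

object ImmutableStyle {
  def main(args: Array[String]): Unit = {
    val a = Account(100)
    val b = a.deposit(50).withdraw(30) // a is untouched; b is a new object
    println(a.balance) // 100
    println(b.balance) // 120
  }
}
```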
Exactly! So we tried in every way we could to unify the functional and the object-oriented. For instance, take pattern matching: functional languages have algebraic data types. We don’t have algebraic data types, because essentially they would violate the purity of the object model, where all you have is classes and traits. So for us the cases in a pattern match are case classes: classes that have a case modifier in front of them, which means that you inherit some standard methods, like equality, and that you can pattern match on them. It also means that pattern matching can be open, so you can add further cases later on, and you have to explicitly seal your base class if you don’t want that.
If you want to say, "Well, I will just have a fixed number of cases", for instance to get better error diagnostics that tell you when a pattern match would be non-exhaustive, that can be done only with a fixed number of cases. So, in that sense, we really try to take a feature on the functional side and a feature on the object-oriented side and ask ourselves, "Can we make them the same feature?" The same goes for closures. Closures are just objects in Scala: the class of a closure is the Function class, and those classes are plain classes that can be subclassed.
So that again gives a very interesting construction: for instance, an array in Scala is a subclass of a function. It’s the same idea: we pick a functional concept, functions, and an object-oriented concept, inheritance, and we merge them, and we get something quite new and interesting. Well, by now it’s no longer that new, but it’s still very interesting, I think.
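A sketch of two of these unifications; `Shape` and its cases are hypothetical, and the second part uses a Seq (which the standard library makes a function from index to element) rather than an array:

```scala
// Cases in a match are just case classes; `sealed` fixes the set of
// cases, so the compiler can warn about non-exhaustive matches.
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Square(side: Double) extends Shape

object Unifications {
  def area(s: Shape): Double = s match {
    case Circle(r) => math.Pi * r * r
    case Square(a) => a * a
  }

  def main(args: Array[String]): Unit = {
    println(area(Square(3.0))) // 9.0

    // The function/object merge: a Seq is a function from index to
    // element, so it can be passed wherever a function is expected.
    val primes = Seq(2, 3, 5, 7, 11)
    println(List(0, 2, 4).map(primes)) // List(2, 5, 11)
  }
}
```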
8. Absolutely! And looking at any type or class in the collection library, we see a big bunch of methods there, all kinds of methods: filtering, taking subsets, zipping. As a user, you expect to find 30 or more methods on each of the classes in the collection library. What inspires these methods?
There are a lot of methods. I think if you look at a typical sequence class, you’ll find more than a hundred methods on that class. The good news is that every class has exactly the same methods, so you don’t need to learn another set of methods for the next collection class. We grew them gradually. I think that originally most of the design came from Haskell, from the Haskell list library: we took a lot of the method names, like "take", "drop", "filter" and "foldLeft" (well, foldLeft is foldl in Haskell; we call it "foldLeft"). Most of the method names came from there initially, and then the set grew mostly by suggestions from the community; we took some of the suggestions and tried to simplify others.
I believe it gives you an extremely powerful and very promising way of programming, and I actually believe that’s the future of programming for the rest of us. I recently gave a two-day Scala training for professional programmers in London, and that was quite a revelation for me. None of them was very strong in functional programming yet; I had to teach that. At first I tried to teach them the first-order way of functional programming, with pattern matching and recursion, and they had, overall, a very, very hard time. I thought, "This isn’t starting too well", so I moved to the next step, higher-order functions and those sorts of things. I introduced them very hesitantly, and they had no problem with them whatsoever: map, filter, partition; the solutions just flew out. I’m starting to understand why that is. When you learn to program (it’s a long time back for me, and probably a long time back for you as well), even something like a loop is not simple.
When you first come across a loop, it’s really quite hard, and recursion is essentially just a loop in another guise; recursion is not simple either when you first come across it. What’s simple is having few steps. A loop has many different steps: you have to wrap your head around doing a sequence of actions so many times, keeping many different time steps in your head, and that’s hard to keep track of. Whereas with these higher-order collection operations you have very few steps, because every step is very powerful in what it does. So it’s essentially just saying, "Well, I have this collection, I want to get there; first I transform it like this, then I transform it like that, and then I have what I want."
Overall, it’s actually a style which is rather simple and beautiful: it’s very short and very simple. The other great news is that we can parallelize it. These operations typically parallelize very well, because an operation goes over the whole collection, so the implementation has a lot of freedom in how to do that. I think this will be the predominant programming style to really make use of multicores and GPUs and all forms of parallelism, without requiring a lot of expertise in concurrency, which most programmers don’t have and never will have. It’s too hard; I feel stupid every time I write a concurrent program, with all the bugs that crop up there.
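A sketch of why bulk operations leave the implementation this freedom: each element is transformed independently, and the reduction operator is associative, so the work can be split across cores (for instance via the parallel collections' `.par`, which shipped after this interview and lives in a separate module in recent Scala versions):

```scala
object BulkOps {
  def main(args: Array[String]): Unit = {
    val data = (1 to 1000).toVector

    // `map` applies its function to each element independently, and
    // `reduce` combines with an associative operator, so an
    // implementation is free to partition the collection and run
    // the pieces in parallel.
    val sumOfSquares = data.map(x => x * x).reduce(_ + _)
    println(sumOfSquares) // 333833500
  }
}
```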
9. Do you think that people will pick up on these kinds of goals? Because people today use, from time to time, a map function or a filter, but they don’t try to construct their solution in a functional way, using these higher-order methods, as just a combination of these methods. They are not there yet. Do you think it will take a long time to get there, to understand this power?
From the experience I had at this seminar, I now think it will be much quicker than I initially thought. Once people take their first steps, they get the hang of it very quickly and write fantastic code in no time at all. So I’m actually very hopeful that this is essentially the future for a lot of programmers: parallel code that is at the same time efficient, performant, reliable and safe. Because that’s the other great thing we learned from functional programming a long time ago: if you statically type higher-order operations, then typically a huge percentage of the errors you might make actually show up in the interface of some function and turn out to be type errors in the end. I guess we have all had this revelation: "Well, I had this program.
It was really hard to get it past the type checker." And it’s 500 lines or 5,000 lines or something like that, and once the last type error is eliminated, you very tentatively click on run, and it just runs. And you say, "What happened? I thought I would be thrown into the debugger, but it just runs." These collection operations give you a lot of safety and reliability, really because of the types.
10. When you start implementing your code by combining these methods, you need to be more explicit about errors and put them into the interface, because if you use try/catch or runtime exceptions, they don’t show up. The combination won’t work, because you will get exceptions everywhere and you will not be sure where to handle them.
I guess the first approach to exceptions is to avoid throwing them: whenever you can make sense of things otherwise, avoid throwing them. The second thing, for handling them, is that there is a debate about whether you should declare your exceptions or not. In Scala, currently, you don’t declare exceptions; they are just thrown, and it’s true that it’s then hard to check where they come from. We’re actually working on an annotation-based pluggable type system for checking exceptions and other side effects.
This will work only if we get much more polymorphism into such an exception or effect checker than there currently is in, say, Java. The problem in Java is that I can only declare a fixed set of exceptions, and it’s very hard to combine that with higher-order functions like map, where map throws everything that the function it calls throws; that’s very hard to express.
I think Option has a perfectly good role, and Option is used a lot in Scala programs, where you say a result could be either this or nothing. If you want to make that a part of the possible result, I think Option is the right way to do it. Absolutely! But exceptions still come up. In particular, suppose you have some assertion: let’s say I define a slice method on a collection, and my contract says that slice should only be called with a legal index range in the collection. What if the programmer doesn’t adhere to that?
I think there is nothing better than throwing an IllegalArgumentException, because if somebody didn’t keep to the contract, there is no point in returning an Option result; if somebody doesn’t play by the rules, the only thing I can do is throw an exception.
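The two styles side by side; `findUser` and `element` are hypothetical examples (`require` is the standard Predef check that throws IllegalArgumentException):

```scala
object ErrorStyles {
  // A missing key is a normal, expected outcome: return an Option.
  def findUser(id: Int): Option[String] =
    Map(1 -> "alice", 2 -> "bob").get(id)

  // A violated precondition is a bug in the caller: `require` throws
  // an IllegalArgumentException rather than returning an Option.
  def element[A](xs: Vector[A], i: Int): A = {
    require(i >= 0 && i < xs.length, s"index $i out of range")
    xs(i)
  }

  def main(args: Array[String]): Unit = {
    println(findUser(3))                 // None
    println(element(Vector(1, 2, 3), 1)) // 2
  }
}
```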
Yes. Actually, it turns out that contracts can be implemented purely as an embedded DSL, and quite nicely in fact. We can do asserts, which we already have, and we can also do preconditions, postconditions, invariants and so on. I’m going to give a talk at the runtime verification symposium at the end of this month where I’ll present that. It’s going to be in one of the future versions, but the nice thing is that it does not require a language change. On the .NET version of Scala, which is under development, I think it would make a lot of sense to map that into the bytecode contract language they have on .NET; on the JVM it will essentially just be code that we generate.
But the point is that we can do it without changing the language, and that’s rather neat: no tricks like embedding contracts in comments or things like that. It’s real Scala code that does the contract checking.
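A small illustration of contracts as plain library code, using the `require` and `ensuring` helpers that already exist in Predef (the `abs` function itself is a hypothetical example):

```scala
object Contracts {
  // Precondition and postcondition, expressed with no language change:
  // `require` checks the precondition, `ensuring` the postcondition.
  def abs(x: Int): Int = {
    require(x != Int.MinValue, "Int.MinValue has no positive counterpart")
    if (x < 0) -x else x
  } ensuring (result => result >= 0)

  def main(args: Array[String]): Unit =
    println(abs(-5)) // 5
}
```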
13. A more in-depth question: obviously Scala got some inspiration from Haskell; for example, type classes can be encoded using implicits and implicit parameters. Can you tell us a bit about this? Why did you use this kind of abstraction as opposed to just using inheritance and class hierarchies?
Well, I always liked Haskell, an extremely cool language, and in the ’90s I was working as a postdoc in the group of Paul Hudak, who is one of the main members of the Haskell community. Haskell was developed in the ’90s, and Paul was one of the heads of the standards committee for Haskell.
He absolutely is. So I know Haskell very well and I always admired the elegance of the language. When we developed Scala, then, of course, for a lot of problems you had Haskell in the back of your mind. The only question is that Haskell is a purely functional language, which is not really compatible with the more standard object-oriented approach of the form you see on the JVM. So the question was: how can we use the Haskell features? I knew what I wanted, essentially what was in Haskell, but I wanted to have it in a different way.
For instance, type classes: type classes would not have been an organic addition to Scala, because Scala already had real classes; having classes and type classes side by side would have been strange. The question is: if you have real classes and inheritance (and type classes also have inheritance, because a type class can extend another type class), what is the delta that’s missing to get type classes? It turned out that delta was essentially the implicit passing of dictionaries in Haskell; and dictionaries, of course, in the object-oriented world are just objects, and dictionary methods are object methods.
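The dictionaries-as-objects idea might be sketched like this; `Show` is a hypothetical type class for illustration, not a library type:

```scala
// The trait plays the role of the type class; an implicit value of
// Show[Int] plays the role of Haskell's implicitly passed dictionary,
// and it is just an ordinary object.
trait Show[A] {
  def show(a: A): String
}

object TypeClassDemo {
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = "Int(" + a + ")"
  }

  // The implicit parameter is the dictionary passing: explicit in the
  // type, implicit at the call site.
  def display[A](a: A)(implicit s: Show[A]): String = s.show(a)

  def main(args: Array[String]): Unit =
    println(display(42)) // Int(42)
}
```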
So that led to the implicit design, where we said, "Well, we actually want to get where Haskell is, but we want to spend the minimum in language footprint relative to what we already had." There are lots of other features like that. Another was the treatment of laziness. My point is that laziness is absolutely great when you need it, but I personally don’t want it as the default; I want strict as the default, because it’s easier to predict what resources your program takes, and things like that. And in Scala it’s almost mandatory, because you do have side effects, and everybody knows that mixing laziness with side effects is a very tricky business.
But I saw from Haskell that laziness can have incredibly powerful applications; in particular, you don’t need to worry about initialization order: you just make everything lazy and it sorts itself out automatically. That is why we introduced lazy values in Scala, which essentially let you define a value whose initialization is done not immediately, but only the first time you need it: classic laziness.
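A minimal sketch of a lazy value; the `log` variable is a hypothetical device to make the deferred initialization observable:

```scala
object LazyDemo {
  var log: List[String] = Nil

  // `lazy` defers the initializer until the value is first needed.
  lazy val config: String = {
    log = "initializing" :: log
    "settings"
  }

  def main(args: Array[String]): Unit = {
    println(log.isEmpty) // true: config has not been forced yet
    println(config)      // settings (runs the initializer now)
    println(log)         // List(initializing)
  }
}
```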
And we got monads. Yes.
That’s right. Yes. That actually came about because I used to work very closely with Phil Wadler on predecessors of Scala, namely the Pizza language and then the GJ language. Phil then went and worked on XML for a while, in particular XQuery, but I still had very close contact with him. When we started to work on Scala, he convinced us to have a very light syntax, using "for" and these things, and that is what we did in the end. It’s essentially an object-oriented version of monads, meaning that the set of basic operations is a bit different, but equivalent. A monad has a unit operation, which injects into the monad, and a bind operation; we have map and flatMap.
That means you don’t have a given operation that injects into the monad; you have to do that yourself. But then you can express everything with map and flatMap, and a lot of interesting libraries do: it’s not just collections. Option is a monad, Either is a monad, parser combinators are a monad; the usual suspects are all monads.
That’s true. One thing that we did differently, and I think it has worked out to our benefit, is that in the monad translation we take the for-expression, which is roughly analogous to the do-notation for monads in Haskell, and translate it into applications of map, flatMap and filter (if you have conditions), but the translation is not typed at all. It is a purely syntactic, context-free translation: it just maps these things into those combinations, and only then is the result type-checked. That means you can use the notation even on a type that does not declare itself to be a monad.
Originally we did that because the first version of Scala didn’t have higher-kinded types, and to express a monad you need higher-kinded types; so essentially we were forced to do it because we couldn’t express the concept of a monad in the language. But now I think it was actually a rather good idea, because it gives you more flexibility. You sometimes have situations where you have almost a monad, and you want to reuse the same notation, but you are not as generic as a monad requires. Let’s say you only work on sets of ints, not on generic sets. Then it couldn’t be a monad, because a monad is parametric: it’s a monad of x. But it still makes a lot of sense to have these operations in the more restricted sense. So I think that worked well.
Exactly. You can mix monads in a for-expression. The first generator determines the result type, and then everything else can be an arbitrary monad or an arbitrary generator.
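Both points above, the untyped translation and the mixing of monads, can be sketched together (the object name is hypothetical, and the hand-written translation is only a rough rendering of what the compiler emits):

```scala
object ForTranslation {
  def main(args: Array[String]): Unit = {
    // Mixing monads: the first generator is a List, so the overall
    // result is a List, even though the second generator is an Option.
    val pairs = for {
      x <- List(1, 2, 3)
      y <- Some(10)
    } yield x * y

    // The compiler translates the expression above into roughly this,
    // syntactically, before type-checking the result:
    val same = List(1, 2, 3).flatMap(x => Some(10).map(y => x * y))

    println(pairs) // List(10, 20, 30)
    println(same)  // List(10, 20, 30)
  }
}
```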
The motivation, in the end, was that on the conceptual level it was just so incredibly easy to do. You mentioned abstract types; the first version of Scala already had those. Essentially, they came from another unification, where we said: object-oriented programming languages have objects, functional programming languages have modules, so we want to identify an object with a module. What modules typically have is types as members, which objects didn’t use to have.
Essentially, in doing the unification, we had to accept that objects would have types as members, including abstract types, where we say we are not actually going to specify an implementation. What you could always do, if you wanted to fix such a type, was this: you have an abstract class with an abstract type member T, and then, to specialize it, you name the class, put some curlies behind it, and in there you say "T = String", or "T = Int", or T is a subtype of Number. You can add constraints to what you do there. And you can do the same thing for values.
So you can have an abstract value in the class, say of type Object, and then in your refinement you say, "Well, the value in my specialized type is not just Object, it’s a String." Right? So we always had that. And there was a restriction which said: if you do that, you are allowed to mention in your refinement only things that already existed in the class. The beauty was that, to get structural types, all we had to do was drop this one restriction: to say that you can have a class and further refinements that mention things which are not already defined in the nominal class. So it was just too tempting to do it, because I could strike out one sentence in the language spec and get a very powerful typing feature.
Then, of course, the implementation side was different: you had to work pretty hard to actually make this work well, but that’s another story. It was too tempting to simplify the spec and get more power in the language.
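A sketch of the resulting feature in Scala 2, where such structural member calls are made reflectively; `withResource` and `Connection` are hypothetical names:

```scala
import scala.language.reflectiveCalls

object StructuralDemo {
  // A structural type: any object with a matching `close(): Unit`
  // method qualifies, no shared named interface required.
  def withResource(r: { def close(): Unit })(body: => Unit): Unit =
    try body finally r.close()

  // Note: this class implements no interface at all.
  class Connection {
    var closed = false
    def close(): Unit = closed = true
  }

  def main(args: Array[String]): Unit = {
    val c = new Connection
    withResource(c) { println("working") }
    println(c.closed) // true
  }
}
```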
The first usual suspect is always tail recursion. Every functional language wants tail calls, including Scala. It would be nice if the JVM had a tail-call instruction, so that you could call a method in tail position without actually growing the stack. Failing that: I was in the talk by Rob Pike this morning on Go, and I really liked his idea of segmented stacks that start very small and then grow as large as you need. That would also relieve the pressure, because our pressure is that you want to have many threads, hence not too large a stack per thread, but then you want deep recursion, and that might blow the stack.
So we do have a static tail recursion optimization, but sometimes your methods are just not tail-recursive, and sometimes the static optimization does not kick in, because it only handles direct recursion, not indirect recursion, or recursion through, say, a function parameter. So that’s number one. Number two has to do with the class file format: in Scala we generate a lot of inner classes. Essentially, every closure leads to an inner class being generated, and in a highly functional program it’s not uncommon to have one closure per line of code, so we sometimes see one inner class per line of code.
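The static optimization for the direct-recursion case can be requested explicitly with the `@tailrec` annotation, which turns "the optimization did not kick in" into a compile error; `gcd` is a stock example, not from the interview:

```scala
import scala.annotation.tailrec

object TailRecDemo {
  // Direct tail recursion: the compiler rewrites this into a loop, so
  // the stack does not grow. @tailrec makes the compiler reject the
  // method if it cannot perform that rewrite.
  @tailrec
  def gcd(a: Int, b: Int): Int =
    if (b == 0) a else gcd(b, a % b)

  def main(args: Array[String]): Unit =
    println(gcd(1071, 462)) // 21
}
```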
For running, this is not so much of a problem, because in the end you put these things in a jar, and fortunately the HotSpot VMs are pretty good at loading a huge number of classes out of jars, mostly because a lot of Java idioms rely on having a lot of small classes as well. But it’s still a pain to generate all these classes and put them in a jar: it increases compilation times and makes jars bigger than they need to be. So it would be nice to have a class file format where you could just nest the classes. The third thing has to do with traits. Traits have an extremely good story for software evolution.
With interfaces, as in Java, you have a fundamental problem: once you have pushed out an interface and it’s public, you will never be able to change it anymore, because adding stuff to the interface means you have to rewrite all its implementations, and you might not have access to them all. In a big code base that’s generally impossible, so your interface is frozen forever. Traits, by contrast, are good because you can add new methods to a trait as long as you know how to implement them; you can even add fields. So in that sense it’s a much better evolution story, but it comes at a price.
The price is that if you add stuff to a trait, you have to recompile all the implementations of the trait. Not rewrite, but still recompile, because the compiler has to install some forwarders and some glue code into each class that implements the trait. So, if the JVM recognized traits as something basic, by some mechanism, we wouldn’t be forced to have this binary compatibility problem. I’ve seen with a lot of interest the proposal on defender methods by Brian Goetz, which comes tantalizingly close, but falls a little bit short of what Scala needs. Maybe if we find a way to generalize it further, that would be cool.
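The evolution story might be sketched like this; `Logger` is a hypothetical trait, with `warn` standing in for a method "added later":

```scala
// Because the trait supplies the body of `warn`, adding it requires no
// source change in implementors; they only need recompilation, since
// the compiler installs forwarders for trait methods in each
// implementing class.
trait Logger {
  def log(msg: String): String
  def warn(msg: String): String = log("WARN: " + msg)
}

object TraitEvolution {
  def main(args: Array[String]): Unit = {
    val console = new Logger {
      def log(msg: String): String = msg
    }
    println(console.warn("disk full")) // WARN: disk full
  }
}
```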
If not, then what we do now is essentially work on tools to do that automatically: this recompilation of trait implementations we can actually do by binary rewriting of whole jars. A tool would look at a set of jars and say, "Actually, there’s something missing here, let me add the bytecodes to it", instead of having to go back to the source and recompile. So there is a solution out there, but it would be even smoother if the VM could take care of it.
Right. That should be out by the end of the year, or January at the latest, and the biggest addition, from my perspective, will be to make collections parallel, to be able to use the power of multicores for the collections. That’s the next step. We actually have that in trunk already, but because for the moment we want to be careful with binary compatibility and ship only bug-fix releases, it’s implemented in the distribution, while the release will come out in three months or so.
Thanks! It was fun.
Can Scala Compile to native code?
It will run faster. I think it will be used more.
Martin Odersky on the Future of Scala
Re: Can Scala Compile to native code?
But one of Scala's biggest strengths is that you can use the huge selection of Java libraries that already exist. Calling Scala code from Java has some potential problems you have to work around, but calling Java code from Scala is easy. If Scala compiles to native code instead, all of the ability to use Java libraries goes away.
If you want a programming language with some of Scala's concepts that compiles to very fast native code, I think Haskell or OCaml would be good. But Scala explicitly borrows concepts from Haskell and OCaml. Those two languages are older than Scala, and I believe one of the things preventing them from being more popular is that they don't integrate easily with the huge existing Java (or C++) libraries and projects. Scala has its own innovations, but I really think its fundamental accomplishment is bringing Haskell into the Java universe in a way many developers can understand.