Bio: An Honorary Professor of Computer Science at the University of Glasgow, Simon Peyton Jones currently works at Microsoft Research in Cambridge. He has led several research projects focused on the implementation and application of functional programming languages. He has contributed extensively to the design of the Haskell language, and is the lead designer of the Glasgow Haskell Compiler.
1. I am here with Simon Peyton Jones and we are going to talk about programming languages in general, and Haskell in particular. Simon, why don't you tell us about yourself and what you've been up to lately?
I work at the Microsoft Research lab in Cambridge; I've been there for 10 years now. Before that I was a university professor in computer science. I moved to Microsoft for a change, so I feel like a person who is on sabbatical from university, only all the time. I get to do more or less what I like, and what I like is studying functional programming languages, because I am interested in making computers easier to program. At the moment, I think that functional languages offer at least one big way to make computers easier to program. Everything I do revolves around a language called Haskell. Haskell isn't the thing I study directly; it's the base on which I stand. We've got Haskell, the programming language, and together with my colleague Simon Marlow we've built a compiler for Haskell called GHC, which lots of people use. What I can then do is study a new programming language feature, like transactional memory: we put it into GHC and engineer it well, and then in the next release lots of people start using it and we get lots of good feedback. So it is a really good feedback loop for learning how programming language features work.
2. Lately, Microsoft researchers have been trying to find a multi-paradigm language that is both a functional programming language and an object-oriented programming language, and that has mutation, like F# and Scala; can you elaborate on that? Does it solve the kinds of problems that we are facing today in the mainstream?
I suppose two different things can go on. One is that you can start from where we are in the mainstream, which is essentially object-oriented programming, and you can try to add good things from functional programming and concurrency to it; that moves you in a good direction. That is what's happening with F#, and the same is true, to some extent, of C# itself, which is gaining many functional-language-inspired features. But somehow what you get is always a compromise. For a start, you are not taking anything away, you are just adding things.
That makes the whole enterprise more complicated - and it is already pretty sophisticated to begin with - and it can also be easier to study something in isolation. So I see what's happening there as moving the center of gravity of the programming world, while what people like me are doing is taking a principled approach: in particular, for Haskell this means being very serious about purity - the lack of side effects - not because that is going to solve all problems now, but because it's worth some people studying it, as it were, under laboratory conditions. In the talk I've just given, I gave an example of the cross-fertilization that can take place between those approaches: I took transactional memory from the mainstream world, studied it in the Haskell world, and in doing so we learnt new things that we were then able to transplant back. I don't see these approaches as being in competition, one being better than the other; I see them as complementary and heading towards the same ultimate destination.
3. For example, C# has added some functional programming constructs; do you think that the syntax matters? If you do functional programming with C# syntax, don't you think programmers will keep programming in their old imperative culture, because they keep the old syntax inside C#?
It's conventional to say that syntax isn't really important in programming languages, that you just get past it, but I think that syntax is actually the user interface of a programming language, so it does have quite long-term and pervasive effects. Of course, adding new features to a language like C#, which already has a pretty full syntactic space, means you might not do everything exactly as you might choose. Actually, I think the syntax they have chosen for LINQ and for the lambda expressions has worked out quite well. More than the syntax, perhaps, the question would be: "To what extent is it convenient to write significant chunks of your program, rather than little fragments, in a functional style?" The reason there are lambda expressions in C# at all has particularly to do with LINQ: when you say "filter this record by this predicate, by this boolean expression", they are very helpful. But these lambda expressions are rather small - they are typically one-liners - so if you wanted to grow C# into a language in which you could write whole modules using only functions, it's not a very convenient medium to do that in.
It's more than just syntax; take function application. If you have a function-valued thing and a value, even doing the application is more than putting them next to each other - you've got to have an apply operator. You can't blame C# for this, because it's doing lots of other things at once that a language like Haskell isn't, but if you try to do a lot at once then any particular thing may not be quite as convenient. Will that, in turn, color people's understanding or perception of functional programming, because they will think "this is so clumsy"? I hope not; I hope they will think "this is quite cool, and maybe I should try it in a pure form as well".
4. In programming languages like C# or Java we compose objects; we have patterns for composing objects. In functional programming we compose functions. What do you think about that? The same thing is done in the two paradigms: in the object world you need to create a lot of objects, and in the functional world you just compose with monads and other such things. What is your take on that?
I am not really an experienced object-oriented programmer, so I don't think I have that kind of visceral, big-picture feel for what large-scale object composition feels like in an object-oriented language. I think you are talking about design patterns. Design patterns are somewhat informal descriptions of ways in which you can compose objects together. The informality is a strength, because it gives them quite broad applicability, and it's also a weakness, because a pattern is not manifested as a library or an object; it's more something that goes on in your head.
I've often thought about what corresponds to design patterns in a functional language, because there hasn't been a design patterns book for functional programming in the same way there has been for object-oriented programming, and I've wondered why. I don't think I have a categorical answer. One way to answer might be that many ways of composing functions can be expressed as higher-order functions - functions that take functions as arguments and glue them together. They are just more programs. In a way, we don't think of them as something different; they are just yet more functions, but they are functions whose role is to glue together other functions.
Let's see if we can take an example: map - the function that applies a function to every element of a list, or an array, or a set - is such a thing, because it takes a function as its argument. In larger-scale programming, you tend to get larger-scale higher-order functions. I don't know whether higher-order functions really encapsulate everything that object-oriented programmers think of when they think of design patterns. I'm still waiting for the day when somebody produces a functional programming book called "Functional Design Patterns".
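To make the "functions as glue" idea concrete, here is a small Haskell sketch (the function names are mine, chosen for illustration, not from the interview):

```haskell
-- 'map' is a higher-order function: it takes the function to apply
-- as an argument and glues it across every element of a list.
squareAll :: [Int] -> [Int]
squareAll = map (\x -> x * x)

-- Composition (.) is itself a higher-order function that glues two
-- functions into one pipeline: split into words, keep the long
-- ones, then count them.
countLongWords :: String -> Int
countLongWords = length . filter ((> 4) . length) . words
```

The "glue" here - map, filter, (.) - is just more Haskell code, which is exactly the point made above.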
5. Can you elaborate on what you think of the implementation of list comprehensions in C#? It's not exactly a Language INtegrated Query; it's really language-integrated monads, because it introduces monads inside C#.
The clever thing about LINQ is this: what they could have done is simply say "take SQL and plop it into the language", and do just that one thing. But they were cleverer than that: they made a way to do SQL-like queries over arbitrary iterators - things that can produce a stream of values. They also made it extensible, so that you can plug in new iterators, and you can elaborate the basic LINQ framework, because they give you access to these expression trees. To be honest, Erik Meijer is the expert on LINQ, but I think it is right to say that it was generalized from the specific instantiation of "let's deal with SQL queries". I believe part of the development process was people like Erik, who knew all about monads, saying "we know how to generalize this".
They took this much more general framework that came from Haskell and similar languages and applied it in the C# context, and that had a material effect on the level of generality they were able to achieve. I think that if they hadn't had that input, we might have seen something less flexible than what we have now. The way it is designed and implemented is quite distinctively C#-ish: they have come up with a nice design, and I don't claim to understand all its ramifications without having used it in a serious way. But you can see the direct lineage, and that's something they acknowledge these days, too.
6. LINQ was built on the concept of list comprehensions; do you think that there are other monads or other abstractions that can be taken from Haskell and implemented in the mainstream, other than lists?
Partly these are matters of convenience. In functional programming languages it is very convenient to build data structures and do pattern matching over them. You can do something isomorphic to that in an object-oriented language, but it is just not as convenient. It's not that the syntax actively forbids it; you simply have to write much more stuff, and that discourages you from a particular kind of programming pattern. I don't know whether just lobbing in some kind of syntactic support for pattern matching would make the difference; maybe it would. Scala gets a lot closer, I think. So that's pattern matching, a very early thing you come across in functional programming.
The burden of what I talked about this morning was effects and purity, so I think that perhaps the biggest thing the mainstream will end up learning from the purely functional world, and dragging in somehow, is the idea of constraining the effects that a subprogram - a function or procedure - can have. Maybe you just say it's completely pure, but you may also want to limit its effects - if you are using transactional memory, say - so that the procedure can only read and write transactional variables; it cannot perform I/O and it cannot read and write non-transactional variables.
When you have those guarantees, statically, it makes it much easier to implement things well. I am guessing that in the future we'll see mainstream languages with some way of confining effects; I don't know exactly what that is going to look like, but there seem to be too many reasons why that's important for it not to happen.
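As a sketch of what that static confinement looks like in GHC's STM (using the stm library that ships with GHC; the transfer example is mine, not from the interview):

```haskell
import Control.Concurrent.STM

-- An STM action may only read and write transactional variables
-- (TVars). Its type statically guarantees it cannot perform IO or
-- touch ordinary mutable state: adding a putStrLn here would be a
-- type error, because an IO action is not an STM action.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)

-- Only 'atomically' lets an STM action loose in the IO world,
-- running the whole block as a single transaction.
demo :: IO (Int, Int)
demo = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  (,) <$> readTVarIO a <*> readTVarIO b
```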
What you are referring to here is that, in Haskell, the idea of a monad usually shows up in people's programs initially in the form of input/output. We've also talked about a transactional memory monad, but there is actually nothing built into Haskell about monads at all; there is nothing in the compiler that says "I know this is a monad", except a little bit of syntactic sugar for the "do" notation. What that means is that programmers can define new monads of their own; you mentioned a couple - continuation passing is one, and encapsulated state is another.
Encapsulated state means that sometimes you have a computation, like a sorting algorithm, which externally is a pure function: it takes, let's say, an array and it produces a sorted array. Internally it may do lots of side effects, but its external interface is completely pure, and you can't tell that it's implemented using side effects. Now, how could you encapsulate that so that its internal imperativeness is not visible outside, and have that statically guaranteed? It turns out that, using this monadic framework, you can do that without any compiler extensions in a language like Haskell.
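GHC's ST monad packages exactly this guarantee. A tiny sketch of mine (a running total instead of a full sorting algorithm, to keep it short):

```haskell
import Control.Monad.ST
import Data.STRef

-- Externally a pure function: list in, (sum, length) out.
-- Internally it mutates two STRefs, but 'runST' guarantees those
-- side effects cannot leak out, so callers cannot observe them.
sumAndCount :: [Int] -> (Int, Int)
sumAndCount xs = runST $ do
  total <- newSTRef 0
  count <- newSTRef 0
  mapM_ (\x -> modifySTRef' total (+ x) >> modifySTRef' count (+ 1)) xs
  (,) <$> readSTRef total <*> readSTRef count
```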
What I am saying is that, probably more important than any particular monad is the fact that you can define new monads, or monad transformers, of your own. That means that, for any particular application program, you get to shape your monad - if you want to use one - to fit your program. That would be a facility that would be very nice to see in a language like C#, but I think it's speculating wildly to ask how you could take these elaborately abstracted things over. To have monads you need higher-kinded type variables. In a generic language you can say "list of T", where T is a variable that stands for a type, like int or float or perhaps list of int. So that T stands for a "type". We might also want a variable that stands for a type constructor: we have list of T and array of T, and we might want some M that could become list or array.
So M is now something which takes a type and delivers a type; it's a type-to-type function, if you like, and M is a variable that abstracts over such things. That's something that .NET doesn't have at all, and it's very important for the way that monads are generalized in Haskell. So there is a place where I am not quite sure how this could be retrofitted without adding higher-kinded type variables to .NET. Maybe they will do that sooner or later, who knows? But at some point you can't just keep shoveling new stuff into a language: there is a clash of paradigms, and you just get a dog's breakfast.
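In Haskell this shows up directly in the Monad class. A stripped-down version (renamed here so as not to clash with the Prelude's Monad) makes the higher-kinded variable visible:

```haskell
-- In 'Tree a', the variable 'a' stands for a type: Int, String, ...
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- In the class below, 'm' is a higher-kinded variable: it stands
-- for a type constructor -- a type-to-type function such as []
-- (lists), Maybe, or IO -- rather than for a type.
class Container m where
  unit :: a -> m a
  bind :: m a -> (a -> m b) -> m b

-- The same interface can then be instantiated at many constructors:
instance Container Maybe where
  unit            = Just
  bind Nothing  _ = Nothing
  bind (Just x) k = k x

instance Container [] where
  unit x    = [x]
  bind xs k = concatMap k xs
```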
I don't know if we have reached that point with C# yet. The drift of your question is: can we just keep lifting things out and dumping them in there? At some point, I guess, it will become unsatisfactory to do so - and that's not a criticism of mainstream languages; it's just that they were designed for a different purpose.
8. We've got Lisp and Smalltalk, which have really minimal syntax, and we've got C++ and C#, which have a lot of syntax. Where do you think Haskell sits in between, and what do you think of that? Does syntax add complexity, or does it remove complexity for developers?
This is the user interface issue again, isn't it? I think syntax comes in two flavors. Some of it is superficial: in Haskell I can define a function like this, f x = let y = x+1 in y*y, but I can also define the same function like this, f x = y*y where y = x+1. That is plainly just syntax - I am just using "where" instead of "let". There is a minor difference: a "let" expression can occur anywhere, while a "where" clause is attached to definitions, but nevertheless it's a pretty superficial difference.
We had a big argument when we first designed Haskell - "we should have only one way to do everything" - but in fact we ended up with a language that provides multiple syntactic ways to achieve the same thing, because when you write programs, it turns out that sometimes "let" says it the way you want to say it at that point, and sometimes "where" says it more nicely. I think that's a superficial complexity, because it's not really intellectually complicated for a programmer to understand what either of these programs means: you show it, and that's it; there isn't much more to ask.
Some things have much deeper complexity. Take a list comprehension: in Haskell you can write [y*y | y <- ys, y > 3], the list of squares of all elements of ys that are greater than 3. This kind of comprehension is showing up in other languages, like Python and Ruby, as well. Here there is a bit more to explain about what it might mean. In fact, I've recently generalized this to be more SQL-like, so that you can have "group by" and "sort by" clauses in these list comprehensions as well. But that is still somewhat superficial; then there is the deeper stuff. An example of a deeper syntactic issue would be type classes. Haskell type classes are very different from object-oriented classes. You could say they are just syntax - class declarations, instance declarations - but actually they are the surface syntax for a whole collection of concepts that you have to understand. So I am trying to distinguish between superficial syntax and the deep stuff.
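The superficial part is easy to see: the ordinary comprehension is just sugar for filter and map. (The SQL-like "group by"/"sort by" generalization he mentions is available in GHC as the TransformListComp extension.)

```haskell
ys :: [Int]
ys = [1 .. 6]

-- The comprehension ...
squaresOfBig :: [Int]
squaresOfBig = [ y * y | y <- ys, y > 3 ]

-- ... is surface syntax for plain higher-order functions:
squaresOfBig' :: [Int]
squaresOfBig' = map (\y -> y * y) (filter (> 3) ys)
```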
I am very relaxed about superficial syntax; and as for the deep stuff, Haskell has this big concept of type classes because it makes lots of programs easier to write. Undoubtedly it is something you have to get used to if you want to write in the language - just the same as object-oriented classes.
Smalltalk and Lisp both have the property that not only is their syntax very modest, but they also get away with a great economy of concepts: Smalltalk is just objects and messages; Lisp is just S-expressions and lambdas that can capture free variables. They are extremely economical with the deep stuff too, and they choose their deep stuff so that it's extremely powerful. But there is a tradeoff, because although the lambda calculus - Lisp - is very powerful, to express anything in particular may take you a fairly large number of parentheses. Lisp is also said to stand for "lots of irritating superfluous parentheses".
So then you say "come on, let's make the things we want to do a lot convenient", and that's the role of superficial complexity; I am actually a supporter of having fairly rich syntax. The thing that Lisp gives you, which is difficult when you have a rich syntax, is that a Lisp program is itself a data structure that can be manipulated by Lisp programs, which makes meta-programming very easy. Haskell has a rather rich syntax - and "rich" is the Microsoft word meaning "extremely complicated and hard to understand"; it's a sort of code, so always be suspicious when you hear people say "rich". Rich here means a lot of constructs, though I think most differ only in superficial ways. Nevertheless, if you want to write a program that manipulates Haskell programs, you have pretty big syntactic variation to cope with.
That's a tradeoff, and a harder one to make, because the meta-programming facility is very convenient for writing certain kinds of program, but it means all Lisp programmers have to write in this very small language. I guess there is just a tradeoff. C++, for example, is extremely complicated; it has lots of deep complexity, and I don't think I understand all of C++ at all. Haskell has a fair amount too, to be fair, which has grown over time. What I am suggesting is: concentrate on the deep complexity, not the superficial syntax.
9. The idea of a functional language kind of scares me. When you think of an object, it seems to map to the everyday world - say "person.hand.scratch(otherHand)" - where you have the methods and the properties within a given object. It seems there is a change in paradigm, and a whole change in thought process, that comes along with accepting functional programming and programming functionally. Do you think that may be one of the reasons that object-oriented programming languages are mainstream? Is there a way to bridge the gap between functional and object-oriented?
I think that, undoubtedly, imperative programming languages generally, and object-oriented ones in particular, map very well onto an operational model: you can imagine these locations getting mutated, and imagine how the program executes step by step. It's actually a pretty close picture of what happens in the real machine underneath, and we have a pretty good operational understanding of that. On the other hand, a lot of people use spreadsheets without thinking of themselves as programmers at all; they think of it as modeling, and you don't have to explain it to them as a sequence of steps. I tried to say this in my talk earlier.
The idea of treating computations as functions that process values seems quite natural when it's presented in that functional-programming kind of way. But if you are an experienced object-oriented programmer and you try, in half a day, to learn a language like Haskell or Lisp, the thought processes are indeed very different, and a lot of the habits you keep applying don't work very well. I once spent a morning taking snowboard lessons. I am a sort of intermediate skier - I can stay up on skis and get down steep slopes - but I spent the most awful morning on that snowboard, with an instructor; I fell over all the time, because my instincts were just completely wrong. Every instinctive body motion I made, made me fall over. I gave it up; I'm back on skis.
There is something of that feeling here: initially it seems not just awkward but positively exhausting - why would you ever want to do this? Then you learn the language. I don't think we have much experience yet of what happens if you learn it early, though more people are coming out of universities with some exposure to functional programming, and I think that's a good thing. But I will say this: I have repeatedly seen people who've made the experiment of saying "well, I'll just put my preconceptions to one side, give it a whirl, and stick at it long enough to get a few of the 'aha' feelings". Those people often say, "Now I go back to my Java programs and I write them in a different way.
I think about them differently. I had to rewire my brain, but some of that rewiring has benefited the rest of my life." Now why is that? I think it's because you can do functional programming in Java: you tend to program with more immutable objects; you get given a string, and instead of screwing with that string and changing it, you copy it to something else, so you still have the old one available; there are fewer interactions in your program, and you find yourself doing more of that. You might wonder whether it is efficient enough. Actually, these days it probably is: good garbage collectors, good storage allocators.
Everything I do as a compiler writer is aimed at making that allocation happen fast. Those functional programming habits can then go back into the imperative world, for the reasons I was trying to describe in my talk. The habit of writing with fewer, as it were, accidental side effects - the ones that are incidental to the internals of a computation but not important to its external behavior - means you can get rid of the incidental side effects you can do without, without changing your whole programming paradigm. You can do that in an imperative language, and you end up with programs that have some of the benefits I was trying to describe. The bottom line is: it's a brain-rewiring experience, and there is no getting away from that; it's not like switching from Java to C#, or even from those to C++.
10. Could you give us an idea of what is going to be coming down the line in a few months? What's interesting? What can you see on the horizon that you are going to be trying out on your Haskell test bed?
I know no more than you do, actually: Microsoft is going to productize F#, and even though I work for Microsoft, I learn most of what Microsoft is going to do from the press. As far as what I am personally working on now, and what I am excited about: one big thing I am working on is exactly this parallelism question. We've been quite successful with transactional memory in particular: you can download GHC and you have a transactional memory system that works very nicely on your multicore. But it's not enough, because transactional memory still means you are spawning threads.
It's as if you were forking processes that communicate through shared memory, so it's under your explicit control, and it gives you only a limited amount of parallelism: it's hard to keep a hundred processors busy that way, because it's tough to find enough threads for them to run. The place where I think you can get lots of processors busy is data parallelism. Data parallelism - a very old idea - means doing the same thing to lots of elements of a large aggregate, an array say, at the same time.
You divide your array up into chunks, one for each processor; each processor does its chunk, and they synchronize when they've finished. That's data parallelism, and it has been very successful in practice: MPI, Google's MapReduce, High Performance Fortran, that kind of stuff. But it's flat: you do the same thing to every element of an array, but the thing you do is sequential. Not every algorithm maps nicely onto that paradigm. A more flexible model is nested data parallelism: do the same thing to every element of an array, but the thing that you do can itself be a data-parallel algorithm - it may work in data parallel across some other vector, and the thing it does to each element of that may itself be data parallel.
Now the parallelism tree gets some forking at the top, then little forks, and it becomes a rather bushy, complicated structure. That is a lot better for programmers, but it's a lot worse for the implementation. It's hard to implement well, because the things at the leaves of this bushy tree might be single floating-point instructions on a floating-point number: you don't want to spawn a thread for that - you don't want to spawn a hundred million threads for those leaves. So you've got to have some way of recovering granularity. Flat data parallelism is easy to implement: even though there may be a hundred million elements in the array, if you have 10 processors you put 10 million elements on each processor and run a sequential loop - a very efficient mechanism.
Guy Blelloch at Carnegie Mellon and his colleague Sabot showed, in the '90s, how to take a nested data-parallel algorithm - the more flexible kind I just described - and transform it at compile time into a new algorithm that works on flat data arrays. They transform the nested data parallelism into flat data parallelism, which we know how to implement efficiently. This would be really good, because you get a much more flexible programming mechanism that still has an efficient implementation. Guy's work in the mid '90s was in a language called NESL; it was a prototype implementation, essentially an interpreter.
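The standard illustration from the NESL literature is sparse matrix-vector multiplication: the outer loop runs over the rows in parallel, and each row's dot product is itself a data-parallel operation over a differently sized sequence. It is sketched here with ordinary Haskell lists for readability; Data Parallel Haskell expresses the same shape with parallel arrays, which the compiler flattens in the way Blelloch describes.

```haskell
-- A sparse row stores only its non-zero entries as
-- (column index, value) pairs. Rows have different lengths,
-- so the parallelism is nested and irregular.
type SparseRow    = [(Int, Double)]
type SparseMatrix = [SparseRow]

smvm :: SparseMatrix -> [Double] -> [Double]
smvm m v =
  [ sum [ x * (v !! i) | (i, x) <- row ]  -- inner data-parallel op
  | row <- m ]                            -- outer data-parallel op
```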
It was practically untyped, had very few data types, and it really didn't do anything else except this nested data parallelism. What I am excited about at the moment is taking the NESL ideas and transporting them into the 21st century by applying them to Haskell. So I am adding this to GHC, our Haskell compiler, along with some colleagues in Sydney, Australia. The idea is to have a single language that supports explicit fork-style concurrency with transactional memory, this nested data parallelism stuff, and semi-implicit, lightweight "spawny" parallelism, all in one language framework.
Maybe you do different kinds of parallelism in different parts of your application, and in some parts of your application you are driving your GUI, so you need that connection as well. I think this nested data parallel idea is the most promising way that I know of to make use of all those lovely multicores. Interestingly, the flattening transformation is a very big transformation: data representations change, code changes, everything changes. It's a systematic transformation of the source text of your program, and the only reason you have a chance of doing it is that the program is pure.
If it were imperative we just wouldn't have a chance. It's like a SQL query optimizer: it shakes the program about a lot, and that shaking about couldn't work if it also had to keep side effects straight. I am thinking maybe this will be the killer app, if we really get nested data parallelism to work. There is a certain risk here - I don't know that we can do it - but if we could get it to work with wall-clock speedups on interesting problems, and go faster than Fortran because we can write programs that you couldn't get to work in Fortran and have them run on a hundred processors - that would be cool!
Later this year we'll have a prototype. Actually, we are at the stage of looking for guinea pigs: friendly guinea pigs, people who have problems with quite a lot of parallelism potentially available of a data-ish kind - do the same thing to all these elements - but which are not embarrassingly parallel at the top, because for those you might as well just use MPI. So we are looking for friendly guinea pigs who are willing to give a go to a new piece of technology that initially won't work very well at all, because it could be so good if it works well.
12. Recently I came across a paper by Meijer and others, "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire". I don't know if you've come across this paper; as far as I understand, it's a kind of abstraction over several kinds of recursion. Do you think such abstractions - abstracting over recursion - give a lot of benefit for optimizing or simplifying the complexity of code?
I am trying to think how to answer that without assuming that everyone has read "Bananas, Lenses and Barbed Wire", which is a tough paper. It was written before Erik joined Microsoft, before he went to the dark side. Erik moved from being a very pure, abstract functional programmer to saying "no, this is all crap; we have to do this .NET stuff instead", and now he is swinging back again. Somehow, typed functional programming languages seem to allow people to express abstractions very well. What do I mean by an abstraction? Just a procedure - a function, a method - is an abstraction, because inside itself it hides details about the implementation behind an interface that you might hope would be relatively simple.
Those interfaces are often typed, and to get flexible interfaces you need quite rich types, so functional programming languages are where particularly rich type systems - particularly to do with polymorphism, the generic stuff - have grown up. With the sort of structures Erik was talking about, it has turned out to be quite easy to take abstract mathematical concepts and map them into programs that you can actually execute.
I don't think I can make that concrete in a setting like this; indeed, I would say it's one level back from practical programming, because people are taking fairly abstract ideas, mapping them into something you can execute, and saying "oh look, here is a programming pattern" - this goes back to what we were saying about patterns. What you see there is a programming pattern; we might be able to generalize it and use it for real programs, so it's a two-step process. That is what happened with monads: abstract ideas expressed as something that you could use, and now we've been able to generalize them in all sorts of ways. Somehow, the combination of rich polymorphism in the type system and the purely functional nature allows a fairly direct transliteration of ideas from mathematics into the language, and that has seemed to have more practical consequences than you would reasonably suppose.
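For readers who haven't seen the paper: its "banana brackets" are catamorphisms, which for lists amount to foldr. Writing the pattern out shows how one higher-order function abstracts a whole family of recursive definitions (a sketch of mine, not taken from the paper):

```haskell
-- The list recursion pattern, written once (this is foldr):
cata :: (a -> b -> b) -> b -> [a] -> b
cata _ z []       = z
cata f z (x : xs) = f x (cata f z xs)

-- Many hand-written recursive functions are instances of it:
sumL :: [Int] -> Int
sumL = cata (+) 0

lenL :: [a] -> Int
lenL = cata (\_ n -> 1 + n) 0
```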
13. Back to F#: it has a lot of the concepts of a functional programming language, yet it has mutable state. Don't you think that a little bit of mutation can break the whole system, the whole concept of the programming language?
That's right; F# is a compromise. The high-order bit of F#'s design is that it's a .NET language: in a .NET language you can call any .NET procedure, and any .NET procedure can do anything, so it's really hard to say "this is a pure function". Instead, F# lures you into the pure world by providing a rich, well-engineered language for writing purely functional programs. Actually, this goes back to syntax, because F# translates into MSIL - the intermediate language - just as C# does, so anything you can do in F# you can do in C# too. You can write those functional programs in C#, but it's much less convenient.
So F# makes that convenient: it makes it easy to write the functional parts and possible to do the imperative parts, but, as you rightly point out, you don't get guarantees. There are some places, like parallel LINQ queries, where the guarantees have to be there, and the only way you get them is by convention. But that's not so bad; we are used to that - think of locks protecting mutable data. Mostly, the connection between a lock and the data it protects is not explicit in the program; it's in your mind, usually.
The connection between a lock and the data can sometimes be made a little more intimate, but often which locks you must hold before accessing which data is a quite implicit part of the program. There aren't any guarantees there; we just establish conventions and try to work with them. It's not as satisfactory as having a rock-solid guarantee - we want stronger guarantees - but nevertheless we shouldn't say "because you don't get a guarantee, it's useless". It's just a compromise, and perhaps a good one.
That's OK: sharing is fine; references are fine. If you and I point to the same string, and the string says "Hello", we don't need two copies of "Hello"; we can share the same copy, so long as we don't change it. It's not sharing that's the problem, it's effects. If you share references to values that are immutable, everything is cool. If you share references to mutable values and you mutate them, everything is very uncool.
Sure. Purity has some good performance effects, because you can share much more vigorously - we saw that in the example where the C++ implementations of sets, when taking a union, had to copy the input sets in case they were subsequently mutated, while the F# implementation did not. But it also has its costs. Suppose I take two strings, "Hello" and "world", and I want to return to you the concatenated string "Hello world". I can't mutate either of the input strings, because they are immutable, so I have to make a new string - the "Hello world" string - by copying the other two, or by copying at least one of them and sharing the tail.
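In Haskell terms, the tradeoff he describes looks like this (list-based strings make the copying and sharing explicit):

```haskell
-- (++) must copy the spine of its first argument -- we cannot
-- mutate "Hello, " in place -- but its second argument is shared
-- outright, never copied, precisely because it is immutable.
greeting :: String
greeting = "Hello, " ++ world
  where
    world = "world"   -- this list is shared by result and source
```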
There is some copying involved; sometimes sharing gives you better performance, sometimes worse. I suppose it ends up as part of your mindset, as part of the brain rewiring: you try to distinguish between data that you are going to mutate and data that you are not, so that you don't go copying things unnecessarily. You can't say that it's either a definite win or a definite loss from a performance point of view; it varies.