John Hughes on Why Functional Programming Matters!

1. I am Sadek Drobi I am glad to be here at QCon with John Hughes. John, can you introduce yourself a bit for people who don't know you?

Yes, my name is John Hughes I am a long time functional programming researcher, I was on the Haskell design committee, involved in the beginning of that language, I was working at Glasgow university at the time, I moved to Sweden, to Chalmers university eighteen years ago, about ten years ago I started working on testing tools and we came up with QuickCheck first and then Haskell which I have now re-implemented in Erlang and now I am spending half my time at a company that is mostly a commercial version of that. So at the moment I have my feet in two camps in the functional programming world and the automated testing world. It's a lot of fun.

2. Yes, quite complementary these two worlds I guess. So, John you are very well known for "Why functional programming matters" paper. So why does functional programming matter? I mean if we start by let's ignore because there's a big hipe on functional programming applied to concurrent programming, and multi-core. Putting this aside, why does functional programming matter?

I fell in love with functional programming languages, very very early; I have learned Lisp when I was seventeen I think. And in order to use it I wrote my own Lisp interpreter. I was very very excited about that kind of programming, what I liked about it was that it made it easy for me to solve problems. At the time that I wrote that paper then many people were pushing functional programming with arguments like there is no mutable state, there are no assignments to variables, and this means that you are going to be able to prove your programs correct. Well, that sounds very nice and of course it is true, but actually I never do prove my programs correct, and that wasn't what I was personally excited about. And I found those arguments a little bit unconvincing, you say "Here is my great new language, it can't do assignments". Well, who is going to buy that? So I did my PhD on implementation methods for functional programming. And after my PhD I never wanted to write another word for at least a year and so I didn't, I spent a year hacking functional programs and at the end of that year I learnt a lot about what the features of functional languages were important to me, what enabled me to solve problems easily. And there were two features that I found very very useful: one was first class functions and the other was lazy evaluation. So the theme of that paper was to say don't sell functional programming as programming without something, sell it as programming with these two very important features and here is how you use them. And the insight that I had obtained was that those two features give you new ways of modularizing and structuring programs and that was the real value so I wrote it down. And that's where the paper came from.

3. What are these components? Can you explain us what are these components for let's say the majority of object oriented programmers or procedural programmers? Why are they interesting to have high order functions and the other components?

Both high order functions and lazy evaluation give you ways of gluing programs together. If you have first class functions you pass a function to a parameter, then that's your plug-in to customize your piece of code inside another function as it were. That means that you can take two function bodies that are similar but different in the loop whatever and you can pass a function to call at that point. And that's you can abstract both of them out into a common component, so passing a function to customize behavior of another function is a way of gluing pieces of code together, lazy evaluation lets you write for example a producer of a lazy data structure and the consumer of that lazy data structure, let's keep things simple, let's imagine an infinite list, when you plug those together you are composing two pieces of software again and it's important that the producer only generates values as the consumer consumes them, lazy evaluation gives you that and it makes it practical to glue together two pieces of code, which communicate a lot of data, whether it might be impractical to run the producer first and produce million of your list, you have to store them somewhere and then you run the consumer, that would be very inefficient but with lazy evaluation then the two run more or less as co-routines then you get an efficient composition. So what the paper said was that here are two ways of gluing programs together that conventional programming languages don't give you.

Why is it important how you can glue code together? Because if you imagine solving a problem in a top down way, you start with the big problem, you de-compose it into smaller problems, you de-compose those into smaller problems, when I explain this to students I usually draw a big monster, that is the big problem and then I split it up into smaller monsters, and then split those up into smaller monsters and eventually the monsters are small enough that you can just directly solve the problems, and you get your small solutions which you then glue back together to solve the original large problem. Well if you got a new way of gluing solutions together, that gives you a new way of de-composing a problem into parts and so it makes this top down design work better. That was the message of the paper.

4. And can you give a concrete example like an implementation of one functional idea that gets de-composed into the generator and glue high order functions?

Yes I can give a couple of examples from the paper. So first one was numerical methods I took a numerical analysis course as a student and I forgot almost everything from it, but I did remember a very nice trick which I can explain to you. Lots of numerical methods involve generating sequence of approximations and then taking their limit and so if you write for example a Newton-Raphson implementation in FORTRAN or C or whatever then you have a little loop with a termination condition when your two approximations are close enough together in some sense so that might be that the absolute value of the difference is small enough, or that the ratio between them is close to one, but there are a variety of ways that you might express that termination condition then you have a loop body that is computing successive iterations.

Well with lazy evaluation, you can separate out the termination condition and the loop body. So you write one function that produces an infinite list of approximations. Of course, when you run it, you are not going to produce infinitely many, all that mean is that it can produce any number of approximations, there is no termination condition built in and then you write a separate function that takes the limit, which contains the termination condition. So then you split that numerical program into two parts, generate the infinite sequence of approximations and take the limit. And those parts are re-usable especially the limit taking part you can then write all kinds of other numerical algorithms to produce an infinite list of approximations, and then take the limit in the same way.

And the clever trick that I remembered from my numerical analysis course, was this: when you look at these sequences of approximations always however you constructed it you can think of it as each element is the right answer plus an errata and now if you look at two success of approximations, you can actually eliminate the errata well you can't eliminate it all together, if it's a linear errata what happens is you eliminate the linear part and you are left with a second order, or you can eliminate the second order error part and then you get a third order errata when you do this you can take a sequence of approximations and you can apply a function to it to generate another sequence of approximations that converges faster it's called Richardson method and I could implement that idea as one function and then construct numerical integration programs by one very naïve trapezium role integration function to generate the first sequence, and if I wanted a forth order method I just apply the Richardson optimization enough times and then take the limit. So that's an example of numerical algorithms that become very very easy to construct thanks to the ability to factor up each of these parts as a separate function.

5. You've said you had a few examples, can you give us more, because I guess people are interested in examples, this is one case of the example and in your paper you cited other examples for this composition model.

Yes, the other example that I gave there was an implementation of Alpha-beta search and this is a very nice application of lazy evaluation where you want a certain space and you'd like to separate the certain strategy which in this case was the Alpha-beta method from the generation of a certain space. So in the paper I talk knots and crosses. I had a very slow computer in those days and I needed a very easy problem, knots and crosses was so easy that even with a very bad functional language compiler that we had in those days, I could actually solve it. So you construct the search space as a kind of a search tree, and you use lazy evaluation to do that which means that even if the search space is far too large to represent on your computer, or even if it's an infinite space, it doesn't matter because it's only going to be constructed as the search strategy traverses it and then you write an appropriate search strategy. That allowed me to express the alpha beta method as a single function which could search any space of the appropriate type. And so I could apply that to the crosses game, I could apply that if I had faster computers to chess or whatever and actually I am still using the same method to this day, to program the search algorithms in QuickCheck to search minimal test cases, it's the same method.

6. And this composition you can apply on several levels, as many as you want, you can take any composed function and then you compose it into other logic that's the idea, right? Like all the system becomes compositional. So that is the core of functional programming, the main core, lazy evaluation and high order functions. But languages offered today have far more features than these two features. What do you think about all the other features and the necessity of all the other features that we have in today's functional programming languages?

Well, some of the features like list comprehension for example has very very nice implementation that make the kind of lazy Lisp programming that I was talking about very concise, very nice to express and when you see the same thing appearing in LINQ, in the Microsoft languages where indeed lazy evaluation used to express database search similarities using the same comprehension annotation so I think those are great new features, monads of course could be a tremendously important new concept and the syntax that has been added to Haskell to make them intuitive to use, it's a very minor thing, people say that syntax isn't important but it's not true, it really really is important and it makes that concept accessible even to first year programming students. Is that the kind of thing you had in mind?

7. Yes, but I guess you are aware of all the things, like everybody is talking, we hear often people talking about monads but no on seems to understand them the right way or maybe tutorials are way too complicated. Is there any simple easy way of explaining monads? Can you explain monads in a simple way without thinking out types and all the plumbing happening behind?

There are several levels on which you can understand monads and the most basic level is in Haskell, just to use it input output. And I teach first year programming students to do that they have never seen a programming language before but they are still able to learn to do input output. A part of that is learning to use the do syntax, but you know the learning notations you tell them to use the do notation, and they do it, there is no particular problem with that. The conceptual difficulty is that they have to understand the difference between an IO action producing a value and the value itself. And I usually explain it like this: think of the monadic value the IO action producing the value, as the instructions for producing something, is not the same as the thing itself. As an example if I give you a twenty pound note, that is not the same as giving you my credit card and instructions how to use my PIN to get twenty pounds from an ATM. What's the difference? What if I do the latter? You can choose not to do it at all, you can choose to do it as many times as you like, and it is clearly something quite different. So it's not hard if you explain that in this way for people to understand that there is a difference between an action and the value that it produces, and then you just explain that this difference is reflected in the Haskell type.

8. You think that IO is over simplified in other languages or maybe it was for practical reasons over simplified in main stream languages?

You mean because there is no difference in the type? Well maybe that would be a bit strong but in a language which is aiming to be a functional language there is a big advantage in having the side effects explicit in the type, so you know when you do them plus of course having an IO action as a separate type enables you to define control structures as functions and so that gives you a lot of power as well. So yes, I guess there are advantages to distinguishing the type another language as well but there is always a cost, you have to keep those two types apart in your head, and you have to convert between them at times in your code. It's probably not a one size fits all question.

9. You just talked about side effects. What kind of side effects are we talking about? You talked about IO, what other side effects that I should be aware of when I am programming? And why should I be aware of them?

If we think of side effects in Haskell then I got to re-interpret your question to say what kind of types of monads might there be? Monads don't always represent side effects and if you think of side effects in terms of mutable data or side effects in the file system, then IO covers a large category of those. Actually Haskell also has side effects on internal references inside the program, for example software transactional memory references. And those are a little bit different because updating a variable inside the program isn't an effect that is visible from outside. So those side effects can be encapsulated, in a way that IO side effects can't but you can use monads for all kind of other things as well; monads has a programming structuring tool so going back to why functional programming matters. And you can define reusable high order function and you can implement the monad API with them and that gives you a lot of power. For example in quick check then the random generation is done by a random generator monad there are no side effects going on in there, but it still provides a nice way to capture and hide the details of the random generation so to the programmer it looks like the same kind of programming that you do when you do IO programming. Other examples would be monads that capture continuations for example support code-routing, sky is the limit really.

10. So I understand the monad as something that takes out the same complexity that I see everywhere takes out of code that I write cleaner code?

Yes, any high order function is there to hide complexity and monads provide one high order function that will hide a particular kind of complexity, so in general when you introduce a monad into your code, then you are hiding some ugly plumbing which you previously or otherwise would have throughout the code, hiding it in a monad would improve the structure of your code in a nice way.

11. I am not actually willing to get into a debate between static and dynamic typing but your first explanation of functional programming didn't mention any typing, you were talking about functions and lazy evaluation what does "types" has to do inside that equation? Because we see a lot of functional programming like for example Haskell provides static typing, Ocaml does too and there seems to be good reasons for why they used static typing. Another good reason not to use static typing, I am interested in knowing what you think about that, since you program in Erlang and in Haskell and you seem to know about both of them.

Yes, so the type system in Haskell has been a very very active field of research for quite a long time some really exciting work has been done. And I guess it gives two main benefits: one benefit is of course static type checking and just like everybody else of course I make type errors and I like having them spotted like that by the type checker, I miss that when I write Erlang code. The other benefit in Haskell is the over loading that the class system gives which lets you conceal parameters of the type system helps there to clean up your code in a certain amount the parameter passing happens implicitly and for example in the Haskell quick check that we were able to provide a very very nice looking API where you don't have to specify how to generate data at all because it can be inferred from the types. That's an example simplifying code thanks to the type system. But I am not a type system nazi, some of the research in type systems aims to make the type systems more and more expressive so that more and more errors can be caught at compile time by the type checker. Of course that can be very useful but the other side of the coin is there are more restrictive type system we'll have more of an impact on the way that you write code and I think you have to keep that balance in mind so it's not necessarily an advantage to make the type system stronger and more expressive. An example of that would be generic programming in Haskell, there is lots and lots of papers about it and indeed you can still write papers on how you can do generic programming in Haskell. Generic programming in Erlang which I do a lot is four lines of code.

12. Is it practical or does it solve a part of the problem to write a type system that is not statically checked but still that is there and helps feeling that implicit parameters? Like dynamic at runtime type system

Sure, as I explained in the beginning I don't see functional programming as programming without something, I see it as programming with something and of the two features that I have talked about: high order functions and lazy evaluation Erlang has high order functions, and so all the techniques that use those you can apply in Erlang. It doesn't have lazy evaluation and I really miss that. But you can simulate it in some cases, by simulating a suspended computation as a function with zero arguments, and I do that all the time. So yes, moving from Haskell to Erlang most of the techniques that I am used to using in a way of thinking about my program just carries straight over. Then Erlang provides lots of other stuff as well like the concurrency features so I learned when I started programming Erlang the hard way just how difficult concurrent programming really is. And it's difficult in Erlang too God help me if I should try to do it in Java.

13. So, do you try to keep your code side effect free in Erlang? Or it matters less?

Absolutely, I have built a new version of Quick Check in Erlang and that uses random number generation and the Erlang random number library has a stateful interface, you call the random number, call that twice you get two different values. Not so surprising. It's very tempting when you have a library with side effects to build code on top of it with side effects and that's what I did to begin with and I have been programming in Haskell for so many years that to be honest I have forgot just how harmful side effects can be. Every single bug that has happened tearing my hair out in Quick Check has been caused by those very side effects with the result that I have now put a purely functional API on top and I don't tear my hair anymore. So yes, side effects I find really harmful, and I avoid them wherever possible in my Erlang code.

14. Haskell seems to push you towards not doing side effects because of the type system and all that the language offers. And in Erlang you did it with discipline not to do any side effects. Do you think that discipline is enough for keeping your code pure or side effects free?

Yes, sure, if you follow the discipline then your code will be side effect free.

15. Yes but you are a long term Haskell programmer and that's why you know everything and you know exactly what you should do to keep your code clean. But if all the programmers start with Erlang would good practices help to keep the code clean? Because one side effect can break everything anywhere.

Erlang does not have side effects on data structures, the data structures are immutable. I think that already forces much more of a functional style so it is not so easy to pick up Erlang and start programming as it is with an imperative language.

16. Are there any other functional programming languages that you find interesting other than Erlang and Haskell?

Sure, F Sharp for example seems to have a lot of nice features and it's very exciting to see the things that Don Syme is doing with it. If I think of the Scheme work, then the DrScheme environment and work that has been done on teaching that stuff is really exciting. So yes, Haskell and Erlang are the ones that I use and that is partially because life is too short even to use all the functional programming languages, but there is lot of interesting work in many of the others as well.

17. Like we intentionally tried not to talk about concurrency in the beginning so now Haskell is a side effect free language and it has its model of concurrency and Erlang has a different model of concurrency, can you compare these models? Can you compare your experience of doing concurrent or multi core programming with contrast these both languages concerning multi core programming?

Well you say Haskell is side effect free and that is a truth with modification in that of course there is a form of concurrency that is entirely side effect free where you just say "Compute this expression in parallel". That is very easy to use but Haskell also has modifiable references, software transactional memory, the M structure references and so on.

18. What I meant controllable side effects?

Yes, they are controllable, most of the concurrent programming that I have done has been in Erlang actually what can I say about it? You can screw up with message passing, you can make that locks, you can experience race conditions, but fortunately they are not at the same level as you can experience with mutable structures. So you don't get data races where you forgot to take a lock, or a compound structure you find two pieces updating different fields at the same time. Rather you suffer higher level concurrency problems like a server stops, it de-registers its name from the local name server and another invocation starts and it tries to register its name and those messages arrive in the wrong order and then you have to deal with that and so I guess you experience concurrency problems but not so enormously many of them that they are impossible to cope with.

19. So what about Haskell? Have you tried to do multi core programming with Haskell?

A little bit and I haven't spent a lot of time doing that I must admit but what we did find was that things didn't necessarily speed up as much as we had hoped and I think that was probably because the concurrent processes were sharing the same data structures, so we were getting a lot of cost in the cache coherence algorithm. One of the bad things about Erlang is that when you send a message it really is copied into the receiving processe's heap and so a lot of copying goes on. One of the good things about it is that once you have done it, there is not going to be any contention in the cashes for those values. I can't say very much about that though, you should ask Simon they know much more about this then I do.

20. You said you could run into race conditions in Erlang not at the same level as other languages, much less and that is why you wrote Quick Check right?

That is one of the reasons, what you are thinking there is the pulse randomizing scheduler that we wrote for Erlang so let's us interleave the execution of concurrent Erlang processes at the level of message sending and receiving in order to search for race conditions. And that has been quite successful at easing out some conditions and components that don't figure were written which were actually in use.

21. Of course you are writing with Erlang and all data structures in Erlang are immutable so you are quite safe regarding that but there are new programming languages like Scala and F# which have mutable data structures and they have functional programming features like high order functions, laziness and so on. Do you think it would be easy to mix these two parts like object orientation with mutable data and functional programming wit lazy evaluation?

Lazy evaluation in particular mixes very very poorly with implicit side effects. And the reason is for a start you don't know in what order your code will be executed, so if you imagine writing comparative code and then taking out the statements that you are going to execute, throw them in the air and shuffling them and execute them in a different order, the chances of your program working are very slim. Not only that but parts of your code might not be executed at all, so that is why using side effects together with laziness is extremely difficult. It's not to say we can't do it, in Haskell we have the un safe perform IO function and you can use that to do implicit side effects in conjunction with lazy evaluation in order to implement things like mom function where you need to implement things inside the implementation. But it's a very specialized job and you want to keep the code in which that happens as small and as simple as possible and make sure that it has a purely functional interface otherwise the lazy evaluation will ruin your day. So I think that in languages that combine mutable data structures and lazy evaluation then programs are going to have to learn to keep the two separate not to use mutability in the part of the laziness and if you read tutorials on this in links and C# for example they advise you don't do side effects in the parts that are going to be lazy evaluated.

22. And you think that transactional memory can help in that sense at least for mutation, for data mutation?

Transactional memory is a wonderful idea for concurrent programming but I don't think it's going to help with that the interface with lazy evaluation there the problem is that your side effects happen in an unpredictable order or not at all. And I don't think it has anything to do with transactional memory actually.

23. What I meant it was more about in a concurrent context like you have functions and functional programming and you have like in Haskell when you have a huge amount of data and you try to give it on several cores and then of course things can happen not in the right order that you want because data mutations are somewhere. And maybe transactional memory can help, because the basis of that concurrent model the multi core model in Erlang is actors, they have taken out all the complexity of copying data. In F Sharp you have mutations but maybe transactional memory could help for avoiding race conditions or complexity of data mutation in a concurrent way.

I think the jury is still out on that point, when you have data parallelism, if you are to do a data parallel operation with one transaction then it would be a very long transaction because you are presumably manipulating quite a lot of data and transactional memory works best when the transactions are short, since in the event of interference you have to role some of them back. So the question is if you are to use transactional memory together with data parallel libraries how large would your transactions be how often would you have to roll them back and would you get good performance as a result? Maybe not.

24. The last question: what are three preferred IT books that you would like to advise people to read?

I like Paul Hudak's "Introduction to Functional Programming" very much indeed that is a beautiful book that really conveys the Haskell philosophy you might say. Another book that I enjoy very much is "Crossing the Chasm" it's not an IT book it's a business book and it's like the bible of Silicon Valley start-ups, it talks about the technology adoption curve and in particular how you start off on this side with a few visionary idea and then you communicate to early adopters that it's a good idea to tray and then more and more people are starting to using it and you get into the early main stream and the late main stream and then down to the laggers which are basically never going to use this technology until somebody beats them with a stick. I liked that book very much not least because I think there is a misconceived idea perhaps among academics that if you have a new idea it should swipe the world immediately if you look at functional programming it hasn't done that but if you expect this technology curve which actually always happens then you can see where functional programming is on this, and I like it because it gives a very optimistic view of the future of this technology which I am very happy about having spent my working life working on it.

InfoQ Software Architects' Newsletter

Login with:

Don't have an InfoQ account?

Bio

About the conference

This content is in the Microsoft topic

Related Topics:

Sponsored Content

Related Editorial

Related Sponsored Content

Popular across InfoQ