Jessica Kerr on Scala, scalaz, scalaz-stream, Testing with ScalaCheck


1. We are here at CodeMesh 2013 in London and I’m sitting here with Jessica Kerr. Jessica who are you?

Who am I? Well I live in Saint Louis, Missouri and I’ve been programming for about 15 years, mostly in Java but the last year I’ve been doing Scala professionally. That has been a lot of fun and as a speaker at conferences I have a mission of bringing just a little bit of functional programming to developers everywhere, because you don’t have to move to a functional language to benefit from the style and ideas of functional programming.


2. So how would you bring functional programming to the Java crowd?

I advocate for some of the principles in functional programming. Immutable state is a big one that you can do in Java; it’s a lot more work but you can totally do it, and then you benefit from knowing where your state came from, where your variables were initialized. Another one is the idea of writing functions that are data-in, data-out. A functional programmer will call this a pure function or a referentially transparent function, but I think those terms are confusing. I like to say data-in, data-out because this kind of function takes some data in and produces a return value but does nothing else: it doesn’t affect the rest of the world, it doesn’t read or write any global state and it doesn’t modify its input. So you can trust this function; you can call it as many times as you want and for the same input you are always going to get the same output. It’s much more testable, and it’s easier to predict what it is going to do and easier to think about.
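A minimal Scala sketch of the data-in, data-out idea described above. The object and function names (`DataInDataOut`, `addToTotal`, `add`, `withBonus`) are made up for illustration and do not come from the interview.

```scala
object DataInDataOut {
  // NOT data-in, data-out: reads and writes state outside the function,
  // so the same argument can give a different result on each call.
  var runningTotal: Int = 0
  def addToTotal(n: Int): Int = {
    runningTotal += n
    runningTotal
  }

  // Data-in, data-out: the result depends only on the arguments.
  def add(a: Int, b: Int): Int = a + b

  // Does not modify its input; it returns a new list instead.
  def withBonus(scores: List[Int], bonus: Int): List[Int] =
    scores.map(_ + bonus)
}
```

Calling `add(2, 3)` any number of times gives the same answer, while two calls to `addToTotal(5)` return different values because of the hidden state.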

Werner: You like to program in Scala, so is Scala something that is ready for the general purpose JVM population, or where is it in the adoption cycle?

Is it ready? Yes. I mean, the good and bad of Scala is that you can write Scala in a dozen different styles. You can write Java in Scala and it will be Java with less boilerplate, so there is a win right there, and of course it compiles to Java bytecode anyway, so it’s highly integrable. But then you can progress from the Java style toward a more functional style or even a more Scala style, because Scala is even more object oriented than Java. So if you really like OO, Scala is a great language, and if you really like functional you can work that in there too. It’s kind of unique that way, but it’s definitely designed to be easy to transition to from Java. The one piece of advice I would give is something our team at work has benefited from: when our team first switched from Java to Scala, the Scala looked a lot like Java, and they brought in a couple of experienced Scala programmers. Working with them has helped the rest of us be more flexible in our style, include some more functional elements and make use of the type system in ways that Java doesn’t support. So we’ve had a lot of benefit from that.


3. I'm curious to hear that Scala is more OOP than Java, can you expand on that?

Sure, so in Java you have things like primitives and you have things like methods and in Scala everything is an object, there are no operators except method call, so every operator is a method call, every primitive even, integers, everything is an object, a function is an object, and the type hierarchy is much richer you can be a lot more specific about your types, you have multiple Inheritance with some caveats, you can do a lot more OO things in Scala than you can in Java.
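As a small illustration of "every operator is a method call" and "a function is an object", here is a sketch using only the Scala standard library. The object name and value names are arbitrary.

```scala
object EverythingIsAnObject {
  // The operator form and the explicit method call are the same thing.
  val viaOperator: Int = 1 + 2
  val viaMethodCall: Int = (1).+(2)

  // Integers have methods like any other object.
  val asText: String = 42.toString

  // A function is an object with an `apply` method.
  val increment: Int => Int = _ + 1
  val viaApply: Int = increment.apply(41)
  val viaCall: Int = increment(41)
}
```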


4. So the multiple Inheritance is, I think, traits. Traits are like interfaces with implementations, right?

They have implementation, they can declare fields, they can pretty much do everything except get parameters on construction, so you can have as many traits as you want mixed in with your classes.
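A short sketch of what that looks like, assuming the Scala 2 line the interview is describing. The trait and class names (`Named`, `Counting`, `Service`) are hypothetical.

```scala
object TraitMixins {
  trait Named {
    def name: String                        // abstract member
    def greeting: String = s"Hello, $name"  // concrete implementation
  }

  trait Counting {
    private var calls: Int = 0              // traits can declare fields
    def tick(): Int = { calls += 1; calls }
  }

  // Constructor parameters live on the class; traits are mixed in with `with`.
  class Service(val name: String) extends Named with Counting
}
```

A `Service` gets `greeting` from one trait and `tick()` from the other without writing either implementation itself.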


5. Are traits widely used in Scala, is basically everything made from traits, is that how it works?

I wouldn’t say everything, they are definitely widely used and are very useful for a lot of different patterns.


6. What kind of patterns are implemented with it, like Iteration, enumeration and stuff like that ?

Well there is something called the Cake Pattern which is really interesting where you can plug layers of implementation together. There is an enum in Scala, nobody uses it, we just do it with a sealed trait, which we can then extend in as many classes as we need to. They are really quite versatile because they can have type parameters so we can use them to be more specific about typing.
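The sealed-trait alternative to an enum can be sketched as follows. The `Sentiment` type and its cases are invented for the example.

```scala
object SealedTraitEnum {
  // `sealed` means every subtype must be declared in this file,
  // so the compiler knows the full set of cases.
  sealed trait Sentiment
  case object Positive extends Sentiment
  case object Neutral extends Sentiment
  case object Negative extends Sentiment

  // A match over a sealed trait can be checked for exhaustiveness.
  def points(s: Sentiment): Int = s match {
    case Positive => 1
    case Neutral  => 0
    case Negative => -1
  }
}
```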


7. Here is a question, since you’ve mentioned traits: in Scala how do you choose a solution, how do you choose “I’m going to use a lot of traits, or I’m going to use lots of functions”? How do you choose that, I mean the design space seems huge.

It is huge and that is a really good question, and you know, the best practices of Scala are not established yet. There are many different styles: you can go the purely functional style and do everything in scalaz, or you can do purely OO, or you can mix the two. There are some things that I think you can only do in Scala. So really that is not established yet, and each team and each person is really going to have their own style. The fun part of that is when you come up with a pattern that is useful, blog it, tweet it, people will comment on it. There is open discussion in development and learning going on in the language, and that is fun.


8. How do these different styles, do they interoperate well or can they clash?

They can clash. In particular I find that the combination of OO, polymorphism and method overloading, together with functions being passed around and the attempts to have very specific types, makes for some interesting properties of the type system, so it can be confusing. There are certainly quite a few things about the type system in Scala that are confusing, but frankly you don’t have to use those yet, and one thing that is really beautiful about Scala, this is a good example, is the way you can take it to different levels. When you write libraries in Scala you can use a lot more of the language features, especially in the type system, that might be hard to understand but are fairly easy to use. For example, there is a thing called covariance and contravariance, the little pluses and minuses in front of type parameters, which I found really confusing at first. Basically covariance is the easy one, it’s the plus, and List is covariant in its type parameter, which means that a list of dogs is a list of animals. Now in Java you can sort of do the same thing, but you have to declare your list of animals as a list of ? extends Animal, so every time you declare a list you have to think about: “Should subtypes of animal be valid as a type parameter to this list variable?” You have to think about it at each use. In Scala you put that little plus in front of the type parameter when you write the library and you’ve said: “For this type it makes sense that a list of dogs is a list of animals”, and so a lot of these decisions are pushed onto the library designer instead of the library user. I like that about it.
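The "list of dogs is a list of animals" point can be sketched like this. `Animal`, `Dog` and `Box` are hypothetical types; the standard library `List` really is declared with a covariant type parameter.

```scala
object Variance {
  class Animal(val name: String)
  class Dog(name: String) extends Animal(name)

  // The library author writes the plus once: Box is covariant in A.
  final case class Box[+A](value: A)

  def names(animals: List[Animal]): List[String] = animals.map(_.name)
  def describe(box: Box[Animal]): String = box.value.name

  // The library user passes dogs where animals are expected,
  // with no wildcard at the use site.
  val dogs: List[Dog] = List(new Dog("Rex"), new Dog("Fido"))
  val dogNames: List[String] = names(dogs)

  val dogBox: Box[Dog] = Box(new Dog("Rex"))
  val boxed: String = describe(dogBox)
}
```

Removing the `+` from `Box` would make the call `describe(dogBox)` a compile error, which is the decision the library designer is making on the user's behalf.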

Werner: So you can have the hardcore Scala people use the really fancy features and have the beginners slowly advance and still benefit from them. Talking about hardcore Scala libraries, let's go on to scalaz. What is it? I hear it’s scary!

Yes, you know I’m still afraid of quite a bit of it. When it gets into the Kleisli etc, there are words in there that I don’t understand, but that doesn’t stop me from using the pieces of it that I do know how to use. So scalaz is like a fully functional side of Scala, and it makes heavy use of Implicits. Basically the Scala compiler has a magic hat, and whenever you declare an Implicit function or value, Scala puts that in its magic hat. Then anywhere else that it is in scope, your function can ask for an Implicit so-and-so of a particular type, and the compiler will go digging around in its magic hat, find it and supply it, so you don’t have to supply it every time you call your function. That is useful, but it’s also really confusing, because when you don’t have that thing and you get the error message of no Implicit found for, you are like: what the heck am I supposed to do? One of the barriers to scalaz is that it’s supplying a bunch of things, but once you know what those things are and how you get them in, it gets easier. Some of the things that scalaz puts in the magic hat are Monads and Functors and Monoids, and the negative is that all of these words are bizarre and foreign, to me anyway; I don’t know category theory.
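A minimal sketch of the "magic hat" using plain Scala implicits, with no scalaz involved. `Greeter`, `politeGreeter` and `welcome` are hypothetical names chosen for the example.

```scala
object MagicHat {
  trait Greeter { def greet(name: String): String }

  // Declaring an implicit value puts it into the "magic hat".
  implicit val politeGreeter: Greeter = new Greeter {
    def greet(name: String): String = s"Good day, $name"
  }

  // This function asks for an implicit Greeter instead of taking it
  // as an ordinary argument.
  def welcome(name: String)(implicit greeter: Greeter): String =
    greeter.greet(name)

  // The caller does not mention the Greeter; the compiler finds it in scope.
  val fromHat: String = welcome("Jessica")
}
```

If no implicit `Greeter` were in scope, the call to `welcome` would fail to compile, which is the confusing situation described above.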

The positive is you can learn one and use one and ignore the rest of the library and you are fine, so for instance Monoids are the gateway drug to scalaz in my opinion. A Monoid is basically a little packaged reduce operation, it has a zero and a plus effectively, so when you have a Monoid that means you can put two of the same type of thing together to make the same type of thing. The List Monoid concatenates two lists and its zero is the empty list. The String Monoid concatenates Strings and its zero is the empty String. The integer Monoid adds two integers and the zero is zero. Once you’ve abstracted this idea of putting things together you can use them for fun things like reducing lists, and the really interesting part is that they compose. scalaz puts into the magic hat the Monoids for List and Map and Sequence, Map is really a very useful one, and String and all of the ordinary types. So if you have a type that is built up of those, maybe with a case class you have to be a little bit careful, but in more general cases, if you have a sequence of Maps of Strings to integers, because scalaz provides Monoids for each of those little bits and because Monoids compose and find each other in the magic hat, you can just say: “Ok, reduce this thing in the way that makes sense”, and you can take a sequence of Maps of Strings to integers and put them together and it just does things right.

So I found those very useful, any time I want to put two big objects together I can define just little pieces like from my case class I can say: “Yes, put these two Strings together and these two ints together” but all of the deeper structure will be found in the magic hat and if I nest custom classes I can write a short Monoid for each of them that says how that makes sense to put two of these together and they’ll find each other, so I can put the addition even when I have my class within Maps, within Sequences, within another class, all of this structure the Monoids just find each other and then I can take two complete reports and just combine them into one report. And I can customize bits of that if I want instead of using the default Implicits but usually the defaults do things right. So that is one example, and Monoids are a tiny piece of scalaz and there is a whole lot more and you don’t have to use any more than makes sense to you, so it’s kind of like Scala in that way you can use it in the small and then grow to use more and more of it.
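The way Monoids "find each other" can be sketched without scalaz. The `Monoid` trait, the instances and the `Report` case class below are hand-rolled for illustration and are an assumption about the shape of the idea, not scalaz's actual definitions.

```scala
object MonoidSketch {
  // A packaged reduce operation: a zero and a plus.
  trait Monoid[A] {
    def zero: A
    def plus(a: A, b: A): A
  }

  implicit val intMonoid: Monoid[Int] = new Monoid[Int] {
    def zero: Int = 0
    def plus(a: Int, b: Int): Int = a + b
  }

  implicit val stringMonoid: Monoid[String] = new Monoid[String] {
    def zero: String = ""
    def plus(a: String, b: String): String = a + b
  }

  // The Map monoid asks the hat for a monoid for its values,
  // so monoids compose: values under the same key are combined.
  implicit def mapMonoid[K, V](implicit vm: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      def zero: Map[K, V] = Map.empty
      def plus(a: Map[K, V], b: Map[K, V]): Map[K, V] =
        b.foldLeft(a) { case (acc, (k, v)) =>
          acc.updated(k, acc.get(k).fold(v)(existing => vm.plus(existing, v)))
        }
    }

  def append[A](a: A, b: A)(implicit m: Monoid[A]): A = m.plus(a, b)

  def combineAll[A](xs: Seq[A])(implicit m: Monoid[A]): A =
    xs.foldLeft(m.zero)(m.plus)

  // A custom class only says how to combine its own fields;
  // the deeper structure is found implicitly.
  final case class Report(title: String, counts: Map[String, Int])

  implicit val reportMonoid: Monoid[Report] = new Monoid[Report] {
    def zero: Report = Report("", Map.empty)
    def plus(a: Report, b: Report): Report =
      Report(append(a.title, b.title), append(a.counts, b.counts))
  }

  val perDay: Seq[Map[String, Int]] =
    Seq(Map("scala" -> 2, "java" -> 1), Map("scala" -> 3))
  val total: Map[String, Int] = combineAll(perDay)

  val merged: Report =
    combineAll(Seq(Report("Mon", Map("scala" -> 2)), Report("Tue", Map("scala" -> 3))))
}
```

Here `total` adds the counts per key, and `merged` combines two whole reports using nothing but the small per-type pieces.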


9. You talk about these Monoids and how they match and put things together, so how do you find out, how the magic happens, how do you find which buttons to press essentially?

I google it.

Werner: That's one way of program synthesis, somebody has already written it.

You can google it and find blog posts and articles specifically on Monads and Monoids and scalaz and then just use that one piece. There is a fair amount of conversation on Scala and scalaz.


10. What other utilities do you use from scalaz, what else is in there?

This is a good one: in scalaz.concurrent there is Future and Task, which I like a lot. The scalaz Future is very similar to the Scala Future except the scalaz Future does trampolining, and I think that will be incorporated into the Scala one at some point; I’ve heard that is going to happen but I don’t know whether it has or when. Trampolining is a really interesting aspect that scalaz provides. Functional programming does a lot of passing functions around and nesting functions within functions and making recursive calls, so a functional program can wind up with a really big stack, and the JVM is not built for that. The JVM is built to have an expandable heap but a fixed stack, so to overcome that limitation Runar and others in scalaz have built trampolines, which change all these function calls into objects on the heap. One of the beauties of Futures generally is that you can compose them. You can say: “At some point I want you to fetch this value”, and then you say: “After that do this to it, and after that do this to it”, all asynchronously if you like, and then you put two of these together and do something with both of them. You can string them together like that, and because the scalaz Futures are trampolined and those nested function calls are moved to objects on the heap, you can go as deep as you need to.
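To make trampolining concrete without depending on scalaz, here is a sketch using the trampoline that ships in the Scala standard library, `scala.util.control.TailCalls`. It is not the scalaz implementation, only the same idea: each call becomes an object on the heap and a loop interprets them.

```scala
import scala.util.control.TailCalls.{TailRec, done, tailcall}

object TrampolineSketch {
  // Mutually recursive functions. Written as direct calls, a large n
  // would need one stack frame per step.
  def isEven(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(isOdd(n - 1))

  def isOdd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(isEven(n - 1))

  // `result` runs the trampoline: it loops over the suspended calls
  // instead of nesting them on the stack.
  val deep: Boolean = isEven(1000000).result
}
```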


11. It’s essentially a kind of interpreter that calls out to these individual bits and ties them together, is that one way of seeing it?

That is true, totally, the running of the scalaz Future winds up being an interpretation of these little objects that are on the heap. I’m finding a lot of that in functional programming as I get deeper into it, there is a lot of “Tell me what you want to do” and then separately there will be a little interpreter, whether you write it or it’s part of the library, that says how to do the operations you’ve specified and then separately from that you’ll determine when to do them. Often that comes down to running a scalaz Task which is another aspect we use a lot, the scalaz Concurrent Task wraps a Future except that it always contains an Either, an Either holds an error, an exception or an answer. So a Task can always hold an error or an answer and the special power of Task is that whenever it runs any operation, it is going to catch the exception, wrap it up nicely for you and you can get it back.
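The "describe now, run later, always get an error or an answer" behaviour can be sketched with a tiny hand-rolled type. This is a synchronous toy, not scalaz's `Task`; the names `Task`, `delay` and `runAttempt` here are hypothetical and use the standard library's `Either` in place of scalaz's own type.

```scala
import scala.util.control.NonFatal

object TaskSketch {
  // A description of a computation; nothing runs when it is built.
  final class Task[A](private val body: () => A) {
    def map[B](f: A => B): Task[B] =
      new Task(() => f(body()))

    def flatMap[B](f: A => Task[B]): Task[B] =
      new Task(() => f(body()).body())

    // Running is a separate step; exceptions are caught and returned as a Left.
    def runAttempt(): Either[Throwable, A] =
      try Right(body()) catch { case NonFatal(e) => Left(e) }
  }

  def delay[A](a: => A): Task[A] = new Task(() => a)

  val parsed: Task[Int] = delay("42").map(_.toInt).map(_ + 1)
  val failing: Task[Int] = delay("not a number").map(_.toInt)
}
```

`parsed.runAttempt()` yields a `Right` with the answer, while `failing.runAttempt()` yields a `Left` holding the `NumberFormatException` instead of throwing it.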


12. That is definitely an interesting solution to the problem of deep recursion, and I liked the example from your talk, where you basically had a long workflow built out of these Tasks?

So scalaz-stream is a module built on top of scalaz that makes use of Tasks in particular for their error handling and for their ability to compose, you can cram more Tasks into the same Task using flatMap and the amazing thing about scalaz-stream is it abstracts computations as something called a Process which is its own little thing, and the Process can be a source of values that can provide values maybe it contains a Task that goes to the database, a task that reads the Twitter stream. But a process can also be something that transforms those values and it can also be something that takes two sources of values and puts them together either with a fixed order of pulling from one and then the other, then the other, or it could be completely nondeterministic, I’ll pull from whichever one has values for me right now, and all of these Processes link together and the beauty of that is that at some point you are going to have to define the source that gets the real data and at some point you are going to have to define... scalaz-stream calls it a channel but it’s another kind of Process that maybe writes to the database and in some way impacts the outside world.

But you don’t have to start from the beginning, you can start coding in the middle at the transformation that is most interesting to you or the real decision point and then you kind of move backwards and build up what you need to gather all your input and turn it into the format that best facilitates your business decision, and then you can build forward when you like it toward the affecting the rest of the world section. So you can build your program in any order that you want and then separately specify the order in which things should happen, what can be nondeterministic, what needs to be sequential and I think that is part of the power of functional programming, is it frees us from the timeline. In real life, time only moves forward, but in functional programming we can go backwards and every which way because we only define things in terms of other things, that is that data-in, data-out function, define what you want in terms of what you need and then you can link in what you need later.


13. So it just occurred to me that these streams are a way to deal with the asynchronous problem that is out there: basically, rather than having callback hell, you just plug together these workflows. Is that how you solve this?

Absolutely, the streams are definitely an alternative to continuation passing, to callbacks, to a bunch of nested calls, because each piece is completely independent; you can test each piece and then you can plug it right in the middle. What it reminds me of most is command line utilities in Unix or Linux or Mac. I can cat a file to read its lines and pipe those lines to grep to find the ones that I’m interested in, and pipe those to cut to take out the pieces that I care about, and pipe that to a little program that I wrote that rearranges the line order or whatever. There are these little individual programs that know nothing about each other; their commonality is that they read from stdin and write stuff to stdout, and the magic pipe operator links the stdout of one to the stdin of the next.

That is composition, and I would like to clarify that this is functional composition; object oriented composition is a completely different thing. In OO composition you take two objects of different types and you put them together in an object of a third type, so you’ve got layers of objects within objects and in each layer the objects are different. In functional programming there are a lot of different kinds of composition, but at some level they all get down to taking two of the same thing and putting them together to make another of the same thing. Your abstraction in scalaz-stream is the Process, but whatever it is you are working with, whether it is a functor or one of these Unix programs or just a function, the idea is the same: you take one of these Unix programs, you pipe it to another one and you have a program that reads from stdin and writes to stdout. You’ve taken two things, put them together and got another one of the same thing, and that one happens to be much richer in functionality but it’s not any more complicated. So as you compose objects in a functional style, what you get out is not any more complicated than any individual piece you started with.
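The Unix-pipe analogy can be sketched with ordinary Scala functions. The `Stage` type and the `grep`, `cut` and `upper` stages are made up for this example and are not scalaz-stream's API; `andThen` plays the role of the pipe.

```scala
object PipeSketch {
  // Every stage has the same shape: lines in, lines out.
  type Stage = Iterator[String] => Iterator[String]

  def grep(word: String): Stage = _.filter(_.contains(word))
  def cut(field: Int): Stage = _.map(_.split(",")(field))
  val upper: Stage = _.map(_.toUpperCase)

  // Two stages composed give another Stage: same kind of thing,
  // richer in functionality, no more complicated to use.
  val pipeline: Stage = grep("scala").andThen(cut(1)).andThen(upper)

  // The source is supplied last, after the pipeline is already defined.
  val lines: List[String] =
    List("scala,alice,2013", "java,bob,2012", "scala,carol,2014")
  val out: List[String] = pipeline(lines.iterator).toList
}
```

Each stage can be tested on its own with a small iterator, and the composed `pipeline` can be applied to any source of lines.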


14. I think you mentioned Iteratees and Enumeratees, how do they fit into this whole picture?

So Iteratees are interesting because there are two styles of iteration, of going through a sequence or a collection or a file or generally some series of data, there are two ways to iterate through that and do something about each piece. The traditional Java style is external iteration, is basically, there is some syntactic sugar around it, but in the end you are getting an iterator, you call next, next, next, you get a different value back each time. The user of the collection controls the flow of the iteration, controls when the next value is extracted or whether the next value is extracted at all, I like to say this is like owning your own washer and dryer, you have control over your laundry cycle. You can do the laundry and when your favorite jeans come out, well you can just stop doing laundry because that is what you really care about.

The negative is you have to remember to change the lint filter, just as if you are working with lines in a file somebody has got to close that file and so it’s your responsibility to either call close on a collection or in some way say: “I’m done, you can close the file now”. That is one of our challenges, is making sure to clean up after ourselves and be resource safe. External iteration has the benefit that you can stop whenever you want so it’s efficient and you can do something different with each item that you get back if you want to. The negative is resource safety, you have to clean up after yourself. The alternative traditionally is internal iteration and this is like, in Ruby when you have an enumerable you call each on it and then you pass that a function and internally the collection, the array or whatever it is, uses your function on each value inside it. So the positive is if the collection represents lines in a file it will call your function on each one and close the file at the end. So you’ve got resource safety, what you don’t have is an ability to stop in the middle and say: “Yes, I’m done”, it's like taking your clothes to the dry cleaner, you don’t have to do the work of washing them but you don’t get them back until they are all done, so you lose some efficiency there.

Iteratees are the best of both worlds in the sense that you can get the “I choose when to stop because I know what I’m doing and I know which of my clothes I care about”, and you can also get the resource safety of a professional changing the lint filter. The way you do that is you have a conversation between the collection and the user of the collection. So instead of passing in a simple function from value to value, you pass in a function that accepts a message that might be: “Hey I got a value for you” or it might be: “You’ve reached the end”. And your function responds by telling the collection what to do about that. It might be: “Ok, keep iterating and pass the next value to this other function”; you always have the opportunity to pass a different function for the next value. Or you might say: “Never mind, I’m done, we are done here”. When it becomes a conversation you have the flexibility, and the collection still has the control to make sure that whenever either we're out of values or you say we are done, it’s going to close the file. This is like having a maid who is doing your laundry for you, and when your favorite jeans show up on your bed you say: “You can forget about the laundry now and make my dinner”.
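That conversation can be sketched as a pair of small data types. Everything here (`Input`, `Step`, `Source`, `findFirst`) is a simplified, hypothetical model of the Iteratee idea, not the API of any particular library.

```scala
object IterateeSketch {
  // Messages from the collection to the consumer.
  sealed trait Input[+A]
  final case class Element[A](value: A) extends Input[A]
  case object EndOfInput extends Input[Nothing]

  // The consumer's reply: keep going with a (possibly different) handler,
  // or stop with a result.
  sealed trait Step[A, R]
  final case class Continue[A, R](next: Input[A] => Step[A, R]) extends Step[A, R]
  final case class Done[A, R](result: R) extends Step[A, R]

  // The collection drives the loop, so it can always clean up.
  final class Source[A](items: List[A]) {
    var closed: Boolean = false
    var delivered: Int = 0

    def feed[R](first: Input[A] => Step[A, R]): R = {
      def loop(remaining: List[A], step: Step[A, R]): R = step match {
        case Done(result) => result
        case Continue(next) => remaining match {
          case head :: tail =>
            delivered += 1
            loop(tail, next(Element(head)))
          case Nil => next(EndOfInput) match {
            case Done(result) => result
            case Continue(_) =>
              throw new IllegalStateException("consumer kept going after end of input")
          }
        }
      }
      try loop(items, Continue(first)) finally { closed = true }
    }
  }

  // A consumer that stops as soon as it sees what it cares about.
  def findFirst[A](wanted: A => Boolean): Input[A] => Step[A, Option[A]] = {
    case Element(value) if wanted(value) => Done[A, Option[A]](Some(value))
    case Element(_)                      => Continue[A, Option[A]](findFirst(wanted))
    case EndOfInput                      => Done[A, Option[A]](None)
  }

  val laundry = new Source(List("socks", "jeans", "towels"))
  val found: Option[String] = laundry.feed(findFirst[String](_ == "jeans"))
}
```

The consumer stops after the second item, so the towels are never delivered, and the source still marks itself closed.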

Werner: It’s always good to have a maid.

I would think so, that sounds good to me. So Iteratees are really cool that way, but just today José Valim and I were talking about Iteratees. He is working on Elixir and on how he can in one simple way support reductions and maps and filters and other operations on collections in a lazy style, so that it could work over a cursor from a database or a file or something, but also support zipping two collections together, because that is easy in external iteration, impossible in internal iteration and difficult with Iteratees. But then we get back to scalaz-stream, and in the end scalaz-stream declares as one of its goals to replace Iteratees, and I think it really has taken the Iteratee idea to the next level. What you can't do with Iteratees is, if you design one that is a map and you design one that is a filter, you can’t say, without providing the initial collection, put this map and filter together and give me one Iteratee. That is functional composition: I have an Iteratee that does this and one that does this other thing, and I want to put them together and have one Iteratee that I can then apply to any collection that I want. And they don’t mix really well; you need to go all the way through this collection and then do the next one, and that is not lazy. It's not scalable, because if your collection is prohibitively large then you don’t want to go through things multiple times and hold them all in memory. But scalaz-stream does let you define a Process, and this Process may be a filter and this one might be a reduce and this one might be a map, and you can totally independently plug those together and then later apply them to a source of input. Very powerful.

Werner: It's definitely an interesting approach. These concepts seem to go through a lot of iterations, so to speak, in their design, with the community exploring them.

True, this is not a solved problem with one solution people are still exploring this which is fun.


15. We have something to look forward to. So finally, testing, you have some ideas about testing or concepts for testing. Do you like testing?

I think testing is very important, I'm not going to pretend that I really like it, I like tested code. You know the core of science and rationality is asking “What do I know and how do I know it?”, and we can provide code and know that it works, but how do we know that? Tests are all about answering “How do we know that?”. Types also are another way of saying: “I’m sure this program works like I think it does because otherwise it wouldn’t compile” as scalaz-stream employs types to make sure you are fitting your pieces of your pipe together correctly, but tests, so when I was working with scalaz-stream recently, I made sure to do tests in the same style that scalaz uses. Ok, using the same library that scalaz uses, I’m not going to say that they will be scalaz test but scalaz and the Scala compiler itself are both tested using ScalaCheck which is a great library that I really like, it’s pretty much a port of QuickCheck from Haskell and this is one of the ideas that I think not just the Scala community but the programming community in general could really learn from Haskell.

So the style is called Property Based Testing, some people call it Generator Based Testing but I think Property Based is a better name because the goal is, instead of specifying one or two or three examples of what you want your code to do, instead you make general statements of based on this kind of input, if I can say this is about the input then I should be able to say this about the output. It’s a lot harder, it’s a lot harder to think about the code in general, and then you create, or use the provided ones, generators for the input data, and what ScalaCheck will do is generate a hundred random sets of input, run that for your code and then check each of the properties you specify. So in my case when I’m making a ranker that listens to a Tweet and then produces some opinion about it. I’m going to generate some Tweets and ScalaCheck had some really cool ways of creating generators in a fairly simple fashion and then I’m going to generate some of those, pass them through and then make sure the opinion maybe I’ll check that I gave it positive points, maybe I’ll check that the response I suggest is a valid Twitter response. I’m not usually specifying exactly what I want the output to be, sometimes you can, but often that will be duplicating code instead I want to make statements about what is important about this output that just came out. I know it should never be less than zero or greater than 100, you can kind of narrow in on what are you sure about your functions. This is harder and it makes you think about the general case.
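The mechanics described above (generate many random inputs, run the code, check a general statement about the output) can be sketched with the standard library alone. This is a hand-rolled stand-in, not ScalaCheck; the `score` function, the `genTweet` generator and this `forAll` helper are hypothetical, and the fixed seed is only there to make the sketch repeatable.

```scala
import scala.util.Random

object PropertySketch {
  // Hypothetical function under test: an opinion score for a tweet.
  def score(tweet: String): Int = {
    val mentions = tweet.split("\\s+").count(_.equalsIgnoreCase("scala"))
    val excited = if (tweet.contains("!")) 10 else 0
    math.min(100, mentions * 25 + excited)
  }

  // A tiny generator: between 0 and 20 words picked at random.
  val genTweet: Random => String = rng => {
    val words = Vector("scala", "Scala", "java", "monoid", "!", "")
    List.fill(rng.nextInt(21))(words(rng.nextInt(words.length))).mkString(" ")
  }

  // Check the property against `runs` generated inputs and
  // return the first counterexample, if any.
  def forAll[A](gen: Random => A, runs: Int = 100, seed: Long = 42L)(
      property: A => Boolean): Option[A] = {
    val rng = new Random(seed)
    Iterator.fill(runs)(gen(rng)).find(a => !property(a))
  }

  // The property says what matters about the output, not its exact value.
  val counterexample: Option[String] =
    forAll(genTweet) { tweet =>
      val s = score(tweet)
      s >= 0 && s <= 100
    }
}
```

A result of `None` means no generated tweet broke the "never less than zero or greater than 100" statement.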

I don’t think that we should replace all of our example tests with property tests, because the example tests are 1) way easier to write, a good place to start, and 2) more human readable. But I prefer to have like one example test and then one detailed property based test where I make sure that I didn’t miss any crazy edge cases. Now one benefit I get from this is that I have fewer test cases; I have 1 or 2 test cases per function and those cases are extensive, and each is going to run a hundred times so it’s not like I’m only running one test. The positive of that is that changing the functionality of my program is easier. When I change the structure of something I don’t have to go back and change 8 tests, and I don’t have to spend a lot of time removing duplication from my tests and adding indirection to my tests, which makes them harder to read. My tests can be extremely clear because they are few, and I find that gives me more flexibility to refactor and change my code and to change little pieces of it that aren’t core to the business functionality without breaking tests.


16. So you mentioned that ScalaCheck or a QuickCheck generates, it uses these sort of constraints, I guess, and generates random test cases. So how does that impact testing, I mean are they truly random? So if you do five test runs you find a bug once but not the other times?

Pretty much. I mean, the ScalaCheck test generators will put emphasis on edge cases so you are very likely to get zero/max/min ones and then some stuff in the middle, and then when you write your own generators you can narrow that range. You can say stuff like: “I need an integer between 1 and 10, I need a list of Strings and I want between 0 and 20 Strings but I want the Strings to be between 1 and 50 characters long”. You can do all of this with the built-in generators. So you put in your own restrictions, but then yes, it’s totally random and it is not uncommon for me to – OK, at work it’s been uncommon but in my own playing around it has been really common for me to, say, the fifth time I run a build, manage to break a test. Sometimes it's just that I wasn’t specific enough with my property: I specify that they should be in this order, but there was a tie and so they expected the input. That one I rewrote so that instead of “I expect this order” it was “I expect every opinion to have a higher score than the next one in the list, and check these two and check these two and check these two”, and I really like that because it better describes the meaning of what I was trying to accomplish in that particular function under test.
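The kind of restricted generators mentioned above can be sketched with `scala.util.Random`. These are hand-written stand-ins to show the idea of narrowing the range, not ScalaCheck's built-in generators; `Gen`, `intBetween`, `stringOfLength` and `listOf` are hypothetical names.

```scala
import scala.util.Random

object GeneratorSketch {
  // A generator is just a function from a random source to a value.
  type Gen[A] = Random => A

  // An integer between lo and hi, inclusive.
  def intBetween(lo: Int, hi: Int): Gen[Int] =
    rng => lo + rng.nextInt(hi - lo + 1)

  // A lowercase string whose length is between min and max.
  def stringOfLength(min: Int, max: Int): Gen[String] = rng => {
    val length = intBetween(min, max)(rng)
    List.fill(length)(('a' + rng.nextInt(26)).toChar).mkString
  }

  // A list with between min and max elements drawn from `elem`.
  def listOf[A](min: Int, max: Int, elem: Gen[A]): Gen[List[A]] = rng =>
    List.fill(intBetween(min, max)(rng))(elem(rng))

  // "An integer between 1 and 10" and "0 to 20 Strings of 1 to 50 characters".
  val smallInt: Gen[Int] = intBetween(1, 10)
  val strings: Gen[List[String]] = listOf(0, 20, stringOfLength(1, 50))
}
```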

Werner: So you can make the properties, I think that's the term, you make the properties more specific or you narrow it down.

Or sometimes it's that my property is too specific, like in that example just there my property was too specific, it said: “They should be in this order” when really all I care about was that each one is greater than or equal to the next one, it's not about the order it’s about the relative score between the two opinions I was comparing. And the beauty of that is once you get used to this style, the style of testing, you can read the properties and get more information than you get from an example based test. Example test you have to read a bunch of them and then try to generalize in your head what is really going on if you are using the tests as documentation. The property based tests when they are carefully written have the opportunity to make general statements about the code.
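The difference between the over-specific property and the relative one can be sketched as follows. `Opinion`, `rank` and `isRanked` are hypothetical names standing in for the ranker under test.

```scala
object OrderingProperty {
  final case class Opinion(text: String, score: Int)

  // Hypothetical function under test: highest score first.
  def rank(opinions: List[Opinion]): List[Opinion] =
    opinions.sortBy(o => -o.score)

  // Relative property: each opinion scores at least as high as the next one.
  def isRanked(ranked: List[Opinion]): Boolean =
    ranked.zip(ranked.drop(1)).forall { case (a, b) => a.score >= b.score }

  // Input with a tie between "a" and "c".
  val tied: List[Opinion] =
    List(Opinion("a", 5), Opinion("b", 9), Opinion("c", 5))
  val ranked: List[Opinion] = rank(tied)

  // Another equally valid ranking that puts the tied items the other way round.
  val otherValidOrder: List[Opinion] =
    List(Opinion("b", 9), Opinion("c", 5), Opinion("a", 5))
}
```

A test that demands one exact order would reject `otherValidOrder` even though it is a correct ranking, while `isRanked` accepts both, which is what the function is actually meant to guarantee.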


17. They are sort of procedural tests so to speak, generate the test space procedurally so you don’t have to write it manually. If I wanted to have, if I have like a domain and know I can generate all of the test cases exhaustively, would I write a generator or how does that work?

I probably would create a generator and I don’t remember, I think it might be something in ScalaCheck that lets you say these are all of the test cases but don’t quote me on that, I would totally write a custom generator for that.

Werner: In some cases I guess you do know that I have thousand things that I need to have and don’t want to rely on the random generator.

Right, now and then your input set is small enough so you can do that.

Werner: So Jessica you’ve given us a lot to look into, all of our audience will look at scalaz, but thank you very much Jessica!

Thank you!

Jan 24, 2014