Jessica Kerr on Java vs. Scala, Property Based Testing, and Diversity in IT


1. My name is Charles Humble and I am here at QCon New York 2014 with Jessica Kerr. Jessica, can you introduce yourself to the InfoQ community?

Sure. I am a developer, until very recently I was doing Scala for a year and a half, now I have switched over to Clojure, I am now working at Outpace Systems, which is a startup. It’s been around for about a year out of San Francisco, but everybody is remote, so I get to remote pair and learn Clojure and learn Emacs and it’s a lot of fun.

   

2. You started as a Java developer originally?

Right, and I did Java for a dozen years before I got the opportunity to move to Scala.

   

3. How did you find moving from Java to Scala?

You know, it was challenging to move into a codebase that already was being developed by experienced Scala developers. They had already had a couple of years to swing their transition from the Java way of writing code into something more idiomatic in Scala. So moving into that was a little bit challenging and at the same time I moved into a whole new business domain and bioinformatics is a challenge in itself. So learning those two things at the same time was challenging, now I am doing that again with Clojure and Emacs because learning a new editor is just as challenging. But that makes it exciting and what matters is being on a team that cares about learning and cares about bringing each other up and we very much have that at Outpace, we have a lot of people who will patiently disagree and help each other learn.

   

4. Are there specific advantages to writing in Scala versus writing in Java?

Absolutely. The easiest obvious payoff is the size of the code goes down a lot, the fewer lines of code you write, the fewer you have to maintain. For instance, the project I was on at Monsanto, when they went from Java to Scala they reduced the codebase by a factor of 7, it’s a lot, a lot less boilerplate. So that’s the easy way going from Java to Scala and in particular I recommend this for your test suite; tests often have a lot of boilerplate code that gets really tedious to type and also people are more comfortable experimenting with test code sometimes than production code. So the tests are a great place to move from Java to Scala to take advantage of the boilerplate reduction and then once you start learning Scala you get used to passing functions around, you get used to maybe coding within a context and writing code that accepts other code to run in its context; these are some concepts that I learnt in Scala gradually and I use more and more. So, one of the big advantages of Scala as a hybrid OO functional language is you can take your style of writing in Java and you can totally do that in Scala and it will work. That’s also the biggest disadvantage of Scala in my opinion, as someone who prefers to learn a language in order to twist my brain into a new way of thinking about a problem; Scala won’t force that on you so you can learn at your own pace.
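To make the boilerplate and "code that accepts other code" points concrete, here is a minimal Scala sketch. It is not from the interview: the names `Sample`, `timed` and `wellCovered` are hypothetical and chosen only for illustration.

```scala
object BoilerplateSketch {
  // Hypothetical domain class. One line provides the constructor, accessors,
  // equals, hashCode, toString and copy that Java of that era spelled out by hand.
  final case class Sample(id: String, reads: Int)

  // "Code that accepts other code to run in its context": the by-name
  // parameter `body` is evaluated inside the timing context.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  // Passing a function around: `filter` receives the predicate as a value.
  def wellCovered(samples: List[Sample], minReads: Int): List[Sample] =
    samples.filter(_.reads >= minReads)
}
```

A caller would write something like `BoilerplateSketch.timed { expensiveCall() }`, where `expensiveCall` stands for any expression, and get back both the result and the elapsed nanoseconds.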

Charles: That’s really interesting, one of the plus points of Java I guess is that everyone writes Java code the same way, and that isn’t really true of Scala.

There are very strong idioms in Java and in some other languages like Ruby, Ruby has really strong idioms that the community agrees on, Scala I don’t think they have that yet, sometimes I say only half kidding that Scala is the new Perl only because Perl gets a bad rap for having 15 ways of doing anything, so if you are reading Perl, unless people are following very specific coding standards, you have to know all 15 of those ways. In Scala it’s more like there are 3 or 4 ways of doing everything, it’s not quite that bad. And when you are writing Scala, that’s beautiful, it makes it easy; you can learn them one at a time. When you're reading Scala you will encounter all of them at some point because the community hasn’t settled on exactly one way of doing things. Also Scala has more and more fantastic libraries out there, that's another reason to move from Java to Scala because the really interesting innovative frameworks are mostly in Scala now, things like Play and Akka and what Twitter is doing, so you can technically use them from Java, but it’s a lot easier to call Java from Scala than Scala from Java. So, while you are exploring a language you can also explore a lot of the innovative new ideas and frameworks.

Charles: Cool. So, one of the other challenges of Scala, at least something that I found challenging when I was looking at it, was certain aspects of the type systems, so things to do with I guess the interaction between OO and polymorphisms and some of the stuff around covariance.

That’s true. Scala and its type system is trying to be super flexible and accommodate everything you need in OO and everything you need in functional and it accomplishes it, but at a cost of complexity and when the type system is Turing complete that’s a little ridiculous and you can do some really awesome things like I’ve even worked with Shapeless a little, which is one of those crazy libraries that abuses the type system so that you can have a List of String and then an Int and then a String and then an Int and so on, you can type individual pieces in a list, that’s just one small thing that we used it for. But then the compilation time goes up and up as the compiler does more and more work, it kind of gets ridiculous.

   

5. And there were, I’m not going to say I followed up on these, but there were reports, I remember Yammer talking about performance problems within idiomatic code, I think in the collection library and possibly in closures as well; has that gone away, do you know?

Was that back in 2011?

Charles: That was, yes.

Scala has matured a lot, that’s one thing that’s beautiful about Scala compared to Java is that Java is totally handicapped by backwards compatibility. Look at Java and C#, Java used to be better, C# took all the good ideas and then C# is way ahead of Java now and one of the reasons is that Microsoft doesn’t have to support everything from 1.0 in the current version of the CLR and Java is really handicapped by that. Odersky with Scala has chosen not to keep those handcuffs. At this point there is enough existing code out in the language that breaking changes to the language are far fewer, and most of the libraries will support at least 2.10 a couple of versions back, so there is not a lot of pressure to upgrade and the upgrades that Odersky does consider, he makes sure that if you do have to make code changes it’s something mechanical that a program could do for you. So not being bound by backwards compatibility frees Scala to make a lot of changes and a lot of improvements and I think they have, I think the language has grown nicely and has also stabilized, those idioms still aren’t there, I’m sure there will be some books published before too long, but right now it’s fun to explore and check things out.

   

6. Excellent. You mentioned libraries a little bit with Akka and Play and those kinds of things, and I know ScalaZ is one that you are quite fond of, right?

ScalaZ is fun, it’s a little bit like Scala for people who wish they were writing Haskell, but also ScalaZ takes advantage of the OO aspects of Scala and in some ways it combines the best of Haskell with the best of OO because you can say things like, and I am sure this is something you just desperately want to say every day, a monoid is a semi-group, which totally means something to mathematicians, but the goal of – people call them design patterns, and they are totally not, they are totally different – functional categories like monad and applicative functor and all these other things is that you can make absolute statements about them instead of just saying a functor implements the map method, which it does, there are explicit properties that you can say that these things follow and mathematically they have to fit together and work this way which is very different from OO where we are used to depending on the Liskov Substitution Principle and these objects will fit together unless you violated that in your implementation and extended this and did something different.

The mathematical types like monoid and semi-groups that are represented in ScalaZ have guarantees that are stronger than implementing an interface, and when you combine that with the isA relationship so that the monoid can inherit not just the methods but the guarantees of the semi-group, that’s expressive and there is some fun in that. And I don’t pretend to understand category theory, in fact I am very carefully not learning category theory because I want to be able to teach functional programming to everyone, not just mathematicians, but what I have picked up as I have gone along I can pick up little pieces of ScalaZ and start using them. Monoids are a great place to start because they are super useful, it’s easy to understand their purpose, they put things together, we do a lot of that in programming, in data processing, put things together. So if you are interested in that stuff, start with monoids.
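The "a monoid is a semigroup" relationship can be sketched in plain Scala. This is a simplified illustration, not ScalaZ's actual definitions: the trait and instance names below are assumptions, and the laws appear only as comments because the compiler does not enforce them.

```scala
object MonoidSketch {
  // A semigroup can combine two values. Law (by convention, not enforced here):
  // combine(a, combine(b, c)) == combine(combine(a, b), c).
  trait Semigroup[A] {
    def combine(x: A, y: A): A
  }

  // A monoid IS-A semigroup that also has an identity element:
  // combine(empty, a) == a == combine(a, empty).
  trait Monoid[A] extends Semigroup[A] {
    def empty: A
  }

  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  implicit val stringConcat: Monoid[String] = new Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  // Maps form a monoid whenever their values do: merge keys, combine values.
  implicit def mapMerge[K, V](implicit V: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      def empty: Map[K, V] = Map.empty
      def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] =
        y.foldLeft(x) { case (acc, (k, v)) =>
          acc.updated(k, V.combine(acc.getOrElse(k, V.empty), v))
        }
    }

  // "Putting things together": fold a list with whichever monoid is in scope.
  def combineAll[A](values: List[A])(implicit m: Monoid[A]): A =
    values.foldLeft(m.empty)(m.combine)
}
```

With these instances in scope, `combineAll` works unchanged for numbers, strings, and maps of either, which is the sense in which monoids "put things together" in data processing.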

   

7. Great. So you described ScalaZ as being like a library for Scala programmers who wished they were writing Haskell and I know its testing framework is ScalaCheck which is basically just a port of QuickCheck from Haskell. Is that right?

ScalaCheck is actually the testing framework of the compiler itself, it is one of the only libraries that the language itself depends on. ScalaCheck is very widely used in Scala, it started out as a port of the Haskell QuickCheck library so it does property based testing, which you are right, the ScalaZ people love property based testing, for good reason. It’s because when you want to make a guarantee that your class will always meet this condition, property based testing is perfect for that, it really expresses always, as opposed to the unit tests that we are used to which are example based tests that say “in this case this output happens and in this specific case this output happens” and those unit tests, the example based tests, they let people interpolate between them. The property based tests, they say “no, no, no, we are going to check this case and this case and 18 cases in between and all these cases over there in the corner that you would have forgotten otherwise and we are going to make sure that the statements you make in your tests are true everywhere”.
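The contrast between an example based test and a property can be sketched with ScalaCheck's `forAll`. This assumes the ScalaCheck library is on the classpath; the object name `ReverseSketch` and the `passes` helper are hypothetical conveniences, not part of the library.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.{Prop, Test}

object ReverseSketch {
  // Example based test: one specific input, one specific output.
  def exampleHolds: Boolean =
    List(1, 2, 3).reverse == List(3, 2, 1)

  // Property: reversing twice gives back the original list, checked against
  // many generated lists rather than a single hand-picked one.
  val reverseTwice: Prop = forAll { (xs: List[Int]) =>
    xs.reverse.reverse == xs
  }

  // Property relating output to input without restating the implementation.
  val reverseKeepsSize: Prop = forAll { (xs: List[Int]) =>
    xs.reverse.size == xs.size
  }

  // Hypothetical helper: run a property with the default parameters.
  def passes(p: Prop): Boolean =
    Test.check(Test.Parameters.default, p).passed
}
```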

Charles: Is it quite hard to write? It sounds like it might be.

Yes. So, the negative and the positive of property based testing is that it’s way harder to write these properties. It’s much harder to think what will my code do, what must my code do in general than what must it do in a specific case. We’re people, we work with stories, we think with stories, an example test is a story, this input comes in, I pass it to my code, this other thing comes out, that’s understandable, that makes sense to us and we can think through that a lot more easily. And I want to distinguish the particular type of thinking through that because when you are thinking about a story of what comes in, what steps the code does, what comes out, then you're making a prediction about what the code will do and that is distinct from reasoning about the code.

So when functional programmers talk about reasoning about code (that phrase drove me crazy for a year and a half until I finally sort of got it), they are talking about making statements that apply generally, what the code will do in any case not just what it will do in this particular case and it’s not about telling the story of the code, it’s not about predicting what will happen when, it’s only about making a statement about how the output relates to the input and that’s also a property. So properties are reasoning about code and they enable higher level reasoning about code once we have them and we can build on them. But that’s much harder to do, it’s much harder to think what will happen in every case and the other obvious question is how do I say “ok, for this input I expect this output where the input is just variables without duplicating my code”? And that’s totally a challenge, so I have come up with, no, I should not say I've come up with, I have accumulated a couple of strategies for writing properties. You can aim for a complete specification stating exactly what the output will be, but you don’t have to, just being able to box in the output and say it will never be greater than this, this string will never be empty, this will never be null, it will never throw an exception, the output will always be less than the input, you can put boxes around your code that you can make a general assertion about and then supplement with a few example tests.
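The "box in the output" strategy can be sketched as follows, assuming ScalaCheck on the classpath. The function `abbreviate` is a hypothetical example invented for this sketch; the properties bound its output without restating how it is computed, and a thrown exception would also be reported as a failure.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.{Arbitrary, Gen, Prop, Test}

object BoundsSketch {
  // Hypothetical function under test: shorten text to at most maxLength
  // characters, ending in an ellipsis when something was cut off.
  def abbreviate(text: String, maxLength: Int): String =
    if (text.length <= maxLength) text
    else text.take(maxLength - 1) + "\u2026"

  // The generators state what counts as valid input:
  // any string, and a length limit between 1 and 50.
  private val texts: Gen[String] = Arbitrary.arbitrary[String]
  private val limits: Gen[Int] = Gen.choose(1, 50)

  // Box 1: the output is never longer than the limit.
  val neverTooLong: Prop = forAll(texts, limits) { (text: String, max: Int) =>
    abbreviate(text, max).length <= max
  }

  // Box 2: non-empty input never produces empty output.
  val neverEmptyForNonEmptyInput: Prop = forAll(texts, limits) { (text: String, max: Int) =>
    text.isEmpty || abbreviate(text, max).nonEmpty
  }

  // Box 3: text that already fits comes back unchanged.
  val shortTextUntouched: Prop = forAll(texts, limits) { (text: String, max: Int) =>
    text.length > max || abbreviate(text, max) == text
  }

  // Hypothetical helper: run a property with the default parameters.
  def passes(p: Prop): Boolean =
    Test.check(Test.Parameters.default, p).passed
}
```

None of these properties pins down the exact output; a couple of example tests would still be needed to show what an abbreviated string actually looks like.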

I never think you should get rid of example tests, they are easier for people to understand and that’s reason enough to keep them, but I do want to have fewer example tests and very slightly more property tests. So the negative is they're harder to write, that’s also a positive because it makes you think harder about your code and when I sit down to do test first development and I say alright, I need to write this little class or maybe it’s going to be a group of classes, this code that's going to implement a particular feature, what properties can I test about it, and I sit there and I think and I think and this is hammock time, this is not your “I’ll start with the simplest possible case I’ll just start typing and then get immediate feedback, red-green-red-green”, that’s really satisfying in TDD, but it also sometimes feels cheap, I get this tiny little victory, whereas property based testing forces me to sit back and think about the problem as a whole before I start implementing any solution.

I’ve also found that writing properties, the goal there, the goal of any testing, really, but especially this, is to define success, what must my output do, what boxes must it fit in, what conditions must it meet to call that success, and it’s not just it works for this one example that they put in the requirements document, it’s something broader than that, what does success mean. I often find that in order to know whether or not I have succeeded, I have to either return more information than I thought I needed, or if I am testing an after subsystem, sometimes I need certain visibility into the state so that I can make statements about the consistency of the state of that system at any given moment and I wind up adding that visibility into the code or baking it in as I am writing it and that helps in production. So thinking about the problem as a whole is a wonderful thing, it drives me to better solutions and also very carefully defining success is hard and productive and it helps me past the testing phase by giving me visibility into what is important about the functioning of the system.

   

8. Does ScalaCheck have ways of creating the generators?

Yes. Actually, one of the beautiful things about Scala is used in ScalaCheck, particularly in the generators. So the purpose of the generator is to come up with some valid input data and if you call it enough times it should eventually come up with all valid input data, one of the beauties of those is that it makes it really specific what is valid input data because you are coding that into your generator and that’s something people can read for documentation. ScalaCheck makes this easy, it uses implicits, and implicits in Scala are like a magic hat. So, the Scala compiler has a magic hat that you can put implicit values in and then later take them out as implicit parameters and based on the types, usually it’s completely based on the types, the Scala compiler will find those for you.

So ScalaCheck puts some generators in the hat for all the built-in types, Int, String, Double, and the built-in combinations of types like sequences and Maps and Tuples, they all go into the hat and the other beauty is that when you add your own generators into the hat, like if you have, I am going to make this up, a Carrot class and you put a generator of a Carrot into the hat, then the Scala compiler essentially mixes those up and when your test asks for a generator of a Sequence of Carrots, it wants a whole garden, the compiler can build that, because the Sequence generator is “I can make a sequence for anything that I have a generator for” and the compiler says “well, I have a generator for a Carrot”, so it combines the sequence generator from ScalaCheck with your Carrot generator and suddenly you can have Sequences of Carrots, you can have Maps of Strings to Carrots, you can have Tuples of Carrots and Ints, all those combinations come for free. I really like that about Scala’s type system and I think ScalaCheck makes great use of it there.
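The Carrot example can be sketched like this, assuming ScalaCheck on the classpath. The fields of `Carrot` and the 1 to 40 cm range are made up for illustration; `Gen`, `Arbitrary` and `forAll` are ScalaCheck's own names.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.{Arbitrary, Gen, Prop, Test}

object CarrotSketch {
  // Hypothetical domain class; the fields are invented for this sketch.
  final case class Carrot(lengthCm: Int, organic: Boolean)

  // The generator documents what valid input looks like: 1 to 40 cm long.
  val carrotGen: Gen[Carrot] = for {
    length  <- Gen.choose(1, 40)
    organic <- Gen.oneOf(true, false)
  } yield Carrot(length, organic)

  // Putting the generator "in the hat" as an implicit Arbitrary.
  implicit val arbitraryCarrot: Arbitrary[Carrot] = Arbitrary(carrotGen)

  // The compiler combines ScalaCheck's List generator with arbitraryCarrot
  // to produce a whole garden.
  val gardenIsValid: Prop = forAll { (garden: List[Carrot]) =>
    garden.forall(c => c.lengthCm >= 1 && c.lengthCm <= 40)
  }

  // Maps and tuples containing carrots are derived the same way.
  val combinationsWork: Prop = forAll { (byName: Map[String, Carrot], pair: (Carrot, Int)) =>
    byName.values.forall(_.lengthCm <= 40) && pair._1.lengthCm >= 1
  }

  // Hypothetical helper: run a property with the default parameters.
  def passes(p: Prop): Boolean =
    Test.check(Test.Parameters.default, p).passed
}
```

Only `arbitraryCarrot` was written by hand; the generators for `List[Carrot]`, `Map[String, Carrot]` and `(Carrot, Int)` are assembled by implicit resolution.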

   

9. Do you get different results for each test run that you do effectively?

Usually, yes. Unless your input type is Boolean and you are checking every value every time, you are going to get different results. The built-in generators in ScalaCheck are completely random and each sample, each test is independent, so there is technically no guarantee that if you request a random integer you will get zero in any given run of, say, 100 checks; 100 is the default number of checks to run, you can change that, of course. But say you do 100 there is no guarantee you get zero, but really you’ll get zero about 10% of the time, you’ll probably get zero, you’ll also get 1, -1, you’ll get max int, min int, and you’ll get some stuff in between. So, yes, every time you run a new test it’s different, and this is kind of fun with continuous integration because now and then tests that have been running forever will fail on you and people go “oh my god, that’s terrible, why would I want that?” Well, you want that because it just found a bug and 90% of the time the bug is actually in the test but, still, then my test is better, and the other 10% of the time, when it finds a really obscure, oh-my-gosh-I-can’t-believe-this-combination-even-happened bug in the production software, is the one time you are actually able to diagnose it. And with the failure you get all the information of all the input that was generated, so typically what happens when I get a failure in continuous integration is I will make that into an individual test case, an example test case to make sure that bug doesn’t reappear, and then I can duplicate the failure locally with that specific input and I can fix it.
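The workflow described here, raising the number of checks and pinning a failing input as an example test, might look like the sketch below. It assumes ScalaCheck on the classpath. The `midpoint` function and the failing input `(Int.MaxValue, Int.MaxValue)` are hypothetical, chosen because a naive `(a + b) / 2` overflows for that pair.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.{Prop, Test}

object RegressionSketch {
  // Hypothetical function under test. A naive (a + b) / 2 would overflow for
  // large inputs, so this version does the arithmetic in Long.
  def midpoint(a: Int, b: Int): Int =
    ((a.toLong + b.toLong) / 2).toInt

  // Property: the result always lies between the two inputs.
  val staysBetweenInputs: Prop = forAll { (a: Int, b: Int) =>
    val mid = midpoint(a, b)
    mid >= math.min(a, b) && mid <= math.max(a, b)
  }

  // The default is 100 successful checks per property; ask for 500 instead.
  val params: Test.Parameters =
    Test.Parameters.default.withMinSuccessfulTests(500)

  def propertyPasses: Boolean =
    Test.check(params, staysBetweenInputs).passed

  // The input reported by a (hypothetical) failing CI run, pinned as an
  // example test so the bug cannot quietly return.
  def pinnedExampleHolds: Boolean =
    midpoint(Int.MaxValue, Int.MaxValue) == Int.MaxValue
}
```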

   

10. Great. So I am going to change track a bit, there is quite a lot of concern in the IT industry, generally in terms of diversity, particularly in terms of women in IT, and I don’t mean women in IT generally because obviously there are quite a few, but specifically in programming jobs and architectural roles and that kind of thing. Do you think that’s a valid concern?

Yes. I love it that people are concerned about it and I am glad. I didn’t used to be, I enjoyed being the only woman in the room, it’s my natural state. Why? Because I am a programmer; but I am an exception in that I'm one of the few women who’s comfortable with that, I think a lot of people look at the exceptions, look at me, I am out there speaking at conferences, therefore any woman can do it. No. A certain very small percentage of women can do this. Look at the percentages, a much higher percentage of men are on the speaker lists at conferences, and why the disparity? I think there are a lot of reasons, but we can’t look at individual examples as proving that everyone else is not here by choice.

There are a ton of influences, and it’s not just women, I think the racial disparity is worse, I don’t know about numbers, but at least people are comfortable talking about gender disparity and arguing about it and that’s kind of fun. But percentage-wise there is a small percentage of women who don’t mind being the only woman in the room and right now those are the ones who can stay in programming, but the larger percentage of women, who are more comfortable interacting with other women for various reasons, will be more likely to stay in the field if the gender balance is better, it’s kind of a feedback loop, the more women in the field, the more women will be comfortable in the field.

There are some colleges that in particular years have noticed entry level computer science just happened to have close to 50% women that year, enough of them stuck through the first year that they were with each other and then they didn’t drop out as much. It’s an interesting example of more women making more women comfortable and so the percentages can spiral either direction; if we get more women, more women will be more comfortable and eventually we’ll make it to even, but until we get to that point if we want to reverse the spiral from how it currently is, which is more and more women drop out, then it takes concerted effort. And I see that concerted effort happening in our industry, I see it at conferences, I see conference organizers really looking for women speakers and I think that’s fantastic, because when I went to my first few conferences they were “No Fluff Just Stuff” and they were pretty consistent with no women speakers. So, it never occurred to me to speak until someone said to me specifically, “hey, you ever thought about doing this, you’d be a good speaker”; it took that sort of personal suggestion for it to even occur to me that I could do this, part of the reason was that it was all men out there.

   

11. Do you think women only events are a good idea?

Women only events? Women only events are a fantastic idea, yes, I do. And it’s because there's that large percentage of women that feel more comfortable around other women. Some of that is simply being with people who are closer to you, there are very different cultures in the US for men and women, there are different cultural expectations, there are different things you are supposed to like, ways that you are supposed to talk and we are generally more comfortable with people who are more like us. I have a very interrupting argumentative communication style, I am just lucky that way as far as programming goes, sometimes I work on changing that to make better conversations, but as far as getting ahead in programming, that is so to my advantage and it’s so to my disadvantage when I try to go out with the other moms, they don’t want to argue about anything and I am like why not? So, it’s all percentages, I really think, but the women only events give the women who are more comfortable around other women, especially for learning (learning is high pressure and we are often more comfortable learning in a safe environment, and most women feel safe around more women), that sort of environment.

Some people are like “well, that’s not fair, we don’t have men only programming events”, “well, you kind of do, it’s every meetup everywhere”. Not that they are men only because you don’t have to restrict it to men only to get 95% men. I get excited at the user groups when I am not the only woman. So as a white man in computer programming you probably will be able to walk into any general meetup and feel comfortable and feel like you belong. And as a woman or as a person of color, you do not have that confidence. People who are not ridiculously extroverted, women or people of color, they have to steel their nerves and it takes an effort to go to that event because they don’t know if anyone is going to make them feel welcome, and looking around they will not necessarily feel like they belong. So we can create events that are explicitly “look, you can come here and you know you are going to feel comfortable, you know you are not going to feel stupid asking a question and nobody is going to look at you like the only woman here is stupid, therefore all women are stupid”. That’s called stereotype threat. And when you can go to a meetup with the confidence that that isn’t going to happen, then a lot more people show up, and a lot more people are relaxed mentally, they're not on edge, they are not on “do I belong here, do I belong here”. There is a great website for that, stereotype threat something, you can Google that and find it, and when you are not on edge like that then you can relax and you can learn and you can enjoy programming and you can have a good time and feel like this is something you can do.

Charles: Got it. I think that’s a really good place to end it. Jessica, thank you very much for your time.

Thanks.

Nov 01, 2014
