Paulmichael Blasucci on Practical Property-Based Testing with FsCheck and F#

   

1. [...] Paul Michael, welcome to QCon. How did you first get into software development?

Barry's full question: I am Barry Burd, professor at Drew University in Madison, New Jersey and I am here speaking with Paul Michael Blasucci at QCon New York. Paul Michael has spent the last 17 years blending a disparate array of languages, technologies and methodologies to develop compelling solutions to a wide range of business problems. He especially enjoys solving challenges in distributed computing, visual communications, and heterogeneous enterprise systems. He is a co-founder of Nash F# – the Nashville F# Meet-up and a co-organizer of the New York City F# users group. In 2014 he received a Microsoft MVP award for his work in the .NET community. Paul Michael, welcome to QCon. How did you first get into software development?

I actually backed into it. My education is in visual communications, a fancy way of saying graphic design, and when I first got into the working world I was designing interfaces for CD-ROMs and the odd webpage when the web was very nascent. I became increasingly frustrated that I would spend hours making these beautiful comps in Photoshop, designing everything carefully, and then I would give it to the engineers and it would look nothing like what I had designed.

I started to try and figure out: “OK. How does the technology work, so that I know what I can design for and what I cannot?” I can work within constraints, that is part of the business. As I started learning more about how things worked, I found I actually liked it and I enjoyed coding more than I enjoyed graphic design. So, I just gradually transitioned. I started at the front of the application stack and worked my way down: from designing user interfaces, to building them, to “How do we code user interfaces in other technologies”, to building desktop apps, to what is in the back end, the servers and databases, down to device programming and stuff. So, I organically felt my way through as I stumbled on things that I liked, but the start of it was trying to figure out how to make the web pages look the way they did in my notebook.

Barry: So you did not do any formal education in software development.

No, not at all. In fact, I have a certificate in visual design and I was actually studying journalism. So, computer science and programming is all very much – I wouldn’t say I am a dilettante – but it is sort of a labor of love for me.

   

2. My readers always ask me how they can bootstrap themselves in terms of learning software development. What advice do you have for them?

That is a great question. I think it is not unlike writing: code, code, code. The best thing you can do is to actually be producing stuff, even if a lot of it is going to be crap at first. But it is like writing or any other creative endeavor, right? You have to keep doing it. There is so much information on the internet, so the best advice now is to read everything. I would not say go to conferences, because that could be expensive and somewhat intimidating for beginners, but go to meet-ups, which are super-important, super-great. Even now, having done this for as long as I have, I learn stuff all the time by going to meet-ups. The advice I would give is not to go only to meet-ups where they have somebody stand up and speak. Those are very useful, those are very informative and inspirational, but many meet-ups also have a hands-on component to them. Those are the ones you really want to go to, because that is where you will really get a chance to learn: you get to do and experiment and bounce ideas off of other people. That would be my advice. Meet-ups were not really as big a thing when I started programming and I kind of wish they were, because I probably would have gotten into the whole space a lot sooner.

   

3. What sort of problems are you solving at your day job? What tech are you leveraging?

Right now, I work for Quicken Loans, which is one of the larger mortgage providers in the US. We do retail mortgages. The group I work for within the organization does the trading that actually finances the mortgages. Money does not just come out of thin air; the mortgages are backed by traded products in the stock markets and that creates the funding. So, right now a lot of the problems we are solving are about codifying a very complex and sort of organic domain model that has evolved over the past several years. The business has ways of doing things, but they are not formal. They are formal to the business, they are methodological to the business, but they are not documented anywhere. They are almost like folk knowledge: the only way to really know about them is to spend time working on the trading desk. So, we are currently building tooling to support that, and a big part of what I am focused on is this domain model – using domain-driven design in F# to very accurately and robustly capture these sort of complex business models.

   

4. In what way does F# help you in your work?

In a number of ways, the biggest being that it provides many different tools to get ideas fleshed out quickly and it has a very unencumbered syntax. The signal-to-noise ratio is very high, and for domain modeling I have found that really useful, because there are times where I will be in a meeting with someone and I will have my notebook or notepad or text editor open and I will actually be typing out F# that captures the domain as they are describing it to me. It is so unencumbered that the business feels comfortable reading it even though they are not technical.

I have had business people say “Hey, can I put those notes in our meeting recap?” and I said “Sure, but it is also going to be in the code base in two weeks, if you want to look at it there.” So, that is a big element. It is almost like code as documentation, and the language – I think all MLs, but especially F# – has this nice balance of features that really lends itself to that syntactically. The other thing is that its functional programming foundations are very useful in just avoiding certain common – what is the word I am looking for?

There is a lot of incidental complexity with software, I think, and it has some features that help reduce a lot of that: in the functional paradigm, by default making everything immutable, by default having this notion of structural equality where things are defined in terms of what they are composed of as opposed to being modeled as pointers to some opaque thing in memory. Those things are really useful.
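As a rough illustration of the two defaults mentioned above, immutability and structural equality, here is a minimal sketch in Python rather than F#. The `Loan` record and its fields are hypothetical, invented only for this example; a frozen dataclass is used as a stand-in for what an F# record gives you by default.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Loan:
    # Hypothetical domain record, invented for illustration only.
    borrower: str
    principal_cents: int
    coupon_bps: int


a = Loan("Ada", 25_000_000, 450)
b = Loan("Ada", 25_000_000, 450)

# Structural equality: two separately built values with the same fields
# compare equal, even though they are distinct objects in memory.
same_value = (a == b)
same_object = (a is b)

# Immutability: "changing" a value means building a new one.
c = replace(a, coupon_bps=475)

# Attempting an in-place update is rejected.
try:
    a.coupon_bps = 500
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Here `a` still carries its original coupon after `c` is created, which is the behavior the functional defaults are meant to guarantee.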

That, and the community around F# is really helpful. I am not just talking about the technical “How to do this?” – they are helpful with that too – but helpful in the sense of “I am struggling with a very abstract, subjective problem, there is no one definitive right answer, there are a couple of different ways I can do this. What do you guys think about this or that?” Much the same as if you are writing something, you might go to a writing professor or your fellow writers and throw around different phrasings and things like that to get a sense of “What if you were to do it this way?” Same idea, but with code. So, very, very interesting and helpful, I think.

Barry: Let’s put this in perspective. Let’s compare F# with other functional languages that you have worked with.

Ok. That is good. I find the tradeoff with F# versus, say, Haskell is this: from a language standpoint, Haskell is perhaps even more expressive than F#. But I find that I would never really use Haskell in production code, for a number of reasons. One: it tends not to play well with others, it tends to only play well with itself. And two: the Haskell community is very interested in theoretical things, they are very interested in this notion of interesting problems, which makes sense because historically Haskell is a research language. It was meant for programming language research, to prototype new ideas. So this makes sense, but what it means in practice is that the community does not have a lot of polish or craft around the mundane parts of software engineering: things like library stability, documentation and package management, the things you look for in day-to-day software to get things up and out the door quickly.

   

5. Do you have any experience with OCaml or with Java’s Lambdas?

Not with Java’s Lambdas. I have quite a bit of experience with applying functional ideas in C# and JavaScript. The only OCaml I know really is F# being derived from OCaml. I have learned to read a bit of OCaml as a way of bootstrapping what I know about F#. The other functional language I have done a lot with is Erlang, which is interesting in that it very much charts its own territory. Most functional languages will either fall heavily into the ML discipline – things like OCaml or Scala or F#, or Haskell – or they are from the Lisp discipline: Lisp, Clojure, Scheme, so on and so forth. But Erlang very much does its own thing. It is inspired by Prolog, very much in its own space and very, very interesting.

   

6. Can you talk about your work with the New York City F# user group?

Yes. I was born and raised in New Jersey and have been working and playing in New York since I was 14. A few years ago, a fellow by the name Rick Minerich actually moved to New York from Western Massachusetts. He was very, very interested in F# and wanted to start a meet-up. He and I became fast friends because I was also interested at the time, so he started this meet-up and I started helping him with it and then I started running it. Then, last year, I moved to Nashville and I started my own meet-up, but then I moved back, and now we have a lot more F# talent in the New York area than we had a few years ago. So, we are actually taking a three-pronged approach: Rick makes a point of doing this thing where every month he gets a new speaker to come in and present on something interesting in a sort of lecture-style format in Manhattan. They have this great event space in the Empire State building that he uses for that.

Then a woman who is in the F# community by the name of Rachel Reese does a thing out in Hoboken where they do structured, instructor-led tutorials a couple of times – I guess once a week or so. Then on the third Saturday of every month, right here in Brooklyn over in Dumbo, I actually hold what we call a lab session. So, for five hours on a Saturday you can go over to the NYFA space over on Jay Street. It is like a drop-in/drop-out: hack on some F# projects, or we do a dojo, which is sort of a hands-on coding exercise that is loosely structured. There is not a ton of formal structure to it. It is just a great place to work on talks or to come research things. We have had many beginners who wanted to spend some one-on-one time with some of the folks who had been doing this a bit longer. We also have some light snacks throughout the day.

   

7. Do you have any secret weapons in your F# toolkit that you care to share?

In the F# toolkit?

Barry: Or in any toolkit.

In any tool kit. Generally speaking, lately I am really, really a fan of what I can get out of FsCheck for random data generation and random testing, which is what I was talking about here at QCon. But also, there is a great library called ZeroMQ, which started out as a library and is now more of a community. That is a really lightweight way of doing distributed computing.

For things that in the .NET space you would previously have done with WCF, things that at an operating system level you might do with just raw sockets or a memory-mapped file, ZeroMQ is just a really nice, lightweight library that layers some very useful messaging abstractions and queuing abstractions on top of them.

So it sort of takes the best lessons learned from years of doing distributed computing in Erlang and in other systems and gives you this nice convenient wrapper or library for building distributed systems out of these primitives. I have found it to be tremendously useful in all sorts of things: from building job scheduling systems to embedding chat inside a desktop application – whatever you could think of that involves two processes talking to each other.

Barry: Tell me a little bit more about FsCheck.

OK. I gave a talk on FsCheck this morning. It is a library for doing random testing, which I guess more recently has come to be known as property-based testing. Unlike traditional unit testing, where you take a specific value – like an exemplar value – and you try to exercise your code with it, what you do here is define a bunch of boolean expressions that assert properties that should hold for your code at all times. Then there is a bit of random data generation, and what happens is this: let’s say I am testing a function.

I will write some other function that makes some boolean assertions, or properties, about the function I am testing, and the FsCheck system will call that test function many, many, many times, with new, randomly generated input data each time. If all the boolean expressions return true, everything is good. When a boolean expression returns false, it automatically tries different, simpler data and keeps going until it gets the smallest and simplest possible piece of data that will cause the property to be falsified, that is, to return false.

So, it is a very different way of looking at how you test code and it is less about this sort of check box style of “This function is covered, this function is covered, this function is covered” and more about like getting a deeper sense, a deeper intuition about what your code is doing, a deeper understanding about what your domain should be doing, because in addition to these properties and the random data generation, it will also let you customize the random data generation.

So, it does a pretty good job with primitive values of creating useful distributions of data, but it lets you customize it such that you get very specific and very finely tuned distributions of data. There is some interesting research that is somewhat anecdotal – I guess not anecdotal, but somewhat informal research that suggests that when your distribution of test data appropriately mirrors your real world data distribution, you can get very, very accurate results testing this way. That is not always possible for every domain.
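The loop described above (generate random inputs, evaluate a boolean property, shrink any failure) can be sketched in a few lines. This is a toy written in plain Python for illustration, not FsCheck and not its API; the names `check_property`, `minimize`, `gen_int_list` and `shrink_int_list` are all invented here, and real libraries use far more careful generation and shrinking strategies.

```python
import random


def check_property(prop, gen, shrink, runs=200, seed=0):
    """Call prop on `runs` random inputs; return a shrunk counterexample or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return minimize(prop, value, shrink)
    return None  # the property held for every generated input


def minimize(prop, value, shrink):
    """Greedily replace a failing value with any simpler value that still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in shrink(value):
            if not prop(candidate):
                value = candidate
                improved = True
                break
    return value


def gen_int_list(rng):
    # Random list of 0 to 20 integers in [-100, 100].
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]


def shrink_int_list(xs):
    # Simpler candidates: drop one element, or halve one element toward zero.
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [int(x / 2)] + xs[i + 1:]


def reverse_is_identity(xs):
    # Deliberately false property, used to show shrinking at work.
    return xs[::-1] == xs


def reverse_twice_is_identity(xs):
    # True property: reversing twice gives back the original list.
    return xs[::-1][::-1] == xs


counterexample = check_property(reverse_is_identity, gen_int_list, shrink_int_list)
no_counterexample = check_property(reverse_twice_is_identity, gen_int_list, shrink_int_list)
```

The false property fails on almost any random list, but what gets reported is a shrunk two-element list such as `[0, 1]`, which is much easier to reason about than the original random input.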

   

8. I see a claim that you are making here that property based testing is useful in the real world. Do you want to elaborate on that a little bit?

Property-based testing has its origins in a library in Haskell called QuickCheck, FsCheck being a nod to that. It started as a paper that was presented at ICFP back in 2000, and there is very much this sense of – you know, people hear these things and think “I write boilerplate line-of-business apps. My code does not have properties. I do not have these mathematical constructs in my code. I am not building these fancy models.” The truth is that those properties are there anyway. You are just not seeing them. Take idempotency, for instance: it is a property that crops up in a lot of code, in a lot of places, and it does not matter whether you are doing something very, very mathematically oriented, like something in the finance industry, or if you are just building a simple data collection utility.

Another one would be this notion of duals or inverses, this idea that some action has an action that is the complete opposite of it, like serialization and deserialization, or round-tripping persistence through a database: if I have some data and I write it to the database and I read it back out of the database, I should get the same object back. Well, that is a dual, that is an inverse, that is a general property that should hold no matter what data I throw at it, so I can create lots of randomized tests for that. That is sort of what I mean. These properties are everywhere in our code and it is not only certain domains that can benefit from it.
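The two properties named above, idempotency and round-tripping, can each be written as a one-line boolean check. The sketch below uses Python's standard `json` module as a stand-in for the serialization or database layer; `normalize_name`, `random_record` and `holds_for_all` are hypothetical helpers invented for this example, and the record shape is made up.

```python
import json
import random


def normalize_name(s):
    # Hypothetical function under test: collapse whitespace and lowercase.
    return " ".join(s.split()).lower()


def random_record(rng):
    # Hypothetical record shape; only JSON-friendly types are generated.
    return {
        "id": rng.randint(0, 10**9),
        "name": "".join(rng.choice("abc XYZ\t") for _ in range(rng.randint(0, 12))),
        "tags": [rng.choice(["new", "vip", "late"]) for _ in range(rng.randint(0, 3))],
        "active": rng.random() < 0.5,
    }


def round_trips(record):
    # Dual/inverse property: deserializing what was serialized gives the same value.
    return json.loads(json.dumps(record)) == record


def is_idempotent(record):
    # Idempotency property: applying the function twice equals applying it once.
    once = normalize_name(record["name"])
    return normalize_name(once) == once


def holds_for_all(prop, gen, runs=500, seed=1):
    # Evaluate the property on `runs` randomly generated inputs.
    rng = random.Random(seed)
    return all(prop(gen(rng)) for _ in range(runs))


round_trip_ok = holds_for_all(round_trips, random_record)
idempotent_ok = holds_for_all(is_idempotent, random_record)
```

Neither check mentions a specific example value; the same two functions would apply unchanged if the JSON calls were swapped for a write to and read from a real data store.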

Barry: One of the pitfalls that I understand in the case of random generation of tests is the possibility of generating, emphasizing values that would not normally come up in the real life use of a piece of software.

Right. Two things happen here. On the one hand, this will very often bring to light edge cases that a developer would not normally think of, but you do sometimes spend cycles on things that are inappropriate or that are just not relevant, and that is why libraries like FsCheck have a very sophisticated API for fine-tuning the data that is generated. It takes a little bit of developer due diligence to actually study the data being generated, to understand the data that suits your domain, and to fine-tune and get the two to coincide. But I think that helps you develop a deeper understanding of your domain, which can only improve your development.

   

9. In your experience, is there a possible pitfall with random testing of missing some of the cases that users would typically run into?

If you naively just take the out-of-the-box random testing and you do not do any customization, probably. I think if you spend a little time studying the typical use cases you have for your domain and tune the data appropriately, that seems less likely. I would think the bigger concern would be sort of the equivalent of overfitting. Let’s say I am building something that is a complex calculation for the price of a payment on a mortgage.

Let’s say I go to the users and I ask them very specifically “What do you think the possible ranges are for this value?” and “What do you think the possible ranges are for this value?” and “What do you think the possible ranges are for this value?” If they all collectively have this experience of only ever having seen coupon ranges from 2.0 to 6.0 and they tell me that and I only code it to account for that, then the minute there is a 6.5 coupon, suddenly things are – there is a sort of underfitting/overfitting tension that you have to struggle to balance, and that is sort of like fine-tuning data models in other contexts.

Developers have to put a little effort in somewhere and I think that is a somewhat better use of developers’ time than ensuring code coverage metrics, ensuring that “I have made sure that I have tests that visit every conditional branch in all of my code.” That is useful, I guess, but I feel like understanding the domain via its data is perhaps a bit more useful. I also tend to take a kind of data-centric approach to things, so that could just be me.
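One way to picture that balance is a generator that mostly draws coupons from the range users report but deliberately spends some of its budget just outside it. The sketch below is an assumption-laden illustration in Python, not FsCheck: `gen_coupon`, its parameters, the 20% outside share, the rounding to eighths of a percent, and the textbook level-payment formula in `monthly_payment` are all choices made for this example, not anything taken from the interview.

```python
import random


def gen_coupon(rng, observed=(2.0, 6.0), widen=2.0, outside_share=0.2):
    """Draw a coupon (in percent), mostly inside the observed range."""
    lo, hi = observed
    if rng.random() < outside_share:
        # Deliberately step outside what users have seen so far.
        if rng.random() < 0.5:
            value = rng.uniform(max(0.0, lo - widen), lo)
        else:
            value = rng.uniform(hi, hi + widen)
    else:
        value = rng.uniform(lo, hi)
    return round(value * 8) / 8  # assumed convention: eighths of a percent


def monthly_payment(principal, coupon_pct, months):
    # Standard level-payment formula, with the zero-rate case handled separately.
    r = coupon_pct / 100 / 12
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)


def repays_principal(coupon_pct, principal=250_000.0, months=360):
    # Property: total payments are never less than the amount borrowed.
    total = monthly_payment(principal, coupon_pct, months) * months
    return total >= principal - 1e-6


rng = random.Random(42)
coupons = [gen_coupon(rng) for _ in range(1000)]
failures = [c for c in coupons if not repays_principal(c)]
```

Because a share of the generated coupons falls above 6.0 and below 2.0, a calculation that silently assumed the reported range would be exercised outside it long before a real 6.5 coupon shows up.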

   

10. Would you like to talk a little bit more about ZeroMQ?

Sure. So, I guess the interesting thing for me about ZeroMQ is that it started out as a library in C, but it has a very strong open-source slant to it and at this point, it has evolved into a series of protocols and has bindings in over 50 different languages. So, it becomes one of the best ways I know to build heterogeneous systems, to build polyglot systems, to build systems where I have a Java component over here and a C# component over here and they are talking to each other in a way in which they do not know that it is a Java component talking to a C# component.

They just know that we have this generic protocol and as long as we speak this protocol we can talk to anything that speaks this protocol. That is something that the community is very big on: making sure that the protocol itself remains useful, simple, well documented and above all, I do not want to say generic, because that is not the right word, but general. They do not want any one technology, bias or assumption to influence things. The idea is that it is message passing. So, under the covers, very often, it is like a way of taking the same sort of message passing that Erlang does, this actor-based concurrency model. It is similar to taking that, but then writ large at the application level, so that instead of being forced to talk directly over sockets, you have what looks like sockets but they are actually doing smart message passing with intelligent message exchange patterns and buffering as appropriate and things like that.

Barry: So you are a fan of polyglot programming.

Very much so.

Barry: Tell me about that.

Lately, I have been thinking that I like polyglot programming and I started thinking about what I like so much about it. I feel like there is a very strong tendency for people to want to just do something one way and that is it, and I feel like you miss out. If you study linguistics, sociology, and psychology, there is very much this notion of language evolution and of different idioms. There are idioms you have in Spanish which you do not have in English, there are idioms and expressions that you have in German that you do not have in French.

So, there is this cultural influence and this dynamic relationship between thought and language, and the thing I like to think about with polyglot programming is this idea that if you look at a particular domain and you are open-minded about the technologies you can use in that domain, you can find languages that are more or less appropriate to that particular problem. First of all, I think no one really builds single-language applications any more. I mean, maybe, if you are writing a device driver and it is 100% in C.

But, modern applications, whether they are web or desktop, there is at least some SQL, there is probably some XML, if it is a web app there is some JavaScript and some HTML and whatever is on your server side. So I think no one really does one language development any more, for the most part, but typically, what we have seen historically is that the environment dictates the language: we use SQL because that is the language of relational databases. We use JavaScript because that is the language of web browsers.

But what I hope to see us move towards with polyglot programming is this: it is great to have ways to connect them like we currently do, like ZeroMQ or open standards like HTTP or whatnot, but it would be nice to get to the point where, taking an environment like the CLR or the JVM, you have multiple languages that all come down to a common bytecode and get executed in the same environment. You could theoretically start to be much more discerning and say: for this particular piece of the domain, because it is a problem involving, say, natural language parsing or machine learning, I am going to do it in F#, because the functional nature and the ML heritage of the language lend themselves to that problem domain very well; but then this bit over here that requires some very complex 3D rendering, I am going to do in C#, because there is going to be a bunch of bit twiddling involved and C# is a more natural fit for that.

This is sort of a contrived example, but it is those sorts of things, where you are not just letting the technological boundary drive your language choice, but you are actually looking at the language’s fitness for purpose for a particular problem. It is a subtle thing, it is a very subjective thing and it is culturally informed. So, it is something that I find interesting, but I do not know if I am looking at it a little too much like a liberal arts student and not enough like a scientist.

   

11. There is nothing wrong with that. You may have already answered this or covered it, but is there one cool new thing that you want to leverage more in the immediate future?

Yes. Actually, talking about other languages, I have been spending a little time looking at Rust as a programming language and I am really fascinated by it. I have spent a ton of time in managed languages and scripting languages and I have done a fair bit of C as well, but it is fascinating for me to see that Rust has these features in it that we normally associate with more sophisticated languages that require a VM and an explicit whatever. Rust has these features with no VM, 100% native, at almost C-level speed in terms of the compiled executable – not quite there yet, but almost there – and without a lot of the overhead. It does all the stuff at compile time with a very, very intelligent static analysis process in the compiler. So, I find that fascinating.

Barry: Very good. Paul Michael, thank you so much for coming to QCon.

Thank you very much, Barry.

Aug 09, 2015
