
Brian Marick on Test Maintenance
Recorded at:

Interview with Brian Marick by Ryan Slobojan on Dec 03, 2010 |
34:21

Bio Brian Marick was a programmer, tester, and team lead in the 80's, a testing consultant in the 90's, and is an Agile consultant this decade. He was one of the authors of the Manifesto for Agile Software Development and Chair of the Board of the Agile Alliance. He's the author of The Craft of Software Testing (1994) and Everyday Scripting with Ruby (2007). He blogs at http://www.exampler.com/blog.

Strange Loop is a developer-run software conference. Innovation, creativity, and the future happen in the magical nexus "between" established areas. Strange Loop eagerly promotes a mix of languages and technologies in this nexus, bringing together the worlds of bleeding edge technology, enterprise systems, and academic research. Of particular interest are new directions in data storage, alternative languages, concurrent and distributed systems, front-end web, semantic web, and mobile apps.

   

1. Brian, one of the challenges that one can encounter when using a TDD approach is that maintainability of tests can be an issue. What are your thoughts on this?

I had this epiphany a while back, which I will describe: I was using Cappuccino, which is a JavaScript kind of framework, and I was testing it using TDD, so this was all UI testing; so I had my tests that would interact with user interface objects. Now the first user interface used a drag-and-drop metaphor to drop animals and procedures to be performed on those animals onto one or the other of them and I had done all this testing, you know, with good behavior, as if my mother were going to look at my tests and look at my code and get mad if I didn’t have tests for my code.

So I had a lot of tests and they were all talking to mocked out user interface objects. And then, at some point later, it became apparent that that user interface was actually stupid, that the drag-and-drop wasn’t a good idea, and so the user interface changed drastically so that, instead of dragging and dropping, you are clicking on things and, if I remember right, whereas before you had two main areas, now you had three. Now, the result of this was that the user interface objects were the wrong ones and so the tests were wrong, because they were tests for dragging and dropping and there was no longer any dragging and dropping.

I had this long list of tests that I had to rewrite and revisit and I was thinking "Oh, woe is me, I have to go and rewrite the tests." And a very interesting thing happened as I did the rewrite. Every one of those tests was a statement of a fact about the system that I had at one point thought was important, important enough to write down as a test, and the majority of those facts were still true of the new system, yet in a slightly different way. It was still the case that if you tried to do this thing and there were no animals, something should happen in the user interface to make that evident.

The thing that happened was different but the test reminded me there was a case I had to consider. So I came to think of this list of broken tests as a checklist of things that had been important to me that I needed to think about: are they still important in the new system? If so, how? Are they not important in the new system? If so, is there something like them that is important? And as a result, that’s maybe too strong, but I fell in like with the idea of maintaining tests because the tests are this to-do list or to-think list about the changes that I am going to make.

I am here at InfoQ to proclaim a new era in testing. Back in the last century, there were 2 groups of programmers. There were the vast, vast majority of programmers, up here, who thought that testing, writing a test before writing the code, or writing a test even after writing the code, was a terrible waste of time and it was below their dignity and not something that a person like them should ever do. And then there were these few crazy people, working in Smalltalk, who were doing this test-first programming stuff. Now time has passed and more and more people, and I don’t know exactly where the balance is, are doing test driven design.

But if you throw away the people who are not, and you look at those who are and divide them into 2 groups, the vast majority of them think that test maintainability, you know, rewriting tests, maintaining tests, is not something that they should do, is beneath them, is a valueless activity, and then over here there’s me. But in the third age of TDD, I predict that instead of thinking of test maintenance as a problem, we will think about it as an opportunity and pay as much attention to that as we do to writing new tests, and so we’re going to see this.

   

2. One of the things which is frequently used to create tests is a mock framework of some kind and sometimes if the mock framework is used incorrectly that can impact the maintainability of the tests. How do you use a mock framework correctly to ensure that tests are still maintainable?

Well, after many years of completely misunderstanding mocks, I finally think I got them in the past year and a half or so. I am now heavily into the mocking camp, what is sometimes called the "London style" of test driven design. So I’m very big on mocks. I think there are two things about mocks and maintainability. One thing is that, like all test maintainability problems, the maintainability problem you want to avoid is that you change a fact about the system and now 3,000 other tests whose purpose has nothing to do with that fact break.

The way to avoid that is to strive very much to have no word in your test that isn’t specific to the purpose of the test. I take this to extremes, in a way that I probably won’t be able to explain just by waving my hands around. If your test is not about logging in, there shouldn’t be any reference to logging in, or passwords, or users, so that when they decide 2-factor authentication is no good and now we’re going to have three-factor authentication, you don’t have, at worst, a whole bunch of tests to rewrite. Or, even in the good case where you were thoughtful and sort of hid logging in off in its own subroutine, you don’t even have to worry about the interaction of that subroutine with your test, which can often happen even if you don’t intend it to.

So I think that mocks probably do incur a greater maintainability cost because in conventional test driven design, a test only sort of depends on the things on top of it, whereas with mocks it can also depend on the things below it. I think mocks have a greater maintainability burden, but I am willing to pay that price for the other benefits. The potential advantage that mocks have over conventional TDD is that while they have more coupling to other objects, the objects below them, they can have less coupling to the structure of the data.

Let me give you an example: in this particular application we might have some sort of business rule that says an animal can only be reserved five times inside of a week for this particular thing. So, we’ve got 2 notions: we’ve got animals and we’ve got reservations. Now it happens in this application that a reservation is a fairly complicated structure. You have reservations. They have groups, they have uses, each use has an animal and a procedure.

So if you want to do conventional TDD where you want to ask an animal "How many reservations do you belong to within this span of time?", you have to build a reservation with its groups and its uses and the animals, and you have to build five of those, say, with an animal in all of them, and you’ve got this data structure. And then we invent all these hacks to create the data structure, and we call them fixtures, or object mothers or whatever, and everyone hates those because they are fragile. When you decide to change the structure of the data, that can ripple through all the tests. In a mocking system, you don’t have to create this actual hierarchy of structure.

You need an object, the mock object, for the animal and you need to tell that mock object that when the test asks it "How many reservations are you in?", it says 5. So, in conventional TDD this test is coupled to the structure of reservations. In the mocking case, the test doesn’t even have to know anything about reservations. All it has to know about is the animal. And the thing that surprised me when I decided to switch over from TDD, conventional TDD, to mocking style, is that I had all this code that was devoted to building data structures and stuffing things in databases and organizing stuff and so much of that code just went away.
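As an illustration of the animal example above, here is a minimal sketch in Python using the standard library's unittest.mock. It is not code from the interview: the function name can_reserve, the method name reservation_count_within, and the shape of the week value are all assumptions made up for this sketch. Only the five-reservations-per-week rule comes from the discussion.

```python
from unittest.mock import Mock

MAX_RESERVATIONS_PER_WEEK = 5  # the limit quoted in the interview's example


def can_reserve(animal, week):
    """Hypothetical business rule: at most five reservations per animal per week."""
    return animal.reservation_count_within(week) < MAX_RESERVATIONS_PER_WEEK


# The test mentions only the animal. No reservations, groups, or uses are built.
fully_booked = Mock()
fully_booked.reservation_count_within.return_value = 5

lightly_booked = Mock()
lightly_booked.reservation_count_within.return_value = 4

week = ("2010-10-11", "2010-10-17")  # placeholder; the rule never inspects it

assert can_reserve(fully_booked, week) is False
assert can_reserve(lightly_booked, week) is True

# The mock also records how it was asked, which is the extra coupling
# "to the things below" that the answer describes.
fully_booked.reservation_count_within.assert_called_once_with(week)
```

If the structure of a reservation later changes, nothing in this sketch needs to change, because the test never names that structure.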

So how do I balance the maintainability? I’ve decided that I don’t actually care, since to my mind the purpose of the test is to make me think twice about what I am doing. If I’m writing code for the first time, the test made me think twice. If I’m changing code, the test made me think twice, and the typing that I have to do doesn’t bother me that much.

   

3. Although it can be of great benefit to have a large volume of tests for testing certain software, if you are concerned about the maintainability of your tests, and you know, you make a small change here and that breaks 3,000 tests, can your tests in effect prevent you from making changes to the code that you may otherwise make?

The more coupled your tests are, the weaker your tests are, the more tests talk about things they shouldn’t be talking about, the worse off you are, and there are really three options, I guess, in this situation. One is that you just give up, and that’s what traditionally happened. Back in the 90’s there was an earlier wave of interest in programmer testing. It was primarily top down, driven by managers who were afraid that the software industry would be destroyed by Japanese manufacturing techniques, the way that the auto industry had been.

So they decided that since the Japanese had people check at the point of production rather than at the end of the line, we would do that too and so the obvious thing to do was have the programmers do it. Let’s find a guy to teach these people unit testing, which I did in the 90’s. And what happened was I’d meet people that I had taught two years later and I would say: "So how are your unit tests?" and they would say, "Oh, yeah, what ever happened to those?" and what had happened was they’d write the tests, they’d make changes, the tests would break and they would dutifully update them.

And that would happen again, and again, and then, at some point, somebody would comment out the test and say, you know, "We’ll fix this after the release" and that was it. And I even coined a phrase called the "2-year itch" by analogy with the 7-year itch. It turns out that in American culture it really is true that the peak of the divorce rate is at 7 years. And that’s when you suddenly realize "This partner of mine is too high maintenance, you know, I keep doing these same things and I keep getting into trouble so I’m just going to throw them out and start over."

And that’s what people did, at like 2 years. And so when TDD came along, and people were enthusiastic about it and then XP, Extreme Programming, was being talked about on the C2 Wiki, I sort of embarrassed myself, by coming on all expert-wise and telling people like, in particular, Ron Jeffries, you know, "Oh, hi, your naïve enthusiasm is so charming, but in 2 years all your tests are going to be gone and go ahead, fail, I’ll watch from my Olympian height." And I turned out to be wrong. Ok, so, there is that possibility.

You go through the 2-year cycle forever and ever. Or there is the other possibility, which is, at periodic intervals, you say, "Ah, we now know how to do tests better, so we will hire somebody and have them rewrite all the tests." And that doesn’t work very well either, because it’s not a sustainable model. If you have an extraordinarily dirty kitchen, how do you clean it up? The way to clean it up is a) you get in the habit of washing the dishes after a meal and b) you wash one extra dish every meal. And eventually, you’re going to have the dishes clean. So, the way to get out of it is to go slower, to accept the fact that you’ve got this legacy and whenever a test breaks, you find a way to a) fix it and b) fix it in a way that makes it less likely to break in the future and so eventually you sort of climb your way out of the tar pit.

   

4. You’ve mentioned the idea of only keeping pertinent words from the vocabulary in the test and keeping out other things which are unrelated to the test, with an example being "log in". How can you do that when often times certain things need to happen to get the system to a state which is necessary for you to perform the test?

I think there are two separate issues. There are unit tests, the programmer tests. A certain amount of that having to get the system in the right state is building up these massive data structures, and you can either build them up the way the system builds them up when you execute and then execute your own test, or you can use factories or object mothers or fixtures or what have you to build up fake versions of them. But if you’re doing a bottom up unit test, where every function executes the real code that lives underneath it, then you are much more prone to get stuck in this situation where all that code depends on all these data so you have to create all these data.

But if you’re mocking everything out below, then, ideally, each function depends on only very little data, because in reality, even in the situation where you’ve got 800,000 objects up there being built up, the particular test is only referring to, or using, very small pieces of that. With mocks you can isolate the test from all the complexity and just mention the very few things you care about. So, that’s part one, and it helps a lot with that problem because those kinds of tests are going to be the majority of your tests.

Now the problem with, a problem with mocks is that you’ve tested this thing in isolation, mocking out the stuff below it, and you’ve tested this thing in isolation, mocking out the stuff below it, and what happens when you put these together? So with mock tests you also need some sort of a path, an integration test, call it what you like, that traverses the system. And there’s a nice book, "Growing Object-Oriented Software, Guided by Tests" by Freeman and Pryce, that talks about that. They start with these end-to-end tests and they use those to drive the development of the mocking tests that drive the development of the software.

Now the problem with those tests is they have to build up these real data because you’re really executing through these. And what do you do about that? Those tests are a pain. They’re a pain because they’re fragile, they’re a pain because they run so slowly, etc. I don’t think there’s a super good answer for that because I find those tests a pain. Some things I’ve done in the past that have just been toy ideas are to sort of model things after rule based systems where you just have the test mention what it needs, like "I need a logged in user", and there’s a rule that tells you how to get a logged in user.
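One way to picture the rule-based idea is the following Python sketch. It is purely illustrative and not the toy code described in the interview: the RULES table, the fact names, and the establish function are all hypothetical. Each rule names the facts it depends on and says how to build its own fact from them, so a test only has to state the fact it needs.

```python
# Hypothetical rules: fact name -> (facts it needs, function that builds it).
RULES = {
    "registered_user": ((), lambda: {"name": "pat", "logged_in": False}),
    "logged_in_user": (
        ("registered_user",),
        lambda user: {**user, "logged_in": True},
    ),
    "reservation": (
        ("logged_in_user",),
        lambda user: {"made_by": user["name"], "animals": []},
    ),
}


def establish(fact, world=None):
    """Return a dict holding `fact` plus every fact its rules depend on."""
    world = {} if world is None else world
    if fact not in world:
        needs, build = RULES[fact]
        for need in needs:
            establish(need, world)  # satisfy prerequisites first
        world[fact] = build(*(world[need] for need in needs))
    return world


# The test states only the fact it cares about; the rules supply the rest.
world = establish("reservation")
assert world["reservation"] == {"made_by": "pat", "animals": []}
assert world["logged_in_user"]["logged_in"] is True
```

If the way a user gets logged in changes, only the "logged_in_user" rule changes; a test that asks for a reservation never mentions logging in.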

So you just mention the facts you need and the rules generate that. And that seemed ok. It wasn’t promising enough that I’ve ever done that with any real work. So that’s one thing you could do. Two more things you can do. There’s a nice paper, now I just can’t remember which one it is, Jeffrey Fredrick or the other guy, about growing a test suite, piece by piece. And that treats the problem of growing a test suite the same as the problem of growing software. You start with as little as you can, you grow capabilities as you discover new needs, like the need to be able to do things without logging in.

You do that while simultaneously changing the code so that that sort of thing becomes easy. Ideally in such a situation you would have systems that weren’t so incredibly state dependent that you had to go through a lot of sequences of steps. So, I think part of the thing you have to do is you have to realize that the test framework is going to be specific to your system and is actually going to be kind of hard to get right. Just like your system is going to be kind of hard to get right. And then the final thing you can do is you can just give up, which is something like what I’ve been doing recently.

Supposing that you’re working on a web app or doing a GUI desktop app or something that’s user interface intensive, it’s my claim that a programmer should not ever finish a story without actually trying it out through the user interface. Seems like that would be a good idea. Now the question is, if you’re doing it anyway, what additional value does the end-to-end test buy you? Now, if you never again change the code that that end-to-end test touches, an automated end-to-end test would buy you nothing. The only case in which it can possibly buy you anything is if you change some of the code along the path.

But if you’re changing the code along the path, intending to change that path, you’re going to check it out again manually anyway. It’s only in the case where you’re working on a completely different path that touches this code, and you change this code not realizing that the first path goes through it. The probability of that is greatly lessened if you have well-written code in the first place so, maybe the first line of defense against that case would be to write the code well, because it seems like that would have advantages in addition to making testing work better.

But what I’ve been doing is, on smallish projects, I’ve actually just been saying, "Oh, you know, I’ll just live with the danger". And this project I mentioned earlier, with the animals and all that, it’s been in use by a small set of users for over a year and I’ve never had a case where I said, "If only I had written an automated test, that bug would not have made it into the field." So it’s true for small, one-man applications. It’s obviously going to get to a point where you have a larger team, a longer lived application that’s inevitably accumulated some cruft. At some point you’re going to really wish you had end-to-end tests.

The problem is that here in the beginning they’re just a drag on your figuring out what your application should do and making it easy to change, so there’s some point where things stabilize enough and then you have to start writing the end-to-end tests. But my personal preference is something that I learned from either Alistair Cockburn or Jim Highsmith, and I hope neither one of them is aghast that I attribute this to them, which is that the Agile style is to always do a little less than the absolute minimum you think is sufficient and then let reality force you to add on.

So I would not worry about end-to-end tests until the lack of end-to-end tests had presented me with actual data, not "this bad thing could happen", but "this bad thing has happened", and then I deal with the problem. There’s no easy answer to this stuff.

   

6. With the keeping of a specific vocabulary for each of the different systems in an application, it sounds a lot like the concept of a domain specific language. Is that what is in effect being created by keeping that vocabulary strict to each different portion of the system?

There are possibly two related questions. In Eric Evans’ book "Domain Driven Design", which you should read, there’s a notion of the ubiquitous language, which is the language that everyone uses to talk about it. I think these tests, even unit tests, insofar as it’s possible, should be written using the project’s ubiquitous language. So the nouns and verbs will come from the domain. Now the question is "do you have a domain specific language?" It’s not a domain specific language in the sense that it’s a language you want domain experts to be using. I tend to think that that has been a rat hole.

We have started out in the Agile world desperately hoping that the domain experts would write our acceptance tests for us. And we invented tools to allow them to do that: write it in the tabular format, write it in English language format and that’s not really what I want them to be spending their time on. I want them to be talking to us, not writing things in tables. In my experience it’s vanishingly rare for the domain expert to actually write the tests. I’d much prefer it if everybody sits at the whiteboard and they say, you know "Oh, domain expert, what about this case?" And scribble and draw lines and arrows and you use this amazingly flexible tool, the pen, and whiteness to invent notation as you need it, rather than trying to fit it into a rather more rigid computerish notation.

So I’m not a big fan of those kinds of DSLs. So, any DSL we’re talking about has as its domain, the domain of either the technical programmer or the relatively technical tester, and by "relatively technical" I mean they know something, they know a friendly programming language like Ruby, and they can write in that language. Now, if you’re using something like Ruby, Ruby is very nicely suited to writing DSLs and I will write DSLs to make my tests kind of look more pleasant to me and to simplify things as I need them. I don’t know if I call that a domain specific language, except in the sense that like Make, or Rake or anything, those are domain specific languages for the domain of building software. You will invent some things for helping you test software. I think calling them a domain specific language is a little bit of fancy pants.

   

7. You’ve mentioned the idea of using Ruby to help make tests more expressive. Do you believe that there are certain development languages which are more suited to testing and to express the tests and some that are less suited to express a testing framework for us to write the tests?

Basically, look at the Java mocking packages and the incredible convolutions they have to go through to satisfy the compiler while allowing you to do something simple. The thing about testing is it’s the ultimate example of object oriented programming, where you’re talking about the behavior of the object: what it does, rather than what it is or what it inherits from. And the enforced single inheritance hierarchy, where you can’t talk to that object unless it is a particular object, is not what you’re trying to do with programmer tests here.

You’re trying to say that if the object I’m trying to build is surrounded by other objects that act in a particular way, it will produce the correct results. And having to overload the whole set of typing on top of that, just adds work. So, it kind of pales in comparison with all the work you have to do just to write a program in a strongly typed or statically typed language. But I think clearly to my mind, for testing the notion of explicit types for the compiler doesn’t buy you anything and loses you a fair amount.
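To make the point about behavior versus declared type concrete, here is a small Python sketch. It is an illustration only, not code from the interview: FixedCountAnimal, describe_load, and reservation_count_within are hypothetical names, and the limit of five reuses the business rule from the earlier animal example. The stand-in object shares no base class or interface with any real animal class; it only needs to respond to the one message the code under test sends.

```python
class FixedCountAnimal:
    """Hand-written stand-in; it inherits from no real Animal class."""

    def __init__(self, count):
        self.count = count

    def reservation_count_within(self, week):
        # Always answers with the canned count, whatever week is asked about.
        return self.count


def describe_load(animal, week):
    """Hypothetical code under test; it cares only about what the animal does."""
    count = animal.reservation_count_within(week)
    return "fully booked" if count >= 5 else f"{5 - count} slots left"


assert describe_load(FixedCountAnimal(5), "this week") == "fully booked"
assert describe_load(FixedCountAnimal(3), "this week") == "2 slots left"
```

In a statically typed language the same test would typically also need an interface or a generated subclass so that the stand-in is accepted where an animal is expected, which is the extra work the answer refers to.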

   

8. Thank you very much.

You are welcome.
