Brian Foote on the State of OOP, Refactoring, Code Quality
Interview with Brian Foote by Ryan Slobojan on Jan 11, 2012

Bio Brian Foote holds the position of Senior Scientist at the Refactory, Inc. and has a long affiliation with the University of Illinois, Urbana-Champaign. He is well known in the object, patterns, and Smalltalk communities.

   

1. Hi, my name is Ryan Slobojan and I am here with Brian Foote, self-styled software ethologist. Brian, you were a member of a panel here at QCon San Francisco which convicted Objects of crimes against computer science [Editor's note: the panel is available online: http://www.infoq.com/presentations/Panel-Objects-On-Trial]. What are your thoughts on that?

Well, it was an interesting thing to put together and we really had a great bunch of people. We had Richard Gabriel, Joshua Kerievsky, Michael Feathers; Dave Ungar was unfortunately not able to make it; Eliot Miranda wasn’t able to make it. But it was interesting for me because it gave me a chance to think back on the experience of having been, you know, part of the OO community for 25 years now. I mean, I’m one of the few people who has been to all 26 OOPSLAs; we just had the 26th OOPSLA in Portland, and they had the first one there in 1986. And beyond that, in the early ‘80s, when Smalltalk and the [Intel iAPX] 432 and the Star workstation and whatnot came into being, objects just kind of changed my life.

I hopped on that bandwagon and I was really enchanted by the prospect of getting out of what was a miasma of what we now call mud. I was in a small shop in a research environment where people were just producing horrible code at a rate that would appall moral individuals. I was brought there to try and clean that up and got intrigued with, "Do we have any technology that would allow us to deal with these problems?"

And so, the panel gave me a chance to take the long view and start thinking back over what we had been doing right and what turned out in hindsight to be a bridge too far, something that wound up really not playing out the way that we thought it would when we had the kind of zeal, associated with the OO community, that you see around things like neofunctionalism now.

And so the first thing, when you think back: objects have been around since 1962, when Dahl and Nygaard started working on SIMULA, so they’re almost 50 years old. And I also found myself realizing that maybe one of the most important contributions of the OO community wound up being something that isn’t even regarded as strictly object-oriented, but it has been one of our greatest successes; that was actually developed two years before that by the late John McCarthy in 1959.

Garbage collection has been one of the big wins of the last 50 years, and Objects didn’t invent garbage collection; it was actually there before even Dahl and Nygaard. But it’s been one of the things that I think the community has been responsible for bringing to the world en masse, in great numbers, and to this day, it’s one of our greatest successes. It’s one of the things you can point to and go, "Yeah, life is better when you don’t have to clean up after yourself like that. You have the computer do it and that’s civilized."

Now, we also found ourselves thinking about how this turned out compared to how we thought it was going to turn out, and you can go down the whole list of things that we thought were going to change the world. Just, you know, the code that reads like English, or Danish as our illustrious judge Corry called it; it never really panned out. The whole notion: we’re still trying to find ways to make code communicate effectively, and it is still one of the big battles that you fight when you’re teaching people to refactor, when you’re teaching people to write code. Another was reuse. I mean, reuse was going to change the world. We were going to build this infrastructure of reusable objects and frameworks that would be tailored for every domain and, you know, at the panel we used the example of the Space Shuttle program and the way that we sent that thing up there 135 times and we brought it back every time - tremendous heavy lift capacity.

To lift a quarter million pounds and we brought back 80% of it, we brought back that 200,000 pounds. Do this calculation: if we had just left the hardware up there, we could have had a Stanley Kubrick pinwheel a mile in diameter. But somehow the economics, the focus of the time, was: we wanted to fly that thing like a fighter jock, because that’s what cool space kind of dudes actually do.

And I think the same hubris got associated with reuse in the OO world. You know, you want to be the guy that built this framework that everyone would worship at the altar of. So, we over-empowered the fly boys with the shuttle program and to some degree we over-empowered some of the hackers. I'll speak for myself there. It was fun to be the guy who built this thing that accommodated not just what people wanted this week but next week, the week after and you know, I’m a clever guy and it actually does work, at least it works in the small.

But it was one of those things that turned out to be way more narrowly applicable and way more difficult for anyone else to buy into than we ever suspected. One can, in the end, come to the conclusion that expendable boosters and just having taken the effort that people put into trying to build a lot of these frameworks and putting it into just writing the damn code, would have been time much better spent. We could have all those applications that people didn’t write because they got focused on particularly premature reuse and they could have just actually built something for the customer, to match the customer's expectations and maybe if down the road, they found opportunities to generalize it, they could have taken them.

But even then, they would have to address the issues of frameworks colliding - what do you do if you’ve got 10 different frameworks and they all speak different languages? They all want to be the main program, all fighting with each other. As a result, those turned out to win basically niches, like old-fashioned GUI frameworks before the GUI moved into the browser, and a handful of others. Any time you make these sweeping general claims about something not panning out, of course, there were exceptions. Every part of the matrix is filled; there are people who prospered tremendously by doing this, and even to this day there are successful frameworks out there, but they weren’t the overarching game-changing success that I think a lot of us thought they were going to be, and that’s an interesting thing for me to look back on.

What was? You know, it’s fair to ask that. What wound up dominating instead: we still copy and paste. The once and current and future de facto standard technique for developing software seems to be - we take something that's working, we put a copy over here, we take the other guy’s name off it, take the copyright off and that kind of thing; and then we got something that flies and then we change it.

And the other thing that worked is patterns: the idea that you’re not reusing the code per se, you’re taking the design ideas and you’re reapplying them in a way that tailors them to the new environment, the way they should have been in the first place - expendable but tailored to the particular mission, you might say. That wound up working; that’s one of the successes.

The other great success actually of the OO era was probably one that came out of the XP world during the mid ‘90s: that was unit testing. I mean, other than maybe garbage collection and objects themselves, the thing that had the biggest impact on the way people develop is being able to press that button and see that you’ve got a systematic executable guarantee that what this test case says your code ought to be doing is there at run time. You can look at it; you don't need formalism to convince yourself that stuff is there. Probably nothing I've seen over the last 25 years has had the tangible impact on day-to-day development that the introduction of systematic unit testing has, and the kind of testing that has followed on from that has been a tremendous win.
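Editor's note: as a rough illustration of the "press that button" style of unit testing described above, here is a minimal Python sketch using the standard `unittest` module. The function `word_count` and the test names are hypothetical stand-ins invented for this note; they do not come from the interview.

```python
import unittest


def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    """Each test states, in executable form, what the code ought to do."""

    def test_counts_words(self):
        self.assertEqual(word_count("objects on trial"), 3)

    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)


def run_tests():
    """The 'button': load the test case, run it, and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_tests()
    print("all green" if result.wasSuccessful() else "something broke")
```

The point of the sketch is only that the expectation lives in code that can be re-run after every change, rather than in a document or in someone's head.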

   

2. So, for these faults like copying and pasting instead of reuse, do you think this is because of object-oriented programming or is just something that programmers do and they happen to be doing object-oriented programming at the time?

Oh, it’s something we’ve always done, you know, that’s the way. If you wanted a working application, in the really old days, old school guys like me might start with line one instead of copying something else. But once frameworks came along and things got past a certain size, I think we all know that what we do is we take something that's working, we kind of beat on it.

Some of us are more conscientious about doing it than others but I think that’s been the de facto standard way of getting a pulse, getting something on the air for a long time. You know when I teach classes, I’ll go into a room and say, "How many people think copy and paste is kind of a naughty thing to do?" And the hands would go up and I’d say, "How many of you have done it in the last week?" Everybody’s hands would go up and I’d go, "Of course, we all do it." It sounds like high school health class. That’s it people, this is how we do this.

And one of the things which has encouraged me over the last couple of years is you’re starting to see tools that make it easier to do that. And one of the answers to the mud phenomenon is: if there’s something people are doing, let’s get better at doing it ‘badly’, and one of the things that facilitates that is generating something that has a pulse, out-of-the-box, that you can beat on. We used to have to do that by hand. You’d find the code out on the streets in a dumpster or some place. Now, we’ve got the internet. It’s easy to find something that’s got a pulse and you just kind of look at it and go, "That’s kind of like what I want" and I can systematically change it. That’s actually getting easier and that’s probably fortunate because it’s what people did all along.

   

3. So it seems that there were a lot of successes and it’s a bit unclear if the faults which were attributed to Objects were really their fault. Why do you think it is that objects got convicted?

I think there’s a certain amount of OO ennui. You know, it’s not clear that they got all that fair a trial, to be perfectly honest. We had a few prosecution witnesses and we did bring up. Certainly there’s a lot of hype at the beginning of any of these kinds of movements; you see a lot of hype associated with these kinds of things. You know, there’s no question that objects have been successful in terms of having gotten a lot of students PhDs and a lot of professors tenure and a lot of consultants a lot of money, but overpromising anything upfront is a good way to generate disappointment down the road, and I think anytime anything is oversold, as objects were during the ‘80s, you can look back and go, "Gee, why are we still struggling with all these kinds of things?"

And you go back to Brooks, as we all do inevitably. We've picked a very difficult line of work - the life we have chosen is one in which, if we ever get good at building things that are as intricate as the market is demanding at a given time, it’s like what happened with the CPU guys: eighteen months later, they had to be twice as good as anyone on earth.

You know, these days, you find programmers wrestling with issues that are just unspeakably intricate compared with the kinds of things that people grew up with in my day. And they’re coping with integrating things that are more complex and that they never have the opportunity to read. So, it’s a different way of approaching things. As these things have gotten more intricate over the last 25, even 50 years, the challenges have gotten more demanding, and the degree to which one can be disappointed that there wasn’t a panacea 20 years ago increases anytime you have to struggle with anything like that. So, I think it was one of the things that caused them to get kangaroo-courted here the other morning.

The other is that there is a certain amount of object ennui, I think, that resulted from the fact that basically they kind of succeeded. I mean, you can make the case: look in the middle tier; in the middle tier, objects rule. You know, there’s a few people using scripting languages to work the middle tier for really small things, but basically languages like Java and C# and Ruby dominate the middle tier. And when you get to the browser, arguably, you have to say, the greatest success story of the last ten years has been JavaScript/ECMAScript, and the stuff that it’s doing in the browser these days is just utterly remarkable as far as I’m concerned.

I saw this video demonstration at SPLASH a few weeks ago that just floored everybody in the room and as a result, you’re seeing this trend towards JavaScript being the assembly language of the 21st century. There are all these new languages coming out that compile right down to raw JavaScript. Generativity seems to be part of the Zeitgeist right now, and it’s because of JavaScript, because there are no byte codes there. You will have a talk in the morning from the Dart guys and there are things like CoffeeScript that are getting a lot of airtime. There are dozens of languages compiling directly to JavaScript; part of it is a Moore's law bounty, part of it is that, you know, cores are becoming abundant and people are putting a lot of work into the engines on the browser side, and so that’s another success story.

Now, on the back-end unfortunately, the news isn’t as good. Objects lost, Ellison won, the database people; relational has always had and, you know, will continue to have this impedance mismatch, as it has always been referred to. And so, there’s always been a little bit of extra baggage in getting down from the frameworks through something like Spring and H into some kind of object relational mapping, which then takes you into a realm where all the things that people wring their hands over, like encapsulation boundaries and protection, don’t exist. You know, it’s fascinating to think of what might have happened if one of the OO databases had actually done a slightly better job of getting market share, but we’re now stuck with that. And so, that’s arguably a failure.

So, I think part of the sense of failure is motivated by the fact that we live in this balkanized world of multiple languages, multiple technologies, multiple specialties, and it’s all expanded on the typical exponential curve in a way that any one individual is probably overwhelmed thinking about, you know, trying to master all those technologies. It’s really a humbling thing when you think about what it takes to really be astride everything that is happening now. I think that leads to a certain sense of disorientation sometimes. And I think there was a naïve promise that things would be much simpler if objects had won every step of the way and there had been less balkanization, and so there’s an elegiac longing for a world that never was - a simpler world in which objects were everywhere. It’s certainly a naïve vision but it’s kind of a pretty one; people used to believe it.

   

4. You mentioned object databases and how they had failed in the ‘90s and how relational databases are currently the big way to do it. There has been a resurgence of interest in NoSQL databases, which are another form of object database in many respects. Do you think the time is now ripe for them to take on relational, or are they good for particular applications or good for no applications? What are your thoughts on that?

Any time you make a sweeping generalization, you then talk about the exceptions and naturally, it’s kind of gratifying to see things like Mongo and Couch and those kinds of things that were inspired by a somewhat more structural or arguably quasi-object-oriented outlook on the world. And actually you see the observation that a whole lot of applications which never really needed a database in the first place, if you can put, you know, 16-32 gigabytes of memory on servers and find a way to replicate things, are moving away from reflexively deciding that they need to go with Oracle, and there are huge simplifications from avoiding all that baggage if your application doesn’t need it. I think everyone knows the poster child for that: Map/Reduce and the big data stuff that’s happening in places like Google.

And one should certainly expect to see more of that, and I think it’s one of those situations where those who challenge the monopoly will see how far some of those approaches get in certain application domains, but they all are going to flourish. This new stuff will go up alongside the old, and the people that are still building the big abacus that the man uses to count his money will still like the relational database and like the fact that transactions are sealed for sure so that their money is okay. These things all have their purpose.

   

5. With respect to objects, a question came up during the panel as well about static and dynamic typing. There’s also optional typing, which has come up in languages like Dart. What are your thoughts on optional typing?

I think optional typing is a very clever idea and my hat is totally off to the folks who have enough brain power to be able to engineer these systems. I mean, the type inference, the formal mechanics that underlie being able to make that work are exceedingly impressive. But then you start to ask yourself the question: what are they succeeding in solving?

Now, disclaimer here - I’m an old dynamic languages guy, cut my teeth on Smalltalk during the ‘80s and you know, my attitude towards static type declaration was best summed up by that clip that we used in the trial the other day. There was a clip in Dave Chappelle’s Half Baked a few years ago where Willie Nelson was sitting in the gutter, he’s an old stoner and he was reminiscing about the past and he said to Chappelle, "Do you know how much condoms cost in 1967?" and he laughed and said, "Neither do I. We didn’t use them." And I hear that quip and I find myself thinking, "That’s exactly how I feel about static type declarations."

And so when someone says to me, "We’re going to add an optional type system to this language," I start getting flashbacks to what happened with Generics. What happened with Generics was, well, you can use them and we got this real pretty little for statement that we’re going to give you if you use them. But we’re also going to make the compiler nag you and your colleagues if you don’t use them. And so what happened was that the types in Generics turned into this thing which became mandatory by virtue of the compiler rapping your wrist, and just about every shop I’ve seen where they've had that option has turned to putting them in to get the compiler to shut up. And my worry is that if you just dangle these things before a lot of people in a lot of shops, it’s going to be a slippery slope down towards forcing everybody in the organization to do full type declaration - that’s my worry with optional types, and then it will become like ‘No Smoking sections’, you know. If one person in the shop is using type declarations, basically everybody else is having to deal with the second hand types.

And once you make that possible, once you make it an option, it turns into a slippery slope. So, we could have that little bit of history repeating itself. Now, the counter argument they will make is, if you’re doing prototyping, you don’t have to put those in and if you’re in a shop where you’ve got people who can’t figure out what the author was doing and would like being reminded that that variable has that type in there, please help them out. You know, I think it’s a cause for concern and as impressive a piece of engineering as it is to get that to fly, I’m not sure if it’s not undermining the benefits that people who otherwise never really experienced them might have otherwise been able to get out of dynamic languages. I mean, you can do just fine without them in a lot of applications and I hate to see their optional availability undermining the ability of people to learn that.
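Editor's note: the discussion above is about optional typing in languages like Dart. As a loose analogy only, the following Python sketch shows the same trade-off with Python's optional type annotations: the annotated and unannotated functions behave identically at run time, and the annotations matter only to a separate checker or to a human reader. The function names are hypothetical, invented for this note.

```python
from typing import get_type_hints


def area_untyped(width, height):
    """Prototype style: no declarations, any operands that support * will do."""
    return width * height


def area_typed(width: float, height: float) -> float:
    """Same logic with optional annotations added as documentation."""
    return width * height


# Both versions compute the same thing; the interpreter does not enforce
# the annotations, so the typed version still accepts a str at run time.
same_result = area_untyped(3, 4) == area_typed(3, 4)
unenforced = area_typed("ab", 3)  # a static checker would flag this call

# The annotations are simply metadata that tools (or colleagues) can read.
untyped_hints = get_type_hints(area_untyped)
typed_hints = get_type_hints(area_typed)
```

Whether a team treats the checker's complaints as advisory or mandatory is a policy choice, which is roughly the slippery slope being described.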

I think the excitement over that kind of thing is part of what’s driving the neofunctional renaissance, actually. You know, after having been in type pecked languages where teams are kind of slowed down by all the bookkeeping that they have to do, I think one of the things that’s made languages like Clojure and Scala and F# popular, even though they do play some of these type equivocation games, is that people are experiencing the excitement of 10e0 team size and of actually getting something to work in a short period of time: not having to fret over names, not having a lot of baggage in their code.

And, you know, I have been trying to understand some of the zeal associated with that because a lot of the stuff that they’re doing now is the stuff that the Actor and LISP people were doing 20 years ago. But I think for a lot of people, that’s the first time that they have experienced that, and getting the buzz of being associated with a paradigm shift and of actually getting something to work in a short period of time, I think we all know that’s exhilarating. Wherever we all got to experience that for the first time, it’s great, and I think a lot of people are seeing that now from some of the neofunctional languages.

   

6. So one of your main areas of interest is balls of mud as it relates to software quality. So, how much does code quality matter if the product is successful?

You know, one of the things - it makes one deeply schizoid to discuss the code quality issues, the whole mud issue, and it’s something I’ve been wrestling with for years because during my misspent youth, I was brought into a shop where the code was absolutely horrible and my mission was to try to make it be some place that I could stand to hang around. And the biggest mistake I ever made was to succeed at that. It is possible to take a festering metastasized ball of duplication that’s tens of thousands of lines long and consolidate it if you’re a real computer scientist in a shop full of, say, physiologists, as I was. That experience, I think, is where I got some of my misplaced faith in frameworks and what have you. I mean, I know it can be done in the small and I know that your life is a lot better if you’re working with a codebase you can comprehend.

Part of this is an issue of scale. I mean, you’re talking about ownership and authorship and the capacity to have some control over your life and the environment that you’re in. And when people can do that and people can cultivate their code - they have a sense of esthetics about it, they get it to communicate - it’s a much better place to live. I got the unusual experience, I think, as a result of that, of having the latitude from my employers in those days to run with that and to be there for a really long time, and one of the things that’s exceptional about my background is that I got the opportunity to live that and as a result, I didn't leave for a large number of years and it put me through graduate school.

So, it is possible to cultivate code that is basically whatever the opposite of sucks is and to keep it that way. I know from my own experience one can do that and, as I started looking around the world and seeing what everybody else was doing, I was kind of appalled to realize that the overwhelming de facto standard was that stuff just wasn’t any good and people were just sucking it up and tolerating it. And you can work that way for a short period of time. You can sprint out to a certain distance, blitzkrieging your way, line-by-line, copying and pasting and whatnot, to a certain point, and then you exhaust your detail capacity. You just can’t cope with it anymore. You start introducing problems faster than you can fix them and you start to realize there’s a certain amount of complexity you can comprehend, and you are there, and you grind to a halt.

You know, it’s like that scene in Patton in World War II where they ran out of gasoline and suddenly the tanks can’t move and so what do you do? You start digging in. You’re in the trench warfare at that point and you’re knee-deep, hip-deep, neck-deep in the mud at that point then you’re kind of stuck there. Every indication is that the prognosis once you get to that point is fairly bleak. You know, I have been hunting around for success stories - stories of people who have gotten to that point and the code has gotten better and you don’t hear a lot of them. It’s a rather ominous finding but, of course, anytime you make a statement like that there are exceptions; there are codebases that are getting more eyeballs on them and enough investment.

Anywhere in the universe where you have entropy, if you introduce enough energy you can reverse it. I think that the big problem with that is that the way we finance software development by and large, and the way we manage the development process, does not accommodate attempts to make code that has gone that far to seed get better. People really don’t want to pay for it; if it’s still delivering value they will continue to ship it. If it has problems or needs new features, they will add them incrementally. So, you keep on slathering additional band-aids and shoveling lots of mud on to the existing stuff. It’s a demanding task to get it better once it’s too far gone. And prevention is much easier than remediation. And unfortunately you don’t hear enough about people doing either.

   

7. I am reminded of the example of Netscape Communicator and I remember using this when I was younger because it was the big web browser at that time and the Netscape Communicator code base was donated and made open source to the Mozilla Foundation. And the Mozilla Foundation, after spending months digging through the code said 'Screw this we’ll start fresh'. And it does seem like that’s the most common choice when you have a really big codebase that has reached a certain level of pain and it just becomes so high maintenance-wise that you’re no longer able to effectively develop features and your time is spent on maintenance.

That’s part of that matrix. When things get totally to seed, sometimes the impulse is, "Oh, my God! I've got to work on this thing, I don’t understand it. I don’t know what’s happening with it." And Mozilla is a great example because I happen to recall, having spent time at the University of Illinois, that the original codebases for both Mozilla and what turned into Explorer came out of a round windowless room in Urbana, Illinois, where they had developed the first web browsers. And they were basically C code buckets of glue and they were integrating a bunch of existing applications that did things like draw pictures and did various other manipulations, and the code was, by general acknowledgment, not at all very easy on the eyes, and as those applications became successful, it just got worse and worse. I recall seeing the term ‘ball of mud’ applied to the early Mozilla codebases when you could start examining them on the web.

And so naturally when you're in a situation like that, you know, hackers are smart guys. You really can’t be in this racket without having, you know, more IQ points than might be good for a lot of people and they look at this kind of stuff and go, "This is disgusting, I can’t figure it out. There’s too much of it. I’m never going to read it." I did this thing the other day where we were looking at, you know, what it would take to read even a modest amount of code, like, 250,000 lines, and if you printed that at 50 lines per page, you’d have a listing a mile long. So, one of our dirty little secrets over the years is nobody reads it - like web blogs, Communications of the ACM - you pretend to read it. Yes, and end-user license agreements.
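Editor's note: the "listing a mile long" figure can be sanity-checked with a little arithmetic. The sketch below assumes each printed page is 11 inches tall (US Letter), which is an assumption of this note and not something stated in the interview.

```python
LINES_OF_CODE = 250_000      # figure quoted in the interview
LINES_PER_PAGE = 50          # figure quoted in the interview
PAGE_HEIGHT_INCHES = 11      # assumption: US Letter pages laid end to end
INCHES_PER_MILE = 63_360     # 5,280 feet * 12 inches

pages = LINES_OF_CODE // LINES_PER_PAGE
listing_miles = pages * PAGE_HEIGHT_INCHES / INCHES_PER_MILE

print(f"{pages} pages, about {listing_miles:.2f} miles of paper")
```

Under that assumption the listing comes to 5,000 pages and a bit under nine tenths of a mile, so "a mile long" is a fair round number.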

So, if you’ve got that much rubbish, either you can do the thing that is really intellectually demanding and try to understand it or you can say to yourself, "I’m a smart guy, let’s just rewrite this from scratch." The problem you’ve got at that point is the code embodies a series of decisions and design observations and requirements that you probably don’t have in front of you anymore. So, there’s lots of hubris that goes into saying, "Oh, we’re smart enough to just rewrite this thing." It takes a lot of effort to recover all of those things and, depending on the application, you can do it. I think the fortunate thing about web browsers is they were still evolving, they were relatively well understood; you know, you could ape the other guys. And yes, obviously I think we all know from discussions people have about the state of web browsers to this day, to a first approximation, Mozilla may have gotten better for a while but there’s still a lot of people who use the word ‘mud’ to talk about the way that some of that stuff emerged.

I mean, there were just so many things that were growing so quickly that there was a bit of metastasis and that kind of stuff happening with codebases, and they’re not poster children; I think we all know that. And people like Spolsky say starting from scratch is the one thing you should never do, and the reason he says things like that is the difficulty of recovering all those previous assumptions; they’re hard to get out of the autopsy that you have to do on the previous codebase as well. It’s a very tricky game to want to do that.

So, one alternative you have to writing it all over from scratch which is to say, you know, ‘dropping the bomb on it over here and rebuilding the city over here’ is you can try to gradually gentrify or make over the existing code base. You know, a piece at a time, you can go, here’s a section we can kind-of cordon it off. We can find a facet or a module boundary in it and we can focus on, as Mike Feathers would say, ‘getting this part under test’ so that we can make changes to it without going insane and breaking it.

Any time something doesn’t have a pulse for more than about five minutes, you have the risk of it becoming braindead and code is like that. You want to make sure you’ve got a pulse because if you’re going to do something to kind of change one of its internal organs, you want to make sure you didn’t kill the patient and so, a technique like that applied systematically over a long period of time can also have the effect of doing a de facto incremental total make-over on code and there’s probably a lot of cases where that would be the strategy that one would employ.
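Editor's note: "getting this part under test" before changing it is often done with what is commonly called a characterization test, which pins down what the existing code does today so that a make-over can be checked against it. The Python sketch below is a minimal illustration under that assumption; `legacy_price`, `refactored_price`, and `check_pulse` are hypothetical names invented for this note, with amounts in integer cents.

```python
def legacy_price(quantity, unit_cents):
    """Stand-in for tangled legacy code whose behavior must be preserved."""
    total = 0
    for _ in range(quantity):
        total = total + unit_cents
    if quantity > 10:
        total = total - total // 10  # bulk discount buried in the old code
    return total


def refactored_price(quantity, unit_cents):
    """Candidate replacement for one cordoned-off piece of the code."""
    total = quantity * unit_cents
    return total - total // 10 if quantity > 10 else total


def check_pulse(candidate):
    """Characterization check: the candidate must match the legacy behavior
    on a fixed set of sample inputs, including the discount boundary."""
    cases = [(0, 500), (1, 500), (10, 500), (11, 500), (20, 300)]
    return all(candidate(q, c) == legacy_price(q, c) for q, c in cases)


if __name__ == "__main__":
    print("patient alive" if check_pulse(refactored_price) else "no pulse")
```

Running the check after every small change is what keeps the "pulse" visible while one internal organ at a time is replaced.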

But in a lot of cases and I had somebody ask me, "I’m in the shop where we got this disgusting, festering completely hopeless ball of mud, what is the first thing I should refactor?" And I told them, "Your resume."

   

8. That’s interesting. For a company which has a product that has reached a certain level of devolution, what options do they have? This effort to redo portions of the code base, is it worth it? At what point do you just basically declare Chapter 11 and go try something else?

Part of what happens in those cases is if the thing has gone so far to seed that people can no longer maintain it and remediation is not effective, then you basically go Chapter 11 and the ecosystem makes your competitor, you know, the new incumbent. That’s the way that the gene pool does it. It generates a lot of phenotypes and as soon as one reaches the state of senescence where it can no longer function, you know, the grayback is deposed and the successor comes in.

But beyond that, if you can successfully institute some systematic practices for doing a gradual make-over of that sort, or maybe a more draconian make-over of part of it, you might be able to keep it on the air. And there’s also a tactic whereby, if there’s a festering ball of mud in the middle that’s making you a lot of money, you kind of do what they did at Chernobyl - you cordon the thing off and you say, "Anyone who touches a line of this is fired." And if anything needs to be done to it, you kind of put wrappers and adaptors on it on the way in and the way out.

We had a client who did precisely this with its rainmaker piece of code. It’s an old legacy piece of C++ code; nobody can stand to look at it. They basically had to put it in a sarcophagus, so you’re fired if you touch it, but they're still shipping it. I mean, odds are you’re using it, and anything more than that would give away the client. That’s an approach you can take.
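
[Editor's note: the "wrappers and adaptors on the way in and the way out" idea can be sketched roughly as follows. Everything here is hypothetical: `legacy_quote` stands in for the sealed routine with its awkward string protocol, and `quote`, `Quote` and `QuoteError` are illustrative names for the clean layer that the rest of the system would talk to. Python is used for brevity, although the client's code was C++.]

```python
from dataclasses import dataclass


def legacy_quote(buf):
    """Hypothetical stand-in for the sealed legacy routine. Nobody edits this."""
    account, cents = buf.split("|")
    if not account:
        return "ERR|17"
    return "OK|" + str(int(cents) * 2)


@dataclass(frozen=True)
class Quote:
    account: str
    cents: int


class QuoteError(Exception):
    """Raised when the legacy routine reports an error status."""


def quote(account: str, cents: int) -> Quote:
    """Adapter: typed arguments on the way in, a typed result on the way out."""
    reply = legacy_quote(f"{account}|{cents}")  # translate into the legacy format
    status, payload = reply.split("|", 1)       # translate the reply back out
    if status != "OK":
        raise QuoteError(f"legacy error code {payload}")
    return Quote(account=account, cents=int(payload))
```

New features are then written against `quote`, so the string protocol and error codes never leak past the wrapper.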

There’s a lot - every way you can imagine of surviving, you know, bad code is out there: the make-over, the cordoning it off, the gradual gentrification or just throwing the human wave, throwing people into the breach and just burning out their detail capacities, just managing to keep that thing at the edge of collapse for as long as they can. I mean, you can put energy into something that bad and people will find the spot where they've got to add the feature and suck that up.

In fact, there are people who are really good at that. You know, you get perverse incentives in an ecosystem that basically looks like the Everglades. There are these people I call ‘swamp guides’ who get really, really good at finding exactly the problem in a legacy code base that everybody else is too terrified to step into. I know some consultants that actually rack up some pretty good stud fees going in there and fixing stuff like that, and they have a good time talking about what idiots the people that let it get this bad are. You know, it’s a way of life - driving around in your swamp buggy and dodging the alligators.

   

9. When you’re trying to decide which portion of the application to segment, assuming that your codebase has got to a certain point and you’ve decided you want to invest the money to try to make it better rather than scrapping it and restarting, how do you identify the part which will benefit most from it? Can you do that, or is it like the situation where somebody looks for the keys underneath the lamp rather than where they lost them because it’s easier to look under the lamp?

Both, I mean, you see both. Sometimes there’s low-hanging fruit and then you go in there and you go, "May I clean this up? This will get us to the point where we might be able to do something." I mean, one would presume this is usually driven by either a bug or a need for a new feature that requires us to find a place in this code where we can successfully intervene and incorporate this.

And that’s where basically you can sort of graft a new well-made maybe TDD-driven kind of nugget of fresh stuff and then find the place where you can transplant that into the existing codebase with minimal intervention and you know, gradually replace one organ at a time as there is a need for these features. The answer I would expect anyone to give to that question is: find the place where there will be value associated with intervening and go there and make that happen.

But the part of me that knows I tend to get off the reservation will go, "I spotted a place where there were natural fissures." I might put some effort into making that happen. I’m a big fan of, if you discover something that is kind of low-hanging, just make it happen. Sometimes it’s amazing when you look at legacy codebases just how much low-hanging fruit there is and how much manipulation you can do. In most shops you walk in and go, "Oh my God, 600-line method." You know nothing about their domain, and you can just start going in there with the refactoring tool and say, "What does this do?" and they’ll say, "This is the thing that kind of gets us over this particular hump by doing this with this data over here," and you go, "Fine," extract that and call it that. And they go, "That’s ridiculous." "Don’t worry, we’ll fix that later."

Then you do the next 20 lines and they start to realize you’re getting this libretto that describes what this monster they’ve all been afraid of going into does, and once they start getting the hang of it, they go, "Get that one." And then they realize they’ve just used Extract Method to create a method with 65 parameters. Suddenly, they’re realizing maybe that’s a data object and there’s a little bit of primitive obsession here; maybe there are some opportunities to refactor that.
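
[Editor's note: a small sketch of the step described above, where an extracted method's long parameter list hints at a missing data object. The names `shipping_label_before`, `Address` and `shipping_label` are invented for illustration, and five parameters stand in for the 65 in the anecdote.]

```python
from dataclasses import dataclass


# Before: Extract Method has pulled the logic out, but the data still
# travels as a clump of loose primitives ("primitive obsession").
def shipping_label_before(name, street, city, postcode, country):
    return f"{name}\n{street}\n{postcode} {city}\n{country}"


# After: the clump of parameters becomes a data object with a name.
@dataclass(frozen=True)
class Address:
    name: str
    street: str
    city: str
    postcode: str
    country: str


def shipping_label(address: Address) -> str:
    return (
        f"{address.name}\n{address.street}\n"
        f"{address.postcode} {address.city}\n{address.country}"
    )
```

Both functions produce the same label for the same data; the difference is that callers now pass one named thing around, and behaviour that belongs to an address has an obvious home.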

Once people get rolling, some of the stuff can happen quickly, but not always. I mean, each case is different, you know, there’s just a whole lot of clinical variation in the pathology that leads to this stuff and it’s hard to generalize, which doesn’t stop us.

   

10. I’m thinking back to the idea that even a small to medium size codebase now is very large and a mile long.

A web app is probably a mile long, a quarter million lines is a good place to start, a lot of them are a lot bigger.

   

11. If you’re not able to read the entire application and understand the whole code base, how do you choose which is the best part to start to focus your efforts on - if you want to focus on the thing which seems like it will give you the most bang for the buck?

I have written about how you approach something that’s monstrous and maybe doesn’t smell all that good, and initial reconnaissance is one of the things people recommend. There's a pattern in "Reengineering Patterns" by Nierstrasz and some of his colleagues a few years ago, and they said, "Read it all in an hour." And of course, you know, it’s like reading War and Peace in an hour, the old Woody Allen joke: "I took a speed reading course. I read War and Peace." "What was it about?" "It was about Russia." That’s all you’re really going to get out of it, but you might recognize what the major landmarks are and the next time you go in there, you might see a bit more.

For me, learning a codebase has always been like learning a new city. If you have the experience of going to a city where you don’t speak the language, which for an American is a very common thing because we’re aggressively monolingual, you try to navigate it. At first you’re timid and you just try to figure out where the major arteries are, then you start to learn the landmarks and you start to get more confident. After a couple of days there, you’re starting to realize where things are and you can be effective at getting certain chores done that had scared you a few days before. That initial reconnaissance is a good thing to do, along with discovering where you need to go to get essential things done, so it's something you can do.

I mean, it is possible to survive life in the Everglades; not a lot of people do, but some do, and it’s possible as you do that to learn how to make it over. A foreign-seeming place with lots of odd traditions that, you know, doesn’t collect its garbage very often: you’ll learn to navigate it.

   

12. Typically there seems to be a resistance on the part of the product owner, the business owner, the person who is in control of the money and who wants to drive the direction of the product, to spend time on things that they don’t think contribute directly to the product itself. They would like features A, B, and C; they don't necessarily want to worry about this stuff. How do you have the conversation with them to ensure that you have the time to deal with this code, to deal with this refactoring, with this clean-up?

Yes, how do you sell it? And unfortunately, every spot in the matrix is out there. 'We cared and we did it, you know, in a craftsman-like fashion and we managed to survive for years and our product is still shipping because it’s maintainable and you can work on it and our people are happy and they didn’t leave.' Or 'We went ahead and we built a ball of mud and we beat that other guy to the market because he was still lily-gilding his code.' And we have the other two combinations too. You know, 'We built a ball of mud, we went bankrupt immediately', 'We tried to do it right - we went bankrupt immediately', it’s all out there.

So, I’m not sure what you tell someone who is placing those kinds of bets other than, if you really think this is going to be around for a while, you know, a sustainable model might be something that you want to think about. It means he or she is employing a commando model. I think I’ve already alluded to the fact that you kind of need people who care about this kind of stuff and who know how to do it in order to go down that road.

It certainly means you’re not employing the infantry model. You’re not doing human wave. That is a much better way to get mud. But you’re kind of making a bet that you’re going to be around long enough to glean some of the benefits, because it’s not clear you’re any faster out of the gate than people who just put the pedal to the metal and start mudslinging. But there’s some evidence out there now that people who manage to refactor their code and get it more modular actually see lower change costs than the groups that haven’t.

People are hunting around for empirical evidence on some of this stuff, and there’s a group (I think it’s at Harvard), Carliss Baldwin's, that did some research involving change cost and code bases that had been refactored and was able to show that it had actually had an impact there.

Unfortunately, there's some empirical work I have come across lately from people at North Carolina and people at Illinois who have been studying whether or not refactoring is making a difference in people’s development processes, and I think what they're discovering, alas, is that it doesn’t seem like people are using it that much. You know, even the developers of Eclipse did not seem to be using Eclipse’s refactoring tools that much.

So, to bring things full circle: it seems there’s low-hanging fruit in most code bases and there are small things people could be doing to get substantially better fast. How that bears on whether something that is just hopelessly gone can get better, how quickly it can get better, and whether that’s even possible is an interesting issue, and it’s a case-by-case kind of proposition too.

   

13. It seems that both business owners and everybody that has worked on software are notoriously bad at identifying which things are going to be around long term and which things are going to be very short term. I mean, it’s something that I’ve done myself: "Oh, I’m just going to throw this script together. I’m only going to need it for a day or two," and five years later, it’s still in production. How do you do this? I’m wondering, can you identify these things in advance, or is it more a matter of, as you recognize things are there for longer, then you start to spend the time to do it?

Obviously the latter. When I first got into this racket, I thought I was, you know, clever enough to speculatively generalize, and it was a lot of fun because you could really elegantly solve these problems that, as it turned out, no one was ever going to have. It was great, heady stuff: gorgeous solutions to the next nine problems my clients would have had that, it turns out, none of them ever had. And naturally that was where I spent well over half the effort, and it was lots of fun to engineer that stuff. But, you know, when XP came along, Kent Beck did a nice job of pointing out that even if you’re right half the time, amortize that over a certain period of time and you've just wasted a lot of potential future value. And so, I honestly and humbly came around to the recognition that if you think you know where things are going, okay, smarty pants, keep that in the back of your head, and the second or third time you have that problem, then go ahead, make the move - go ahead and refactor.

You don’t have to have a lobotomy - you don’t have to pretend that you don’t see it coming. Just keep your powder dry until you actually get a second or third instance and then go ahead and refactor in that direction. As often as not, you'd do some of these speculatively general things, the premature generality, and the requirements would shift, but they shifted in a direction, you know, which forced you to desecrate the cathedral of elegance you'd just built.
