Michael Feathers on Programming Languages

1. I'm Sadek Drobi, and I'm here with Michael Feathers at QCon. Michael, can you tell us about yourself and what you've been busy with lately?

Yes. I am a consultant, and I spend a lot of time traveling around the world helping teams with various issues. The primary thing, really, is dealing with people who have large existing code bases without much in the way of tests around them. A lot of this comes from the fact that I wrote a book on the topic, Working Effectively with Legacy Code, back in 2004. I did that mainly because I was trying to help teams move to test-driven development, and we discovered that there was a lot of existing code we wanted to get tests around, so I started writing down some techniques.

I figured, "If I write a book about this, then I won't actually have to go out and do this anymore. People will just read the book and that's it!" Of course, it doesn't work out that way! You just get called out to see code you could never have imagined. So I've been spending a lot of time helping teams with that, and also a lot of time thinking about programming in general and, more recently, about functional programming and computer science education, things along those lines. I guess that's where I am today.

   

2. You are known for working a lot with legacy code. How does that differ from writing code from scratch?

It's a simple question, and I guess anybody who's been programming for a while is bound to see what the differences are. In general, I think the biggest issue in any code base is understandability: the ability to walk in and figure out what's happening in the particular piece of code you are dealing with. To me that's the primary issue with a lot of large existing code bases: you can't really see intention in the code all that well, and when you can't, you invariably want to go out and do something to figure out what's going on with that code and how it really works.

Writing tests is a valuable way to start approaching that. I think that's the key. To me, the quintessential quality of a good code base is understandability, and once that falls apart, you are in trouble. The thing about writing greenfield code is that, essentially, you are in charge, and you must understand what you are doing if you are actually writing the code; at least that's what we hope. When you are working greenfield, you have time to understand the problem and to write code that will be understandable later.

   

3. You are also known for changing technologies often: you program in different languages, on different runtimes. What is the first thing you focus on when you move to a new programming language?

It's never really a decisive leap. I am very much a person driven by curiosity. There are times when I'm called into a particular situation with a particular client and they say, "Well, we are working in Delphi," and I have to learn some Delphi and dig into that, but for the most part, I'm driven by curiosity. I wake up in the morning and say, "There is something I'd like to investigate in Ruby, so I'll code something up in Ruby," and then, invariably, I pick up various languages, do a lot of reading, and try to dig into them by writing code in them. I've been very curious about programming languages from the very beginning, when I was first in school. Actually, the first job I had after I left school was designing a proprietary programming language in HAS for a company, and that was really a heavy experience. It was fun.

   

4. When you are faced with legacy code, you start by building some regression tests around it, and so on. Are they unit tests or integration tests?

The biggest issue you typically have when you are getting tests in place around existing code is determining what their scope should be. Sometimes you have to be very pragmatic about that, based on what you are confronting in the code base, because in some code bases the dependencies are so tangled that it isn't easy to bite off a bigger chunk of what you are working on. Typically, when I work with a team that has some very complicated, spaghetti-ish code and some changes to make, we first take a look at the code base and figure out where we want to make the changes.

Then we basically try to figure out: "Can we write tests in one place that cover all the places where we need to make changes, or is that too difficult? Should we instead cover each area independently, with just enough tests to get those changes in place?" I don't know if there is any real algorithm for that, but you do go back and forth between writing more unit-level tests and writing more integration-level tests.

If there is a good place in your code base to write tests at a relatively high level, tests that give you coverage around the areas you need to change, that's a really great situation: you can use those tests to support you as you make changes, refactor, and so on. Over time, though, those tests tend to become less of a focus for the teams I work with, because I encourage them to work at the lower, unit level whenever they are making changes. The closer your tests are to where you are making a change, the more you can use them to help you visualize what you are really after. I think that focus is very valuable.
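
A common starting point for such tests is a characterization test, a term from Feathers' book for a test that pins down what the code currently does before you change it. The sketch below assumes Python and a pytest-style test function (a plain function using `assert`); `legacy_price` and its pricing rules are hypothetical stand-ins for tangled existing code.

```python
def legacy_price(quantity, customer_type):
    """Hypothetical stand-in for tangled existing code we need to change."""
    total = quantity * 10
    if customer_type == "gold":
        total = total * 0.9
    if quantity > 100:
        total = total - 50
    return round(total, 2)


# Outputs recorded by running the existing code once, pasted in as literals.
# They describe what the code does today, not what it "should" do.
OBSERVED = {
    (1, "basic"): 10,
    (1, "gold"): 9.0,
    (200, "gold"): 1750.0,
}


def test_legacy_price_unchanged():
    # Any refactoring that changes an observed output makes this test fail.
    for args, expected in OBSERVED.items():
        assert legacy_price(*args) == expected, args
```

If a recorded value looks like a bug, the characterization test still keeps it for now. Fixing it becomes a separate, deliberate change.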

   

5. When we program, we try to break the problem into modules, and different programming languages offer different levels of module. Can you talk a little bit about these different levels of modules and their testability?

"Module" is a very strange word, because it's been used for so many things in software development and in so many languages. Ruby modules, for instance, are essentially mix-ins, and in the old days, when we had Pascal and Modula, you had a bigger modular construct. In some languages, people see classes as modules. What was your question again?

   

6. Different programming languages give you different conceptual constructs or abstractions. How do they affect testability? You look at things from a testability perspective: in object-oriented languages, for example, you have classes and interfaces, and you know how to test them. But in functional programming, you have functions. You once talked about seams as places where you can plug in tests.

In each language you have to find the points at which you are going to write tests, and the module constructs can either help or hinder you in that. I haven't done a lot of testing in functional languages, though I've been playing with Haskell a bit. Interestingly, the module constructs you see in some of the modern functional languages seem more akin to the constructs we saw in structured programming languages years ago.

Then we have this big middle ground in object orientation, where you have the notion of a class but nothing much above it, except perhaps packages in Java or assemblies in .NET. I think module constructs are all over the map across different languages, and I just take them as they come. The main thing I really wish more language designers thought about is how to provide a construct that is bigger than a class and represents the work a particular team is doing. The reason is that we often get into a bit of trouble with testability when people use the constructs in a language that reduce scope and visibility to give them fine-grained encapsulation, and that's great.

But sometimes that can work against testability. In a perfect world, we could say: if we have a team of 10 to 20 people, for instance, and we know they're working on these 5 areas of code, can access control be tighter at the interfaces between their code and everyone else's than it is inside their particular module? Could we say, for instance, that almost everything inside is public, easy to work with, extend, and test? I think we get into a bit of trouble when we nail things down and the people who consume our code can't un-nail them when they have a need we didn't anticipate.
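
Feathers' book defines a seam as a place where you can alter behavior in your program without editing in that place. The sketch below, in Python, shows one kind, an object seam: a constructor parameter that lets a test substitute a collaborator. `InvoiceChecker`, `SystemClock`, and `FixedClock` are hypothetical names chosen for illustration.

```python
import datetime


class SystemClock:
    """Production collaborator: reads the real current time."""

    def now(self):
        return datetime.datetime.now()


class InvoiceChecker:
    """Hypothetical class under test."""

    def __init__(self, clock=None):
        # The constructor parameter is the seam: a test can swap in another
        # clock without editing anything inside this class.
        self.clock = clock if clock is not None else SystemClock()

    def is_overdue(self, due_date):
        return self.clock.now().date() > due_date


class FixedClock:
    """Test double that always reports the same instant."""

    def __init__(self, instant):
        self.instant = instant

    def now(self):
        return self.instant
```

For example, `InvoiceChecker(FixedClock(datetime.datetime(2009, 7, 16))).is_overdue(datetime.date(2009, 7, 1))` returns `True` on any day the test runs. Leaving the parameter open is the kind of deliberately "un-nailed" access the answer argues for.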

   

7. More and more, we see languages that allow more concise code. Is less code always better?

It's not like that game of golf you can play, where you write the shortest program you possibly can. People do that in Perl, I guess. It's a funny thing: I'm a strong advocate of removing duplication, but I think the important thing is being able to name the pieces well. You can get a lot of benefit out of compressing a big swath of code into much smaller pieces, but there definitely is a conceptual trade-off: it is often a bit more work to understand the small pieces.

Your team has to make a commitment to deal with that conceptual overhead and to understand what benefit it gives you in terms of the composability and reuse of the pieces in a particular area of your code. Much of this is a matter of acclimation. People are quite used to seeing a strong sequential focus in their code, knowing "If I see this, I know this happened before that, and that happened before this," and seeing it all at once in one view. When you break things into small pieces, you have to think about the code a little differently, and I think that takes people time.
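
To make the trade-off concrete, here is a hypothetical order report written both ways in Python. The second version is shorter at the top level, and its pieces can be reused, but a reader has to learn what `is_shipped` and `line_total` mean before the one-line `shipped_total` reads naturally. All names and the order format are illustrative assumptions.

```python
# Sequential style: every step is visible in one place, in order.
def shipped_total_sequential(orders):
    total = 0
    for order in orders:
        if order["status"] == "shipped":
            total += order["quantity"] * order["unit_price"]
    return total


# Compressed style: the same computation as small, named, composable pieces.
def is_shipped(order):
    return order["status"] == "shipped"


def line_total(order):
    return order["quantity"] * order["unit_price"]


def shipped_total(orders):
    return sum(line_total(order) for order in orders if is_shipped(order))
```

Both functions compute the same value. The difference is where the reader's effort goes: following one sequence versus learning a small vocabulary of named pieces.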

   

8. Having worked with a variety of large and ugly code bases and having worked with a variety of languages, do you think that having a multitude of languages in a code base helps or hinders the readability and learnability of that system?

I think there is a bit of a spectrum. I really haven't encountered many large multi-language projects, except the typical web scenario where you have JavaScript and such. But in the cases where I've seen two or three mainstream languages mixed together, it has been a bit of an issue for people who jump from one area to another. It's not yet typical for people to develop strong, in-depth experience in a couple of different mainstream languages and go back and forth between them. I haven't seen that all that often, and the times I have seen it, it seemed to require a little more skill than the teams were displaying. Since I haven't seen much of it, there is no way I could say that's representative.

   

9. What's your opinion on static versus dynamic programming languages, particularly with regard to the maintainability of code?

I like the dynamic ones. I think they are fun and interesting, and there tends to be a great deal of fear around them, which is unfortunate, but I also see how static typing can be very useful. The experience I've had working in Haskell has underlined for me how useful some of the stronger static type systems in functional languages can be. I think it's really a matter of taste, in a sense, and it's valuable for people to get used to working in both environments. One of the more unpleasant things about working with statically typed languages is that, invariably, the checking has some cost at build time.

Of course, that introduces a whole other world of managing dependencies well in large applications, and that's something people don't spend enough time thinking about. Static typing forces you to think about it, because if you don't, you start suffering rather aggressively. That's a two-edged sword, too. One wish I have for the industry is that more of the people who feel a bit prejudiced against dynamically typed languages would get a little more experience with them and realize that it's not that bad. It's just something a little different that you need to work with.

   

10. Speaking of statically typed languages, there is a lot of discussion about checked exceptions versus runtime exceptions. One argument people make is that removing checked exceptions means these cases aren't dealt with until deployment or production time. What do you think of that?

It's a strange thing. I wrote a chapter for Bob Martin's book Clean Code, and in it I outlined something very contentious: my feeling that checked exceptions are pretty much a failure, a failed language experiment from my point of view. When you look at it, Java has checked exceptions, but .NET doesn't, Python doesn't, and Ruby doesn't. It's interesting to see that people do very well in the .NET world without checked exceptions, so we can't really tell ourselves that they are absolutely necessary for good software design.

The bigger issue I have with checked exceptions is that I see them as a violation of the Open-Closed Principle. There is a notion in software engineering that when you create a module, it should be easy to extend that module without having to modify it. Think about the case where you have a checked exception: you throw an exception here and you want to catch it over there. If you change an exception that you throw, you have to go to every intermediate layer and alter the signatures in order to say, "Look, this thing can be propagated up to this particular area of our code." I find that awkward.

   

11. Isn't that like returning a string instead of an int? You change the method's contract, in a way, when you change its checked exceptions.

Yes, you do. The interesting thing about exceptions is that they are a way for us to decouple the detection of errors from their recovery. If we have to deal with the intermediate layers, then with checked exceptions we can't fully disconnect the two. Occasionally I run into people who say, "Yes, checked exceptions are great, and I love them," and then you have to ask, "If you were working in an environment that didn't have them, would you feel the same way?" Sometimes we have to try things out in a different environment and then ask ourselves, "Is it just familiarity?"
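
Python, like Ruby and C#, has no checked exceptions, so it shows the decoupling clearly. In this sketch, all names are hypothetical: the error is detected in `parse_port`, recovered from in `start_server`, and the intermediate `load_settings` never mentions it. In Java, if `ConfigError` were a checked exception, `load_settings` would also have to declare it in a `throws` clause.

```python
class ConfigError(Exception):
    """Hypothetical error type for bad configuration values."""


def parse_port(text):
    # Detection: the lowest layer notices the problem and raises.
    if not text.isdigit():
        raise ConfigError(f"bad port: {text!r}")
    return int(text)


def load_settings(raw):
    # Intermediate layer: its signature says nothing about ConfigError,
    # and it does not need to change if parse_port starts raising new errors.
    return {"port": parse_port(raw.get("port", ""))}


def start_server(raw, default_port=8080):
    # Recovery: only the top layer decides what to do about the error.
    try:
        return load_settings(raw)
    except ConfigError:
        return {"port": default_port}
```

The fallback to a default port is only an example of a recovery policy. The point is that this decision lives in one place, far from where the error was detected.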

   

12. Recently, I've been involved in a project using statically typed functional programming languages, and I observed that everything starts working almost on the first try. Do you agree with this observation? Do you think functional programming can bring more correctness to programs?

One thing that's very interesting about functional programming is that it's really hard to program by accident. I don't know how much of this is just new to me and how much is new to many people in the industry now. You can't say, "I think I'm going to add this and find out what happens." You basically have to articulate the next step in your computation. If you want to solve a problem over here, you have to think about the intermediate form you need to target in order to move forward.

So it does force a lot of deliberation on you, and it's true that with these newer type-checked languages, the type system itself is something you have to conform to, and you have to think quite a bit to make sure you do. I've heard people say a number of times, "In Haskell, I know that once the types check, it's good and I'm set." That's good, but I think the same is generally true of any process or tool that forces us to be more reflective and more conscientious about what we are doing; the type system is yet another tool that pushes us in that direction. Test-driven development, particularly in a language like Ruby, forces that same degree of reflection and conscientiousness as we work.

   

13. What do you think of design by contract?

I think it's cool. I did that on a big project a long time ago, working in C++, and we did it because it was a safety-critical product and we wanted to make sure the thing worked really well. It was the kind of system you couldn't stop and restart. I basically designed my own contract macros in C++, we used them in various places, and I found it was cool. The only thing that bothered me about design by contract, because I didn't understand it well enough back then, was this: "They always tell you that you can take these assertions and just get rid of them in production," but then you think, "That's not really all that cool, because I want these things caught there."

I also started to think it was a bit odd, because contracts can't take the place of testing: you have to execute the code anyway. With unit testing, at least you are executing the code in various different scenarios and seeing what happens. With design by contract, you put these assertions in, but you still have to exercise the code to see what's going on with it. It took a while for me to realize that, essentially, with TDD and design by contract you are really doing the same thing.

The primary value of design by contract is the way it leads you to think about your code. If you think about your code in terms of preconditions and postconditions, you have a chance to alter your design to get simpler preconditions and simpler postconditions. That's where the primary value is. It's great, and it's something the industry should pick up a little more, but it really comes down to what the thinking process gives you more than to the assertions themselves.
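
Here is a minimal sketch of the idea in Python, assuming hand-written checks rather than any contract library. `require` is a hypothetical helper for preconditions, and the postcondition is a plain `assert`. Because `python -O` removes `assert` statements, the postcondition disappears in optimized runs, which corresponds to the "get rid of them in production" option described above. The precondition, written as an explicit `raise`, stays.

```python
import math


def require(condition, message):
    """Hypothetical precondition helper: a failure means the caller broke the contract."""
    if not condition:
        raise ValueError(f"precondition failed: {message}")


def integer_sqrt(n):
    require(isinstance(n, int) and n >= 0, "n must be a non-negative int")
    root = math.isqrt(n)
    # Postcondition: root is the largest integer whose square does not exceed n.
    # Note: `python -O` strips assert statements, like turning contracts off.
    assert root * root <= n < (root + 1) * (root + 1), "postcondition failed"
    return root
```

Writing the postcondition down forces a precise statement of what "square root" means for integers. That thinking benefit is the one the answer emphasizes.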

   

14. Along the same lines: in TDD you test particular cases, because you can't test all of them, but there are other testing frameworks, like QuickCheck, that try to test the semantics of the software. What do you think of that kind of approach?

I'm trying to remember the name, but somebody has ported QuickCheck to, I think, Scala and Java. Tony Morris, I think; I'm bad with names. I think it's an interesting approach, and it's a little more amenable to a pure functional programming language like Haskell. This is a case where trying to pull it back into a mainstream language would be useful, but it may not give us as much benefit as it does in a pure context.
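
The core idea can be sketched without any library: state a property that should hold for every input, then check it against many randomly generated inputs. This hand-rolled Python runner is an illustration only. `check_property` is a hypothetical name, not QuickCheck's API, and it omits shrinking, which real QuickCheck-style tools such as Hypothesis for Python perform to minimize a failing input.

```python
import random


def check_property(prop, generate, trials=200, seed=0):
    """Run prop on many generated inputs; return a counterexample or None.

    Unlike QuickCheck, this sketch does not shrink counterexamples.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    for _ in range(trials):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def random_int_list(rng):
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]


def reverse_twice_is_identity(xs):
    return list(reversed(list(reversed(xs)))) == xs


def sorted_is_ordered(xs):
    ys = sorted(xs)
    return all(a <= b for a, b in zip(ys, ys[1:]))
```

`check_property(reverse_twice_is_identity, random_int_list)` returns `None`. A false property such as `lambda xs: list(reversed(xs)) == xs` returns a counterexample: a generated list that is not a palindrome.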

   

15. What do you think of null references?

It's interesting. Here we are at QCon, and I see there is a session by Tony Hoare basically saying that introducing null references into programming languages was a billion-dollar mistake. It's been a bit of a battle of mine over the years. Very typically, I visit a team and notice that every time they send a message to an object and get a value back, they check whether it's null before doing anything else. I keep saying to myself, "If it just didn't return null, you wouldn't have to check; if they didn't pass null, you wouldn't have to check." The sad thing is that, after a period of time, it's really hard for a code base to adopt a new standard of "We are just never going to pass null or return null, and as a result we won't have to do these bits of checking."

It grieves me every time I see a code base where people have to check for null all over the place because others just weren't very careful about what they return or pass in. It is nice to see that some of the functional languages that are coming out now, or have been out for a while, have obviated this problem by simply not having anything like a null reference in the language. It's hard for me to figure out how serious the problem is. I know it looks horrible aesthetically when you have null checks all over the place; it obscures the intent of the code. But whether people are really running into as many null pointer exceptions these days as they did in the past, I'm not sure.
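
One common way to adopt a "never return null" convention in an object-oriented code base is the Null Object pattern: a lookup returns a harmless stand-in instead of null. In this Python sketch, `None` plays the role of null, and all class and function names are hypothetical. Functional languages reach the same goal differently, with option types such as Haskell's `Maybe`, which make the caller handle absence explicitly.

```python
class Customer:
    def __init__(self, name, discount):
        self.name = name
        self.discount = discount


class NullCustomer(Customer):
    """Stands in for 'no customer', so lookups never have to return None."""

    def __init__(self):
        super().__init__(name="guest", discount=0.0)


class CustomerDirectory:
    def __init__(self, customers):
        self._by_id = {customer_id: c for customer_id, c in customers}

    def find(self, customer_id):
        # Never returns None: a missing id yields a harmless null object.
        return self._by_id.get(customer_id, NullCustomer())


def price_for(directory, customer_id, base_price):
    # No null check needed: every customer, real or null, has a discount.
    customer = directory.find(customer_id)
    return base_price * (1 - customer.discount)
```

The check for absence still exists, but it now happens once, inside `find`, rather than at every call site.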

   

16. In the functional programming community, people claim that most of the evil and errors come from IO: the file system, databases, and other such things. In TDD, when we write unit tests, we don't actually test these, because we cover them in integration tests. Do you really think the evil comes from IO?

One thing that was a bit eye-opening for me, after I started getting into functional programming a little more, was noticing that most of the problems I deal with in legacy code are due to things that are really addressed by the functional approach. For instance, you point out IO and things along those lines: in a unit testing situation, you simply want to move dependencies on external resources out of the computational core of the code. In some functional programming languages, that's pretty much forced upon you through the use of monads and the like. That's a really important thing.
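
Moving dependencies on external resources out of the computational core can look like this in Python. `summarize` and `summarize_file` are hypothetical names. The pure function can be unit tested with plain lists, and only the thin wrapper needs an integration test against a real file. Haskell enforces this split in the type system by confining effects to `IO`; in Python it is only a convention.

```python
from pathlib import Path


def summarize(lines):
    """Pure computational core: no file access, so it is easy to unit test."""
    words = [word for line in lines for word in line.split()]
    return {"lines": len(lines), "words": len(words)}


def summarize_file(path):
    """Thin shell at the edge: the only function that touches the file system."""
    text = Path(path).read_text(encoding="utf-8")
    return summarize(text.splitlines())
```

For example, `summarize(["a b", "c"])` returns `{"lines": 2, "words": 3}` without touching the disk.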

The other thing that is really striking to me about functional programming is that when I'm working with a lot of legacy imperative, procedural, or object-oriented code, the hardest thing is knowing, when you change a particular piece of code, what else could possibly be affected. Again, functional programming has a bit of an answer for that: you use immutable data structures and pure functions that don't have side effects. There is value in moving towards these things, because they are a good salve for the itch that tends to be so prevalent in legacy code today.
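
Python does not enforce purity, but frozen dataclasses approximate the immutable data structures mentioned above. In this sketch, `Account` and `deposit` are hypothetical. A change produces a new value, so answering "what else could be affected?" means following return values rather than hunting for shared mutable state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    owner: str
    balance: int


def deposit(account, amount):
    # Pure function: returns a new Account rather than mutating the argument,
    # so no other code holding a reference to `account` can be affected.
    return replace(account, balance=account.balance + amount)
```

Assigning to a field of a frozen instance, such as `account.balance = 0`, raises `dataclasses.FrozenInstanceError`, so accidental mutation fails loudly instead of silently affecting other code.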

On the other hand, I have a bit of fear about it, too, because functional programming is a completely different mindset. Particularly for people who have grown up learning object orientation, you pretty much have to unlearn several ways of thinking about problems in order to move towards this more functional approach. I don't know how many people are familiar with him, but there is a great guy named Richard Gabriel who wrote a very good book, Patterns of Software, and one of the things he talks about there is what it takes for a language to become popular in the industry and go mainstream.

One observation there seems rather pessimistic, but I think it's true to a degree: the number of programmers who are able or willing to think about code in a mathematically sophisticated way is relatively small compared to the total population of programmers. So even though functional programming is becoming more popular, I think it's a bit of an uphill battle for the industry, and it may end up as a very strong niche tool for the people who are able to use it very well. I'm glad to see it gaining prominence now, but I wonder if we'll ever see a day when everybody is doing functional programming. On the other hand, we've gotten to the point where closures are becoming part of practically every programming language. It only took 30 years, so there is hope, I guess.

   

17. Do you think multi-paradigm languages, such as Scala and F# today, could be an answer, giving us both paradigms?

Scala and F# are both good OO/FP hybrid languages. The thing I find a little striking is that the two paradigms fight each other a bit. The quintessential piece of advice in object orientation comes from the Pragmatic Programmers: "Tell, don't ask." You basically send messages to objects; objects are very much about sending messages to other objects, which send messages to others. With functional programming, we flip that around entirely: it's all about asking.

We say, "Give me this, give me that," and maybe, through lazy evaluation, we don't get everything we need right away; we get things on demand. It seems like you have to mix these two things in your architecture somehow. I don't think there is any one right answer yet, but a common pattern I do notice is that many systems quite often have an upper layer where you push and a lower layer where you pull. We see that kind of architecture even when we aren't doing a hybrid functional/OO approach. It might be one of the patterns, so to speak, as we move deeper into multi-paradigm languages.
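
Here is one hypothetical reading of that "push above, pull below" pattern, sketched in Python. The lower layer is a pure function that callers pull results from, and the upper layer tells an object what to do. `overdue`, `Notifier`, and `send_reminders` are illustrative names, not an established API.

```python
from datetime import date


def overdue(invoices, today: date):
    """Lower, functional layer: callers ask for a value; nothing is changed."""
    return [inv for inv in invoices if inv["due"] < today]


class Notifier:
    """Upper, object-oriented layer: we tell it to act and expect nothing back."""

    def __init__(self):
        self.sent = []

    def remind(self, invoice):
        self.sent.append(f"reminder for invoice {invoice['id']}")


def send_reminders(invoices, today: date, notifier):
    # Pull the data from the pure layer, then push commands to the object.
    for invoice in overdue(invoices, today):
        notifier.remind(invoice)
```

In this sketch the asking stays in the lower layer, which is easy to test with plain data. The telling is confined to the top, where side effects such as sending messages belong.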

   

18. You just talked about laziness. Laziness is being added to programming languages; in C#, for example, they added laziness to some constructs. Don't you think that unleashes some evil? A lot of imperative code used to execute in a straightforward order, so we could reason about it, but with laziness we have no idea when things happen.

I think so, too. I'm trying to remember who said this; it was somebody in the Haskell community. What they were basically saying is that even experienced Haskell programmers don't really have a good mental model of the performance of a Haskell system, because so much happens on demand through laziness. I've also heard people say that's OK, because who really has a good mental model of performance in any other language, either? It may be a non-issue, in a sense. I think this is like anything else in the programming community: we are going to have to do some experimentation, some of that experimentation unfortunately may not be successful, and we are going to have to discover where laziness is appropriate and where it isn't.
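
Python generators give a small illustration of the question's concern: with lazy evaluation, side effects run when a value is demanded, not where the expression is written. C# LINQ queries behave similarly through deferred execution. The function name `evaluation_order` is hypothetical.

```python
def evaluation_order():
    log = []

    def load(name):
        # Side effect that records when the work actually happens.
        log.append(f"load {name}")
        return name.upper()

    names = ["a", "b", "c"]

    eager = [load(n) for n in names]  # list comprehension: all loads run now
    log.append("after eager")

    lazy = (load(n) for n in names)  # generator expression: nothing runs yet
    log.append("after lazy")

    first = next(lazy)  # only now does the first lazy load happen
    return eager, first, log
```

The returned log is `["load a", "load b", "load c", "after eager", "after lazy", "load a"]`. Only one of the three lazy loads ever runs, because only one value is demanded.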

Jul 16, 2009
