Adam Tornhill on Code as a Crime Scene, Git and Static Analysis, Clojure


1. We're here at Craft Conf 2016 in Budapest. I'm sitting here with Adam Tornhill. Adam, who are you?

Thanks. Nice to be here. I'm Adam Tornhill. I'm a software developer from Sweden. I've been a developer for almost 20 years now and I still love my job. I still write code every day. My background is perhaps a little bit different because I actually have a degree in psychology, so I tend to focus a lot on that stuff and apply a psychological perspective to my daily technical work.


2. Here at Craft, you gave a rather interesting talk. So can you give us a quick overview of the talk?

Yes, sure. My talk was called Seven Secrets of Maintainable Codebases, and it's based on a number of observations I've made while analyzing real codebases. I try to approach it a bit differently. Instead of focusing on the micro design choices that we all make, where I think we're actually quite good as an industry, I want to look at the codebase as a complete ecosystem, where the people who work on it and the way they're organized are just as important as the technical issues themselves. So that was my basic approach.


3. What have you found out in your work?

Oh, a lot of interesting things. One of my key points is that organizational problems often show up as technical issues and are mistaken for technical problems. I think that's one of the things we still don't understand well enough. Another thing I've found is that not all code is equal. Some parts of the code are just so much more important for our productivity and our quality. So that's what I'm working with: I'd love to improve the techniques that help us identify the code that actually matters for our productivity and quality.


4. So is that code like configuration code, initialization code or business logic or what code are we talking about here?

It could be any kind of code, really. The way I do it is with something I call a hotspot analysis, which is basically a way to identify complicated code that we also have to work with often. That means I look not only at the code but also at the behavioral data of software developers, e.g. where we actually work in the code. The hotspots I find can be anywhere, but quite often they tend to be in the test code.

I think there's actually a reason for that, and I talked about it in my presentation here: I think that we developers tend to make a mental divide. On one hand, we have the application code and we know that's important. We need to keep it clean and maintainable and evolvable. Then we have the test code, which we quite often treat like a second-class citizen. I think that's really unfortunate because from a maintenance perspective, there's really no difference between the two.
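To make the hotspot idea concrete, here is a minimal Python sketch; it is an illustration, not Empear's implementation. It assumes change frequencies come from `git log --format=format: --name-only` (one path per line, blank lines between commits) and that lines of code per file were counted separately. The function names and the ranking rule (most-changed first, larger files breaking ties) are our own illustrative choices.

```python
from collections import Counter


def change_frequencies(name_only_log: str) -> Counter:
    """Count commits per file from `git log --format=format: --name-only` output."""
    # Every non-blank line is a path touched by one commit.
    return Counter(line.strip() for line in name_only_log.splitlines() if line.strip())


def rank_hotspots(revisions: Counter, loc: dict) -> list:
    """Return (path, revisions, lines_of_code) rows, most likely hotspots first."""
    # Only files in the current snapshot (present in `loc`) are candidates,
    # and a file nobody has changed cannot be a hotspot.
    rows = [(path, revisions[path], size) for path, size in loc.items() if revisions[path] > 0]
    # Sort by change frequency; larger files win ties.
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


if __name__ == "__main__":
    log = (
        "src/billing.py\ntests/test_billing.py\n\n"
        "src/billing.py\nREADME.md\n\n"
        "src/billing.py\ntests/test_billing.py\n"
    )
    loc = {"src/billing.py": 420, "tests/test_billing.py": 610, "README.md": 40}
    for path, revs, size in rank_hotspots(change_frequencies(log), loc):
        print(f"{path}: {revs} revisions, {size} lines of code")
```

Note that the test file ends up near the top in this toy history, which is exactly the kind of finding described above.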


5. In your analysis, do you find that the test code is more often changed or what do you see there?

Yes. I find all kinds of sins there. Usually, the test code tends to be larger, in terms of pure size, than the units in the application code. It's not necessarily that there's a lot of logic there, but we also tend to see a lot of different couplings. For example, you tend to have clusters of unit tests that change together, and what I usually find in those cases is a lot of duplicated setup and teardown code. So those are just some of the things we've found.

Werner: You've written a book about your findings. Maybe you can explain the basic approach that you're using in the book and in your tools.

Yes, sure. I'll be happy to do that. My book is called “Your Code as a Crime Scene” and it's from the Pragmatic Programmers. That book is actually based on an experience I had five or six years ago. At that time, I worked on a really, really large system. One of my job responsibilities was to prioritize the code we wanted to improve in order to become a bit more productive, and I found that terribly hard to do.

At the same time, I was in the middle of my psychology studies at the university and I took a course in forensics. I noticed that forensics has this really beautiful mindset that we need to apply to software too. In particular, there was one technique called geographical offender profiling, where we look for behavioral patterns in how criminals move around a town and in the distribution of crimes. That's what I tried to take and backport to software. So that's the basic starting point.

I spend the first part of the book talking about how we can identify technical issues using version control data which is the behavioral data of software developers. And then I spend the last part of the book talking about teamwork and how we can actually measure that and how we can measure things like coordination and communication based on how we actually work with the code.
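The interview doesn't spell out which team measures the book uses. As one illustrative proxy for coordination needs, the sketch below counts the distinct authors who have changed each file. It assumes log output from `git log --format=--%aN --name-only`, where each commit starts with a `--<author>` line followed by the paths it touched; the helper names and the threshold of three authors are arbitrary choices for this example.

```python
from collections import defaultdict


def authors_per_file(log_text: str) -> dict:
    """Map each file to the set of people who have changed it."""
    authors = defaultdict(set)
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("--"):
            current = line[2:]          # a new commit header: remember its author
        elif current is not None:
            authors[line].add(current)  # a path touched by the current commit
    return dict(authors)


def coordination_hotspots(authors: dict, min_authors: int = 3) -> list:
    """Files touched by many people, most authors first: a rough coordination signal."""
    rows = [(path, len(names)) for path, names in authors.items() if len(names) >= min_authors]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    log = "--Ann\nsrc/a.clj\nsrc/b.clj\n\n--Bob\nsrc/a.clj\n\n--Cid\nsrc/a.clj\n"
    print(coordination_hotspots(authors_per_file(log)))
```

Many authors on one file does not prove a coordination problem; it only points at where to look.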

Werner: So you're basically using tools like Git against the developers, in a way, or for them?

For them. I really like to view this as something that supports us and guides towards better decisions.

Werner: This is interesting because it's actually based on real data, you have actually data to back it up and say, this is problematic code or this is code you touch every day and so on, basically.

Yes, that's correct. I think that's the key point, because it's really, really hard to argue with data, particularly when it's your own. One thing I do these days is go out to other companies, analyze their codebases, and present my findings and some recommendations. What I tend to find, and this has made me really, really happy, is that developers tend to love the work I do, because they can say things to me like, “Yes. That finding you made, I've been pointing that out for two years now and no one wants to listen to me. The manager doesn't want to give me the time to refactor that, and now I have the data.” So they're usually quite happy with that, even if they already knew about the stuff I found.

Werner: Well, managers love data and metrics and so if you give them extensive metrics on your systems, they're happy.

Yes, they are. And it also becomes quite obvious when you see the data. For example, it's quite common that I find that 3% to 4% of the code accounts for 20% to 30% of your development effort. There also tends to be a strong overlap with the defects and bugs in the codebase. So it's pretty easy to justify investing a bit of time in redesigning and improving those parts. It just makes your codebase, and your life, a little bit better.
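A figure like "3% to 4% of the code" can be checked against your own history with a rough calculation. The sketch below is hypothetical: it uses commit counts as a stand-in for development effort (an assumption, since a commit is not a fixed unit of work) and greedily picks the files with the most changes per line of code until the chosen share of the codebase is used up.

```python
def effort_share(revisions: dict, loc: dict, code_fraction: float) -> float:
    """Share of all changes that land in a small slice of the codebase.

    Files are picked greedily by change density (changes per line of code)
    while their combined size stays within `code_fraction` of all code.
    """
    total_loc = sum(loc.values())
    total_changes = sum(revisions.get(path, 0) for path in loc)
    if total_loc == 0 or total_changes == 0:
        return 0.0
    budget = code_fraction * total_loc
    used_loc = 0
    captured = 0
    by_density = sorted(loc, key=lambda p: revisions.get(p, 0) / max(loc[p], 1), reverse=True)
    for path in by_density:
        if used_loc + loc[path] > budget:
            continue  # too big for the remaining budget; try the next file
        used_loc += loc[path]
        captured += revisions.get(path, 0)
    return captured / total_changes


if __name__ == "__main__":
    loc = {"core.py": 400, "a.py": 4800, "b.py": 4800}
    revisions = {"core.py": 30, "a.py": 35, "b.py": 35}
    share = effort_share(revisions, loc, 0.04)
    print(f"{share:.0%} of changes land in 4% of the code")  # prints "30% of changes land in 4% of the code"
```

The greedy selection is an approximation; it is enough to show whether change activity is concentrated, not to compute an exact optimum.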


6. Do you only work with Git or other version control systems, or do you also include things like bug-tracking systems? You mentioned the relation between bugs and source changes.

Yes. We have started to integrate more and more data from ticketing systems. We use it to trace change patterns across different repositories, because a lot of codebases today are split up into multiple repositories and we want to pull that together and provide a holistic overview. That's something I've been working on a lot recently.
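One simple way to trace change patterns across repositories is to group commits by the ticket they reference. The sketch below assumes JIRA-style ticket keys such as `PAY-42` in commit messages, and that commits from each repository have already been collected as `(repo, message, files)` tuples; the key pattern and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical ticket-key convention (e.g. "PAY-42"); adjust to your tracker.
TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def changes_by_ticket(commits) -> dict:
    """Group touched files, qualified by repository, under each referenced ticket."""
    grouped = defaultdict(set)
    for repo, message, files in commits:
        for ticket in set(TICKET.findall(message)):
            grouped[ticket].update(f"{repo}:{path}" for path in files)
    return dict(grouped)


def cross_repo_tickets(grouped: dict) -> dict:
    """Keep only tickets whose changes span more than one repository."""
    return {
        ticket: files
        for ticket, files in grouped.items()
        if len({f.split(":", 1)[0] for f in files}) > 1
    }


if __name__ == "__main__":
    commits = [
        ("api", "PAY-42: add refund endpoint", ["src/refund.py"]),
        ("web", "PAY-42 show refund button", ["ui/refund.ts"]),
        ("api", "Fix typo (DOC-7)", ["README.md"]),
    ]
    print(cross_repo_tickets(changes_by_ticket(commits)))
```

Tickets that repeatedly span several repositories hint at changes that have to be coordinated across team or service boundaries.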


7. So how do you work with the data? Do you work with Git mainly or where do you get the data from?

I've actually played around with all the major version control systems. Right now I run my startup, Empear, where we build tools in this area, and there we focus mainly on Git. We write a lot of queries against Git, get all the interesting information out, and that's the input to our analyses. The reason I use Git is that it simply has the best data for us. It's also so widespread, and most other version control systems can be converted to Git repositories, so you can still use these techniques. It's not as limiting as it sounds.

Werner: Yes. You can just import it, as you say, yes.

It will take you a few days but once you've done it, it's there.
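As an example of the kind of Git query discussed here, one option is `git log --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames`, which prints a header line per commit followed by lines-added, lines-deleted, and path for each file. The parser below is a sketch for that particular format, which is our choice rather than a description of Empear's queries. Git reports binary files with `-` instead of counts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileChange:
    rev: str
    date: str
    author: str
    added: Optional[int]    # None for binary files, which git reports as "-"
    deleted: Optional[int]
    path: str


def parse_numstat_log(text: str) -> List[FileChange]:
    """Parse output of: git log --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames"""
    changes = []
    header = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("--"):
            # Commit header "--<hash>--<date>--<author>"; maxsplit=3 keeps
            # any "--" inside the author name intact.
            _, rev, date, author = line.split("--", 3)
            header = (rev, date, author)
        elif header is not None:
            added, deleted, path = line.split("\t", 2)
            changes.append(FileChange(
                *header,
                None if added == "-" else int(added),
                None if deleted == "-" else int(deleted),
                path,
            ))
    return changes


if __name__ == "__main__":
    sample = "--a1b2c3d--2016-05-01--Adam\n10\t2\tsrc/core.clj\n-\t-\tresources/logo.png\n"
    for change in parse_numstat_log(sample):
        print(change)
```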


8. Let's talk about your company, Empear. What are your services? What are your products basically?

Yes. I started Empear after I finished writing “Your Code as a Crime Scene”. I felt it was a natural next step to take those techniques to the next level and build professional tools around them. We launched in September. We are four people working on Empear at the moment, and we do both services and products. For services, we go out to our customers, analyze their codebases, interview some of their personnel, and provide complete analysis reports with our findings and recommendations, which we then follow up on.

The other thing we do is products. We have two of them. One is a developer edition where you can do some of those technical analyses yourself; it's something you download and run on your desktop. Then we also have a web-based tool, which we sell mostly to enterprises, where you can also do all the social analyses I showed in my presentation.


9. The developer tool, what is that like? Is that a command line tool? Does it integrate with IDEs?

No. It's a standalone GUI where everything is completely automated, so you basically just install the tool, point it to your Git repository, and that's it. You get visualizations of the data, everything.


10. What do you use to write that?

Most of our tools are written in Clojure. For our desktop application we do have a thin layer of Java code, because most of the UI is done in JavaFX, which I think was a terrible mistake on my side. But 95% of our codebase is Clojure.


11. It's interesting; it's rare to see JavaFX used. You're saying it might have been a mistake, but what was the reason to choose it? Or was it just whatever was available in Java?

Since we're a Clojure shop, it was quite natural to do something on the JVM. I had never worked with JavaFX before, but it kind of sounded right. You had the ability to style it pretty nicely and all that stuff, and I still think it looks pretty good, but there are so many problems with it. I think it's heavy to work with, and we also ran into a lot of really, really nasty bugs.


12. Bugs in JavaFX?

Yes, that's right. That's something that has actually held us back a lot. If I were to do it again now, the desktop version would be web based as well. I think that would have been a much simpler approach.

Werner: So just basically build a website and launch it in some Chrome or some HTML component.

Yes. That would probably be the right thing to do, and I see that in our web-based tool, the enterprise edition: that kind of product actually fits quite well on the web.


13. So the enterprise edition, how does that work? You put it onto a server, hook it up to Git and it just works?

Yes. That's what you do. At the moment it's something you host yourself on your own servers. You install it, point it to your codebase, and tell it how often you want to run an analysis. The next thing we're going to do is put it into the cloud and open up a service that will be available free of charge to everyone.

Werner: That's an interesting business model, being free of charge.

Yes. I mean, it will actually be free as in beer, but only to a certain degree. For example, an open source project will always be able to run all kinds of analyses. We really want to support the open source movement. And if you're a commercial company, some things will be free and other stuff you'll probably have to pay for.


14. Before you talked about that you analyze the Git history basically. Can you talk some more about what kind of complexity metrics you figure out?

Yes, sure. That's one area where I played around a lot a number of years ago. I try to base all the analyses we do on research as far as possible, and there's a lot of really good research in that area. One of the metrics I actually use is the number of lines of code: just strip away the comments and blank lines, and use that to get an estimate of how complicated a piece of code is. The reason is that there's some pretty convincing research showing that more elaborate metrics, like McCabe's cyclomatic complexity, don't add any further predictive value once you control for the number of lines of code.

So that was a really interesting approach and the number of lines of code has a huge advantage because it's language neutral. So that's one of the approaches that we use.
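Here is a minimal Python sketch of that metric: count the lines that still contain code once comments and blank lines are removed. The counting rule itself is language neutral, but the comment syntax still has to be supplied per language. The defaults below assume C-style `//` and `/* */` comments, and, as a simplifying assumption, the sketch ignores comment markers that appear inside string literals.

```python
from typing import Optional, Tuple


def count_code_lines(source: str, line_comment: str = "//",
                     block: Optional[Tuple[str, str]] = ("/*", "*/")) -> int:
    """Count lines that contain code after stripping comments and blanks."""
    count = 0
    in_block = False
    for raw in source.splitlines():
        rest = raw.strip()
        code = ""
        while rest:
            if in_block:
                end = rest.find(block[1])
                if end == -1:
                    rest = ""                      # whole remainder is comment
                else:
                    rest = rest[end + len(block[1]):]
                    in_block = False
                continue
            start = rest.find(block[0]) if block else -1
            lc = rest.find(line_comment) if line_comment else -1
            if lc != -1 and (start == -1 or lc < start):
                code += rest[:lc]                  # keep code before a line comment
                rest = ""
            elif start != -1:
                code += rest[:start]               # keep code before a block comment
                rest = rest[start + len(block[0]):]
                in_block = True
            else:
                code += rest
                rest = ""
        if code.strip():
            count += 1
    return count


if __name__ == "__main__":
    c_source = "// header\nint add(int a, int b) {\n    /* multi\n       line */\n    return a + b; // sum\n}\n"
    print(count_code_lines(c_source))
    print(count_code_lines("# setup\nx = 1\n\ny = 2  # why\n", line_comment="#", block=None))
```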


15. When you say lines of code, is it per unit, like per class or per module?

At the moment, we only do analysis on a file level, so it's roughly per class. I would love to dig deeper, but that would mean we have to be a bit more language specific. Right now we can find hotspots and all kinds of complexity on a file level; it would be really, really interesting to be able to do that on a finer-grained level across a number of different languages.

Werner: That's a lot of work.

Yes, it is. It is. I hope I get around to doing it soon, because I'm curious what I'll find.

Werner: You would start off with, I guess, Java because it's the most used language.

Yes, Java or C# actually. In Malmo in Sweden where I'm based, there are a lot of companies that base their business on C#. So those two languages would be my starting point, I guess.

Werner: Interesting. It's rare to see C#, I guess.

Yes, not where I come from.

Werner: Interesting.

So before I started Empear, I used to work as a software consultant and most of my assignments were .NET actually, C#. Something with the Swedes probably.


16. Probably, yes. Talking about language, so you built your tools in Clojure. Can you tell us why?

Yes, sure. So I often get that question actually. I could say things like yes, Clojure is a great tool for data analysis which it really, really is. But that's not the reason I picked it at all. The reason I picked Clojure was because it looked fun and I wanted to learn the language and I think that fun is a much underestimated driver and motivator. Fun is a guarantee that things get done. So that's the reason we chose Clojure actually.


17. Is it because of its functional nature or because you like Lisp?

It's mostly because I'm a big fan of Lisp. I started programming in Common Lisp ten years ago, I still love Common Lisp, but I think that Clojure is a bit more practical for the kind of work that we do. That's why I picked that direction.


18. Do you still follow Common Lisp?

Not so much. I look a little bit at some discussion groups or things like that every now and then but not as much as I would like.

Werner: I was wondering whether Common Lisp, or its implementations, are in a state where you can basically do what you're doing, that is, ship client-side and server-side solutions?

You could. You definitely could. I mean there are some really, really great libraries in Common Lisp, so you can definitely do that. I think that the main problem I found is that the community is a little bit too small actually. So you have a lot of good libraries but of course, you cannot compete with the Java ecosystem.


19. Is there one Common Lisp or are there different implementations?

There are lots of different implementations. So Common Lisp is actually a standardized language and then you have a bunch of different implementations. There are several really high quality implementations.


20. Are they commercial or open source?

They're both commercial and open source. If you're interested in Common Lisp, I would recommend an implementation called SBCL, which is open source, really, really good and very performant.


21. Okay. So your tools are basically written by you and your team I suppose.

Yes, that's right. We are four people, and I represent half of the development department. Then I have a colleague who helps out with the code; he joined a couple of months ago. So we have written all of the code ourselves.


22. Have you run your tools on your own code base, any bodies hidden in your codebase?

Yes, that's something we've been doing all the time. That's the first codebase I analyzed with the tool; it had to run on itself, it just felt fair. And yes, we actually find stuff there as well. The good thing about doing that, eating your own dog food, is that you have a sense of what to expect, so you can judge pretty well whether the results make sense. Our most surprising findings tend not to be around hotspots, because we're only two developers and we know where those hotspots are.

Instead, they tend to be around a concept called temporal coupling, which is when different files tend to change together over time. There we can sometimes detect something that looks a little bit like design or architectural decay. When we see that, we say, oh, we really need to stop and do things a bit differently there, and that usually gives us pretty good design insights.
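Temporal coupling can be estimated from the same commit history. The sketch below uses one common definition, which is an assumption here rather than a statement of how Empear computes it: the number of commits two files share, divided by the average number of commits of the two files. The `min_shared` cut-off, which drops pairs that changed together only once, is an illustrative default.

```python
from collections import Counter
from itertools import combinations


def temporal_coupling(commits, min_shared: int = 2) -> list:
    """Return (file_a, file_b, shared_commits, degree) rows, strongest coupling first.

    `commits` is an iterable of collections of file paths, one per commit.
    """
    revisions = Counter()
    shared = Counter()
    for files in commits:
        unique = sorted(set(files))
        revisions.update(unique)                 # commits per file
        shared.update(combinations(unique, 2))   # commits per file pair
    result = []
    for (a, b), together in shared.items():
        if together < min_shared:
            continue
        degree = together / ((revisions[a] + revisions[b]) / 2)
        result.append((a, b, together, round(degree, 2)))
    return sorted(result, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    commits = [
        {"parser.clj", "parser_test.clj"},
        {"parser.clj", "parser_test.clj", "ui.clj"},
        {"parser.clj"},
        {"ui.clj"},
    ]
    print(temporal_coupling(commits))
```

In practice, very large commits (mass renames, reformatting) inflate every pair they touch, so it may be worth filtering them out before running an analysis like this.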

Werner: I gather that you're still happy with Clojure, you're not sort of looking at C# or other things.

No, we're not looking at C# at all. I'm happy with Clojure in the sense that it helped out a lot when I started. It allowed us to implement features really, really fast, and since we're only two people implementing that stuff, if we used a mainstream language we could never, ever compete with anyone else. So we used that tool as an advantage, because I'm really that much more productive in Clojure than I would be in, let's say, Java.

At the same time, now that our codebase is growing in scale, I find it really, really challenging without a real static type system. That's actually one of the big drawbacks I see: I think it's much harder to scale a Clojure codebase in terms of your own cognitive capacity. It's so hard to wrap your head around it. You need some kind of safety net, and I would love to have a real type system on top of it.


23. Clojure has some sort of optional type annotations. Are they not sufficient or what do you think?

Yes, that's true. You have core.typed, and I heard a lot of good things about it initially. I still think it's a brilliant approach, because it basically gives you the best of both worlds: you can start out with dynamic typing to prototype and explore, and then once your solution starts to stabilize, you just add an optional type system as a library. It's a beautiful design.

But I never tried it on a larger scale, and some blog posts have been written about it that point out a number of problems with that approach. So we will have to think a little bit about that. We will continue to use Clojure, but perhaps we will change the architecture a little bit so that we can wrap our heads around what we actually create.


24. Can you give us an idea where the lack of a type system seems to hurt you the most? What kind of areas do you think are the problematic areas?

Yes. The most problematic area for me is refactoring. I find it really, really hard: you need to rely a lot on your tests, and if you don't have enough tests, there are some parts, the really central parts, where I feel scared to make changes. So I have to be really careful. It's a big contrast to my colleague Oskar Wickström, who is implementing his own programming language, Oden, in Haskell, which is the opposite of Clojure. He has shown me the kind of changes he’s made: he is able to do large, sweeping changes and still be pretty sure that everything will work as soon as it compiles. I really like that, but I understand that it's a tradeoff.


25. So the tradeoff you mean, is it the complexity of the language? What would be the tradeoff?

Yes. So I definitely think that's one point but at the moment, I don't know Haskell well enough to really comment on that but it's on my to-learn list, so I hope to start with that next week.


26. Do you think it's easier to prototype ideas in Clojure without a static type system?

I personally think it is, but I know that my colleague, for example, would disagree with me. He's so experienced with Haskell that he's actually able to think in terms of types. So it's a different approach to problem solving. But I think the way we humans solve problems is that we learn by doing. You need the ability to change your decisions fast and incorporate the feedback you get from observing the effects your solutions have.

So the ability to change and move rapidly is, for me, the most important factor behind choosing that language.

Werner: I think that's a good point to end on. Where can we find you? So your book is called --

Your Code as a Crime Scene.

Werner: Okay. It's very memorable and your company's name is Empear; explain the name of your company.

Yes. Our company's called Empear, which is actually the transcription of a Coptic word. Coptic is a dead language that was spoken in Ancient Egypt. The reason I picked it was that it's so hard to find a unique company name, and a dead language sounded like a safe bet. So you will find it at the and you will find me on Twitter as @adamtornhill or have a look at my blog, where I write about this stuff.

Werner: So audience, you have homework. Thank you, Adam.

Jul 19, 2016