Implementing Automated Governance for Coding Standards
Most development organizations of a significant size have some form of coding standards and best practices. For many organizations, simply documenting these standards and keeping them up to date can be a significant challenge. Beyond that, the consistent enforcement of such standards and best practices can be even more difficult. Our organization has found that enforcing coding standards and best practices in an automated fashion through our build process has been highly effective.
The proactive nature of our solution is the most important aspect. Even in mature organizations where code reviews are performed and direct feedback is given to individual employees about bad coding practices, if this process happens retroactively then the stakes are raised; the mistake has already been made and the developer is on the defensive. Even worse, if the review does not happen during the development process, then the bad code has already reached production and the damage has been done. Because our build process is centrally controlled and a compliance check is executed automatically during the build of any software asset, harmful code never gets promoted in the first place, reducing the need for the costly cleanup projects and uncomfortable employee performance discussions that result from more retroactive audit strategies. Instead, developers are given immediate feedback (an HTML report in our case) by an emotionless system that doesn't care if they made a mistake. Developers still have an opportunity to learn from their mistakes, and the system continues to protect the organization from dangerous code even if it takes a developer a couple of build attempts to remember a new coding standard.
A centralized build process
In order for the strategy we will discuss here to be effective, two things need to happen:
- There needs to be a server-based, centralized build process. Ours happens to be a build system based on Ant scripts that we developed in-house because we built it before products like AntHill and Maven had matured. If you were starting today I would recommend you select a third-party build management system rather than build your own. The fact that we built our own did make some of the process customizations I will talk about more straightforward, but you should be able to integrate the same functionality into many third-party build systems.
- You need to make sure that going through the build system is the only way that development teams can promote code into the test and production server environments. I don't like to be dogmatic about things, but this part is not optional. If developers can just FTP Java class files directly into one of the environments and bypass the build process, then the effectiveness of the solution we are discussing here will be dramatically reduced. We protected our environment from this scenario by locking down write access to the relevant directories on our servers and giving write authority only to the account that runs the JVM process hosting our build system, thereby making our build process the only mechanism developers have to get code into our production and test environments. Because our build system pulls the selected project directly from our source control repository every time it executes a build, this lockdown accomplishes two things for us: it ensures that all code in test or production is also in source control, and it ensures that all code in those environments has gone through an automated software audit.
Tooling the automated software audit
We happen to use a product called Parasoft Jtest for our automated code audits, but there are other products that can accomplish what we will talk about here. Jtest has some pros and some cons. Overall it has been an effective tool for us, but we had to hack an infrastructure around it to get it to work the way we needed it to; it was definitely not an out-of-the-box solution for the strategy presented here. Jtest has two main features: static analysis and dynamic analysis. The dynamic analysis features of Jtest are useful, but we won't talk about them here because they are outside the scope of this strategy.
We purchased Jtest about 4 years ago when our organization was having problems with unclosed database connections in production due to resources not being cleaned up properly in a try/catch/finally block. Sound familiar to anyone? This was before Rod Johnson descended from the heavens and delivered the JdbcTemplate, and many organizations were struggling with this issue. This kind of coding issue is exactly the kind of thing that Jtest is great at preventing. It analyzes the structure and content of a Java class and applies rules to it. A rule in this context would be something like: if a database connection is created or obtained from the connection pool within a method body, make sure that there is a try/catch/finally block and that the connection is closed or returned to the pool in the finally block. To make a long story short, 4 years ago we created a Jtest rule that did exactly that, made it a "Severity 1" error, and (this part is important) changed our build system to automatically halt any builds that had Severity 1 Jtest errors. The system worked great, and the database connection issues went away.
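To make the rule concrete, here is a minimal sketch of the coding pattern it asks for. The `SketchPool` and `SketchConnection` classes are hypothetical stand-ins invented for this illustration, not JDBC or Jtest APIs, and the sketch uses a plain try/finally since there is nothing to handle in a catch block. The point is only that the connection goes back to the pool in the finally block even when the work in the middle throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: SketchPool and SketchConnection are hypothetical
// stand-ins for a real connection pool, not part of JDBC or Jtest.
class ConnectionCleanupSketch {
    public static void main(String[] args) {
        SketchPool pool = new SketchPool();
        try {
            runQuery(pool, true);
        } catch (IllegalStateException e) {
            System.out.println("query failed: " + e.getMessage());
        }
        System.out.println("connections still checked out: " + pool.checkedOut());
    }

    // The pattern the rule looks for: obtain the connection inside try,
    // and always return it to the pool in finally.
    static void runQuery(SketchPool pool, boolean fail) {
        SketchConnection conn = null;
        try {
            conn = pool.borrow();
            if (fail) {
                throw new IllegalStateException("simulated query failure");
            }
        } finally {
            if (conn != null) {
                pool.giveBack(conn);
            }
        }
    }
}

class SketchConnection {
}

class SketchPool {
    private final AtomicInteger checkedOut = new AtomicInteger();

    SketchConnection borrow() {
        checkedOut.incrementAndGet();
        return new SketchConnection();
    }

    void giveBack(SketchConnection conn) {
        checkedOut.decrementAndGet();
    }

    int checkedOut() {
        return checkedOut.get();
    }
}
```

Code that borrows a connection without the finally block would leave the counter above zero whenever the simulated failure fires, which is the leak the rule was written to catch.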
Now that we have Rod and the JdbcTemplate, this particular rule is less relevant, but is still useful for our legacy Java apps that haven't converted to Spring. And there are now many more rules that we check for which are still relevant. We have found it to be a great tool for enforcing architectural standards. For example, when our organization implemented a logging standard, we turned on a rule that made it impossible to promote System.out.println statements, which were no longer permitted. And these examples just scratch the surface. There are a few hundred rules that come out of the box with Jtest, and you can create your own as you need to.
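As a rough illustration of what a rule like the System.out.println ban checks, the sketch below scans source text line by line and reports the line numbers that contain the call. This is a deliberately naive text scan written for this article, with hypothetical class and method names; it is not how Jtest implements its rules, and a real static analyzer works on the parsed structure of the class rather than on raw text.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a naive, hypothetical text-based check. Real static
// analysis tools inspect the parsed code, not raw source lines.
class PrintlnRuleSketch {
    // Returns the 1-based line numbers that call System.out.println,
    // ignoring anything after a "//" line comment marker.
    static List<Integer> findViolations(String source) {
        List<Integer> lines = new ArrayList<>();
        String[] split = source.split("\n");
        for (int i = 0; i < split.length; i++) {
            String code = split[i];
            int comment = code.indexOf("//");
            if (comment >= 0) {
                code = code.substring(0, comment);
            }
            if (code.contains("System.out.println")) {
                lines.add(i + 1);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "class Demo {",
            "    void run() {",
            "        System.out.println(\"debug\");",
            "        // System.out.println(\"commented out\");",
            "    }",
            "}");
        System.out.println("violations on lines: " + findViolations(source));
    }
}
```

For the sample source in `main`, only the third line is reported; the commented-out call on the fourth line is skipped.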
Some caveats about Jtest: as I stated earlier, Jtest as a server wasn't good to go when we got it. Parasoft's main product line is an Eclipse plug-in which does the static and dynamic analysis from within a developer's IDE. That is not what I am talking about. I am talking about a server-based Jtest product that is integrated into our server infrastructure via command line calls from our build server. Parasoft feels that the kind of definitive organizational control and governance we are discussing here can be achieved by buying the IDE plug-in for all of your developers and hooking them up to Parasoft's centralized reporting server, but we have not found that to be the case. The problem is that Parasoft can't guarantee that a developer ran a static analysis before checking source code into CVS. Because they have no control over the Eclipse CVS plug-in (or Subversion or whatever), there is no control point where Jtest can say "Stop! You can't do that if you have severity 1 errors!" Because of this, the test has to be run not on the desktop but at a central control point, and for us, that is our central build system. So we needed a server version of Jtest that could be called from the build system during every build, and we had to do that integration work ourselves (although it wasn't terribly difficult).
I also want to reiterate that Jtest isn't the only game in town. Adrian Colyer and others have talked about using AspectJ aspects to enforce coding standards. That could be very easily implemented on a centralized build server. I am not sure if you could do everything with aspects that you can do with Jtest, but it's free. Other competitive products and Eclipse plug-ins perform a varying subset of the static analysis functionality found in Jtest. And if you want to start out really light, Eclipse has support for stylistic and syntactical coding standards within the standard JDT.
Best Practices for Governance Rollout
Your strategy for rolling out automated software governance is far more important than the technologies you choose to build your solution. Here are some of the lessons that we have learned after doing this for a few years:
- Keep the governance structure simple. We only have 3 categories of rules: Severity 1, 2, and 3. Severity 1 rules will stop your build, and your project will not be able to get into our test and production environments until that issue is fixed. Severity 2 is basically a staging area. It tells the developer that this rule will be a severity 1 within the next 6-12 months, so they should probably fix it now before they find themselves under a deadline and unable to build their code. Severity 3 doesn't have teeth. It's something we recommend that you fix, but until we promote it to a 2 it doesn't have the potential to actually stop a developer from being productive.
- Be conservative. As I stated earlier, Jtest comes with hundreds of rules out of the box. When we first deployed Jtest we had only 2 severity 1 rules turned on. The reason for this was simple: we wanted to avoid establishing a precedent for bypassing the control point because a project manager is screaming about a rushed deadline. It is better to be conservative and have the process be authoritative than to be aggressive and have the exceptions pile up.
- Do proactive impact analysis. When you are about to deploy new Severity 1 rules, you should have a pretty good idea of the frequency with which violations occur in your projects and the time and cost of remediating that code. This is not hard: you just need to run a static analysis over those projects with the new rule activated and take a look at the report. This will save you from deploying a new Severity 1 rule that can't be sustained by the organization, forcing you to demote it back to a Severity 2. If the impact is too high, keep it a Severity 2 for another development cycle. If you don't see a reduction in the number of occurrences, work on your message to the organization regarding the importance of addressing Severity 2 issues. There will be times when a critical issue forces you to implement a Severity 1 rule that has a broad impact, but when you do so it is absolutely critical that management understands the impact and supports the decision.
- Communicate well. Talk to your community about the new rules, the value behind them, and the reasoning that went into implementing the new rules. Most of the time the development community will agree with you, but they don't like to be surprised.
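The three-level severity scheme and the impact analysis described above can be sketched in a few lines. Everything here is a hypothetical illustration: the `Violation` type, the rule names, and the method names are assumptions made for this example, not the format of a Jtest report or the code of our build system. The sketch halts a build only on Severity 1 findings and counts findings at a given severity, which is the number you would look at before promoting a rule.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical types and rule names, not a Jtest API.
class SeverityGateSketch {
    static final class Violation {
        final String rule;
        final int severity; // 1 stops the build, 2 is staged, 3 is advisory

        Violation(String rule, int severity) {
            this.rule = rule;
            this.severity = severity;
        }
    }

    // Only Severity 1 findings block promotion to test and production.
    static boolean shouldHaltBuild(List<Violation> violations) {
        for (Violation v : violations) {
            if (v.severity == 1) {
                return true;
            }
        }
        return false;
    }

    // Impact analysis: how many findings would a given severity produce?
    static int countAtSeverity(List<Violation> violations, int severity) {
        int count = 0;
        for (Violation v : violations) {
            if (v.severity == severity) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Violation> report = Arrays.asList(
            new Violation("unclosed-connection", 1),
            new Violation("system-out-println", 2),
            new Violation("system-out-println", 2),
            new Violation("naming-convention", 3));
        System.out.println("halt build: " + shouldHaltBuild(report));
        System.out.println("severity 2 findings: " + countAtSeverity(report, 2));
    }
}
```

Keeping the gate this simple is deliberate: the build either passes or stops, and the Severity 2 count gives a quick read on whether a rule is ready to be promoted.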
The details of our implementation notwithstanding, proactive and automated software audits have been a great benefit for us. The quality of our production software assets has increased, but perhaps more importantly we have accomplished this using a reliable system that we can count on organizationally without focusing a lot of energy on maintaining it. Maintaining human-based organizational processes to support standards requires focus and energy from organizational leadership. By designing your development support infrastructure appropriately, you actually get more organizational security with less effort.
About the Author
Mark Figley leads the architecture group at AIG United Guaranty, the Mortgage Insurance arm of AIG, the world's largest insurance company with $800B in assets.
Who is not doing it?
Re: Who is not doing it?
I completely agree that the strategies of utilizing source control and a centralized build process are in fact ages old. I am not sure I agree with you on how broadly those best practices are implemented, but they are definitely not new ideas. The only reason that I talk about locked down environments and centralized build concepts in the article is because they are required precursors for the automated software audit strategy (which is the focus of the article) to work, and in my experience you cannot take for granted that those best practices are already in place.
So I understand your thoughts on the relevancy of FTP check-ins, but I wonder whether all of the organizations you referenced have implemented their coding standards as an automated software audit infrastructure that scans every class during every build for code that does not follow the standards. And I am not talking about executing a batch of unit tests as part of a build. I am talking about pattern rules broadly applied over the entire codebase. In my experience it is rarer for organizations to have that infrastructure in place, and that is the value I was hoping to provide with this article. Does that make any sense?
Re: Who is not doing it?
Well, the organizations that I am talking about have Checkstyle and PMD mapped to their build process, so they do check it with every build. At the end, a report is generated on the Maven dashboard with the number of violations for the team to work on.