Most developers nowadays are familiar with the basic tenets of Continuous Integration, but arguably only a small proportion of them are fully benefiting from an optimized CI setup. Indeed, an effective Continuous Integration environment can save your team time, money and even existential angst. It can help you discover bugs earlier, identify their causes more easily, and ultimately resolve them more efficiently. It can encourage better source code management practices, help you leverage automated analysis tools, encourage better testing, track your progress and remove bottlenecks from your developers' lives. It can facilitate the deployment process, and make your releases go more smoothly and more reliably. Managers will have more charts than they know what to do with and developers will be happier. To put it another way, not using Continuous Integration is like developing software using Notepad: it's possible, but it's horribly inefficient.
In this article, by John Smart, principal consultant at Wakaleo Consulting, we will investigate a few Continuous Integration practices that show how you can take CI beyond merely being a glorified cron job and a scheduled compiler, and make it an effective, productivity-enhancing hub for your whole development activity.
Continuous Communication Flow
One of the principal characteristics of a well-tuned Agile development environment is how it tries to maximize the flow of information between team members. Each developer needs to know as soon as possible when a build fails, or when a change is made that may adversely affect the quality of the application. When a build does fail, it is important to know what changes were made and what issues the changes were trying to address: all of this information should be available to any developer with only a few keystrokes, in the most intuitive way possible.
Even the most basic CI setup will send emails to developers when a build fails. However, with a little effort, you can do much better than this. A well-implemented Continuous Integration server should act as one of the hubs of team communication. Beyond simply sending out email notifications, for example, modern CI tools have many features that can open the way to smooth and efficient communication between developers about code changes and build failures.
Email is probably the oldest and most frequently-used CI notification strategy, simply because it is ubiquitous and fairly easy to set up. However, email is in fact a relatively inefficient notification mechanism. When a build fails, you want to be told fast. The faster the better. If you have to wait 15 minutes for your email client to update, that's 15 minutes of potentially wasted developer time. In addition, email is often seen as a distraction to developers, and for good reason. In an enterprise context, the number of less-than-urgent or irrelevant emails per day can be phenomenal. Indeed many good developers actually disable email notification and only consult their emails periodically, a few times a day.
Instant Messaging is in many ways a much more appropriate notification medium, for several reasons. Most importantly, Instant Messaging is, well, (almost) instant. You don't need to wait ten minutes until your email client checks the server for new mail - you know that you will be notified immediately. It is also easier to filter than email, and faster to read. IM messages coming from the build server can be treated with the attention they deserve, rather than being drowned in a sea of other, less useful messages.
Don't underestimate the importance of the few minutes between a quick IM message and a sluggish email that arrives 10 or 15 minutes after the event. Those few minutes are more than enough for a developer to lose focus and move on to something else, making it that much harder to get back into the context and fix the problem.
Another advantage of Instant Messaging is that it can be extended beyond the desktop. Applications like BeeJive take Instant Messaging to mobile devices such as Blackberries and iPhones. This way, critical build failure notifications can be sent to developers even when they are not in the office.
IM notification also allows more interaction with the CI server than does email. As a communication medium, Instant Messaging is more spontaneous than email, and allows more interaction than Subversion commit messages. Some of today's CI tools will not only send out notifications by IM, but will also make it easy for developers to interact with the build server, and with other team members, using Instant Messaging.
Of course, Instant Messaging is not the only alternative notification available. If you want people to take notice of your build messages, it is very important to ensure that the notification mechanism blends smoothly into the enterprise culture. For example, some companies use social networking tools such as Twitter as an effective internal communication channel. For these organizations, using Twitter can also be an efficient build notification strategy.
Keeping the build process effective
One of the primary goals of a Continuous Integration build environment is to keep the development process rolling along smoothly, and to avoid hitches, road-blocks and development delays caused by integration issues. When an integration issue does crop up, the onus is on the developer who committed the code to fix the problem fast, before it has time to affect other developers. Without a Continuous Integration environment, it generally falls upon the developer blocked by these integration issues to find a solution ("hey, it works on my machine!").
To keep the development process at its best, team members (and in particular team leads, process experts and the like) need to be able to monitor the build process, so that they can identify and address problems that could slow down the developers in their daily work. The best way to do this is to know how to make good use of build telemetry.
Build telemetry is a term used in Continuous Integration circles to describe the statistical data collected over time about your builds. Bamboo is a good example of a CI tool that provides very advanced build telemetry features. Build telemetry provides you with information about how long your builds are taking to run, how successful they are, how long build failures take to fix, and so on. This data is important as it tells you how the build has been behaving over time. It is this sort of data, rather than individual build results, that can help you keep your build process finely tuned.
The number and frequency of build failures is always a good place to start. Isolated build failures are usually nothing to worry about - it is the series of repeated build failures that you may need to investigate. When a build fails repeatedly, a developer may have been struggling with a particularly hard piece of code, or the team may simply have been ignoring the build failures. Both issues would obviously deserve further investigation, though for different reasons and using different resolution strategies in each case.
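As a rough illustration of the kind of figures build telemetry gives you, the sketch below computes a success rate, the longest run of consecutive failures, and the average time taken to fix a broken build from a simple build history. The `BuildTelemetry` and `Build` classes and the sample data are hypothetical, made up for this example; CI servers such as Bamboo collect and chart this data for you, and nothing here reflects their actual APIs.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of build telemetry computed from a list of finished builds (oldest first). */
final class BuildTelemetry {

    /** Hypothetical record of one build: completion time in minutes and outcome. */
    static final class Build {
        final long finishedAtMinute;
        final boolean passed;

        Build(long finishedAtMinute, boolean passed) {
            this.finishedAtMinute = finishedAtMinute;
            this.passed = passed;
        }
    }

    /** Fraction of builds that passed (0.0 for an empty history). */
    static double successRate(List<Build> history) {
        if (history.isEmpty()) {
            return 0.0;
        }
        long passed = history.stream().filter(b -> b.passed).count();
        return (double) passed / history.size();
    }

    /** Length of the longest run of consecutive failed builds. */
    static int longestFailureStreak(List<Build> history) {
        int longest = 0;
        int current = 0;
        for (Build build : history) {
            current = build.passed ? 0 : current + 1;
            longest = Math.max(longest, current);
        }
        return longest;
    }

    /** Mean minutes between the first failure of a streak and the next passing build. */
    static double meanMinutesToFix(List<Build> history) {
        long totalMinutes = 0;
        int fixes = 0;
        boolean broken = false;
        long brokenSince = 0;
        for (Build build : history) {
            if (!build.passed && !broken) {
                broken = true;
                brokenSince = build.finishedAtMinute;
            } else if (build.passed && broken) {
                totalMinutes += build.finishedAtMinute - brokenSince;
                fixes++;
                broken = false;
            }
        }
        return fixes == 0 ? 0.0 : (double) totalMinutes / fixes;
    }

    public static void main(String[] args) {
        // Made-up history: two separate breakages, the first lasting two builds.
        List<Build> history = Arrays.asList(
                new Build(0, true), new Build(10, false), new Build(20, false),
                new Build(35, true), new Build(50, false), new Build(55, true));
        System.out.println("Success rate: " + successRate(history));
        System.out.println("Longest failure streak: " + longestFailureStreak(history));
        System.out.println("Mean minutes to fix: " + meanMinutesToFix(history));
    }
}
```

A long failure streak or a rising time-to-fix is the signal worth investigating; a single isolated failure barely moves either figure.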
You can learn more about why a build has been failing by drilling down into the test results. Many modern CI tools let you study test behavior over time, for example, to isolate tests that have been failing frequently, or that have been taking a long time to fix. If the same tests are failing repeatedly, it may be an indicator of overly complex or fragile code, which could do with some refactoring. It also lets you study how long the tests have been taking to run, which can be another source of problems.
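To make the idea of isolating frequently failing tests concrete, here is a minimal sketch that counts how often each test has failed across a series of builds and lists the repeat offenders. The class name, the test names and the input format (one list of failed test names per build) are assumptions made for this example; a real CI tool would draw this from its stored test reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: find tests that fail repeatedly across builds. */
final class FailingTests {

    /** Counts, per test name, the number of builds in which that test failed. */
    static Map<String, Integer> failureCounts(List<List<String>> failedTestsPerBuild) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> failedInOneBuild : failedTestsPerBuild) {
            for (String testName : failedInOneBuild) {
                counts.merge(testName, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Tests that failed in at least {@code threshold} builds, most frequent first. */
    static List<String> repeatOffenders(List<List<String>> failedTestsPerBuild, int threshold) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(failureCounts(failedTestsPerBuild).entrySet());
        entries.removeIf(e -> e.getValue() < threshold);
        // The sort is stable, so tests with equal counts stay in alphabetical order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            names.add(entry.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical failures from five consecutive builds.
        List<List<String>> failures = Arrays.asList(
                Arrays.asList("OrderTest.total", "CartTest.add"),
                Collections.<String>emptyList(),
                Arrays.asList("OrderTest.total"),
                Arrays.asList("OrderTest.total", "CartTest.add"),
                Arrays.asList("UserTest.login"));
        System.out.println(failureCounts(failures));
        System.out.println(repeatOffenders(failures, 2));
    }
}
```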
Indeed, build failures are not the only thing that can slow down your development process. Slow builds are another, more insidious culprit.
One of the most common causes of slow builds is poorly-structured test suites. A common best practice of experienced Java developers is to separate unit tests from integration tests. The exact distinction between the two may vary, but in general, unit tests are meant to be small, fast, light-weight tests that ensure that classes do what they are intended to do, in relative isolation. Integration tests, on the other hand, are slower and longer running, and may access external resources such as fully-populated test databases or load complex configuration files. They test how the different modules and classes in the application work together. Performance tests fall into a similar category, though their goals are a little different.
The lightweight, fast-running unit tests can be executed very quickly, and give rapid feedback if there are test failures at this level. If, on the other hand, slower-running integration tests are mixed in with the unit tests, the whole suite will take much longer to execute, and developers will have to wait longer to hear about unit test failures. The way to avoid this is to create separate build plans for unit tests and integration/performance tests. This way, if the unit test build plan fails, it will do so quickly, and developers will not have to wait too long to be notified of the build failure. Only if the unit tests succeed will the integration and performance build plans be kicked off.
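One simple way to split a test suite between two build plans is a naming convention. The sketch below assumes, purely for illustration, that integration test classes end in `IT` or `IntegrationTest` and that everything else is a unit test; the class names and the `com.acme` package are hypothetical. In practice your build tool's include and exclude patterns would usually do this job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: split test classes between a fast unit-test plan and a slower integration plan. */
final class TestPlans {

    /** Assumed convention: integration test classes end in "IT" or "IntegrationTest". */
    static boolean isIntegrationTest(String className) {
        return className.endsWith("IT") || className.endsWith("IntegrationTest");
    }

    /** Classes for the fast plan that runs on every commit. */
    static List<String> unitTests(List<String> classNames) {
        return classNames.stream()
                .filter(name -> !isIntegrationTest(name))
                .collect(Collectors.toList());
    }

    /** Classes for the slower plan that runs only after the unit-test plan passes. */
    static List<String> integrationTests(List<String> classNames) {
        return classNames.stream()
                .filter(TestPlans::isIntegrationTest)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
                "com.acme.OrderTest",
                "com.acme.OrderRepositoryIT",
                "com.acme.CheckoutIntegrationTest",
                "com.acme.CartTest");
        System.out.println("Unit plan: " + unitTests(all));
        System.out.println("Integration plan: " + integrationTests(all));
    }
}
```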
Another, complementary approach is to distribute your builds. For example, if your functional web tests take a long time to run because you have to run them on several different browsers, set up a build job for the tests on each browser and run them in parallel, possibly on different machines.
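The sketch below shows the shape of this idea in miniature, using a thread pool to stand in for a set of build agents: one job per browser, all started together, with the results gathered at the end. The `BrowserJob` interface and the browser names are hypothetical; a real CI server would dispatch each job to a separate build agent rather than to a thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: run the same web test suite against several browsers in parallel. */
final class ParallelBrowserBuilds {

    /** Hypothetical job that runs the web tests on one browser and reports success. */
    @FunctionalInterface
    interface BrowserJob {
        boolean run(String browser) throws Exception;
    }

    /** Runs one job per browser concurrently and returns each browser's outcome. */
    static Map<String, Boolean> runAll(List<String> browsers, BrowserJob job) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, browsers.size()));
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String browser : browsers) {
                tasks.add(() -> job.run(browser));
            }
            // invokeAll blocks until every job has finished, keeping the task order.
            List<Future<Boolean>> futures = pool.invokeAll(tasks);
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < futures.size(); i++) {
                boolean passed;
                try {
                    passed = futures.get(i).get();
                } catch (ExecutionException e) {
                    passed = false; // a job that throws counts as a failed build
                }
                results.put(browsers.get(i), passed);
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for browser jobs", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in job: pretend the suite fails on one browser only.
        Map<String, Boolean> results = runAll(
                Arrays.asList("firefox", "ie", "safari"),
                browser -> !browser.equals("ie"));
        System.out.println(results);
    }
}
```

The total elapsed time is then roughly that of the slowest browser job, rather than the sum of all of them.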
Another problem can come from overly slow and inefficient test cases. There are many ways to keep tabs on suspiciously slow tests. Sudden increases in the time your tests take may mean that some of the tests are taking too long to run. This might be because they are poorly designed, or it might be a performance issue that should be investigated further. Or it might be a sign of an integration test masquerading as a unit test.
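A simple check for this kind of sudden increase is to compare the latest test duration with the average of earlier runs. The following sketch does just that; the class name, the sample timings and the factor of 2 used in the example are arbitrary assumptions, to be tuned to your own builds.

```java
/** Sketch: flag a test run whose duration jumps well above its historical average. */
final class SlowTestDetector {

    /**
     * Returns true if the latest duration exceeds the average of the previous
     * durations multiplied by the given factor. With no history, nothing is flagged.
     */
    static boolean isSuddenlySlow(long[] previousMillis, long latestMillis, double factor) {
        if (previousMillis.length == 0) {
            return false;
        }
        double sum = 0;
        for (long duration : previousMillis) {
            sum += duration;
        }
        double baseline = sum / previousMillis.length;
        return latestMillis > baseline * factor;
    }

    public static void main(String[] args) {
        long[] previous = {100, 120, 110}; // made-up timings in milliseconds
        System.out.println("400 ms flagged? " + isSuddenlySlow(previous, 400, 2.0));
        System.out.println("150 ms flagged? " + isSuddenlySlow(previous, 150, 2.0));
    }
}
```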
Keeping tabs on code quality
A Continuous Integration server should be more than just an automated build machine. It should be a communications hub for your team. One area where this is particularly relevant is code quality. Keeping an eye on coding standards, and metrics such as code coverage and code complexity, can help make your application more reliable and easier to maintain.
There are many good tools that can help you maintain a high standard of code in your application. Static analysis tools such as Checkstyle, PMD and FindBugs analyse your code in search of violations of coding standards or best practices, and of potential bugs. The tools you use, and how you configure them, depend very much on what you are trying to achieve. For example, Checkstyle concentrates more on coding standards and best practices, whereas FindBugs is more concerned with looking for incorrect, broken or dangerous code. All of these tools integrate easily into an automated build process, and work well with both Ant and Maven.
Test coverage is another important area of code quality. Test coverage metrics measure the number of lines executed by your tests. There is some debate among Java developers about the relative value of test coverage statistics. Indeed, while test coverage can tell you what lines of your application were executed, it has no way of knowing if those tests were thorough, well written tests or simply superficial ones. In short, test coverage does not guarantee that your tests are of high quality - only human code reviews can really give any assurance of that. Nevertheless, test coverage metrics are an excellent indicator of what code has not been tested. If your code is never executed by your tests, you can be assured that it has not been tested. The most widely used code coverage tools in the Java world are Clover, a very powerful commercial code coverage tool, and Cobertura, a more light-weight open source tool. Both can be easily integrated into both Ant and Maven-based build scripts.
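The arithmetic behind a line coverage figure is simple, as the sketch below shows: it takes the set of executable lines in a class and the set of lines actually executed by the tests, and reports both the percentage covered and, more usefully, the lines that were never executed. The class name and line numbers are invented for the example, and this is not how Clover or Cobertura are implemented; they instrument your code to collect this data.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Sketch: line coverage as a percentage, plus the lines no test ever executed. */
final class LineCoverage {

    /** Percentage of executable lines that were executed (100.0 if there are none). */
    static double percentCovered(Set<Integer> executableLines, Set<Integer> executedLines) {
        if (executableLines.isEmpty()) {
            return 100.0;
        }
        Set<Integer> covered = new TreeSet<>(executableLines);
        covered.retainAll(executedLines);
        return 100.0 * covered.size() / executableLines.size();
    }

    /** Executable lines that were never executed, in ascending order. */
    static SortedSet<Integer> untestedLines(Set<Integer> executableLines, Set<Integer> executedLines) {
        SortedSet<Integer> untested = new TreeSet<>(executableLines);
        untested.removeAll(executedLines);
        return untested;
    }

    public static void main(String[] args) {
        Set<Integer> executable = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        Set<Integer> executed = new TreeSet<>(Arrays.asList(1, 2, 3, 5, 8));
        System.out.println("Coverage: " + percentCovered(executable, executed) + "%");
        System.out.println("Never executed: " + untestedLines(executable, executed));
    }
}
```

Note that the percentage says nothing about whether the executed lines were checked by meaningful assertions; the list of untested lines is the more trustworthy output.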
Code quality metrics can also be used very effectively as a support for training and mentoring activities, especially with inexperienced developers. CI tools can provide a high-level picture of how these metrics evolve over time, keeping tabs on how well developers are applying the techniques that they are being taught. For example, low or dropping code coverage on a class may indicate that one of the new developers is having trouble assimilating the test-driven development and testing practices that her team is trying to teach. This approach can be complemented by code reviews and regular code quality meetings, where any new issues or trends are discussed.
Once the build is over - automating the deployment process
Building your application is just one part of the development life cycle. Once the code is compiled and tested, other activities come into play, such as deployment to a staging environment, smoke, functional and performance tests, preparing release notes, and notifying QA staff of the latest release.
Automatically deploying your latest build onto an integration server is a relatively simple affair. Deploying to a staging or production environment, however, involves a very different set of tasks from those involved in a conventional build job. You generally need a more rigorous, more formal process, with a lot more traceability and accountability. It typically involves tasks such as:
- Tagging the source code to be used for the staging release
- Compiling and testing the application
- Publishing the build artifacts
- Deploying the application to the staging environment
- Running database update scripts or other environment-specific scripts
- Running smoke, functional, and performance tests
- Preparing and publishing release notes
- Notifying stakeholders about the latest staging release
This is most often a manual task, but there is no reason why much of it cannot be automated. Indeed, automating the packaging, deployment and release phases of the development life cycle makes solid business sense. For one thing, automation results in more reliable builds: a computer never forgets a step in the deployment process, and never ploughs ahead after test failures have been raised. It can also save developer time: a staging release is reduced to the click of a button rather than hours of shell scripting. Finally, it is faster, and can be done without a human watching (e.g. overnight or during the lunch break).
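The core of such an automated release is a fixed sequence of steps that halts at the first failure. Here is a minimal sketch of that idea; the `StagingRelease` and `Step` classes and the step names are hypothetical, and each step's action is a stand-in for real work such as tagging the source code or running a database script.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Sketch: run release steps in order and stop as soon as one of them fails. */
final class StagingRelease {

    /** Hypothetical release step: a name plus an action that reports success. */
    static final class Step {
        final String name;
        final BooleanSupplier action;

        Step(String name, BooleanSupplier action) {
            this.name = name;
            this.action = action;
        }
    }

    /** Runs the steps in order; returns the names of the steps that succeeded before any failure. */
    static List<String> run(List<Step> steps) {
        List<String> completed = new ArrayList<>();
        for (Step step : steps) {
            if (!step.action.getAsBoolean()) {
                break; // never plough ahead after a failed step
            }
            completed.add(step.name);
        }
        return completed;
    }

    public static void main(String[] args) {
        // Stand-in actions: the database update is made to fail for demonstration.
        List<Step> steps = Arrays.asList(
                new Step("tag source code", () -> true),
                new Step("compile and test", () -> true),
                new Step("publish artifacts", () -> true),
                new Step("deploy to staging", () -> true),
                new Step("run database updates", () -> false),
                new Step("run smoke tests", () -> true),
                new Step("publish release notes", () -> true),
                new Step("notify stakeholders", () -> true));
        List<String> completed = run(steps);
        System.out.println("Completed " + completed.size() + " of " + steps.size() + " steps: " + completed);
    }
}
```

Because the list of steps is data rather than something held in a person's memory, no step can be skipped by accident, and the returned list shows exactly how far a failed release got.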
Tools like Maven 2 can help automate some of these steps. The Maven Release plugin makes it easy for Maven users to automate the process of updating version numbers, creating new tags in Subversion, and publishing the build artifacts to a Maven repository. This can be used to manage build promotions, and decide what releases to deploy to the different environments. However, once the production-ready build is completed and made available for the staging or production deployment, the process becomes more complicated.
Indeed, real-world deployments often require more than a simple deployment of a WAR file. Depending on your application architecture and on your production platform, you may also need to run SQL update scripts against the staging or production database, deploy web-services using a proprietary tool, run automated smoke tests, or do any number of other server-side tasks.
CI can help even with these more complicated stages. With distributed builds, for example, you can set up a build agent on staging and production machines, to run the appropriate tasks directly on the machine. And most modern CI tools support a fairly fine-tuned security model, so that access to staging and production environments can be limited to a select few, and you can keep track of who ran what build when.
This is a relatively new application of CI, and different tools have different approaches to the problem of application deployment. Some, such as Hudson, allow you to define multiple steps in a build job, only executing subsequent steps if the previous ones succeed. Others, such as Cruise and Anthill Pro, try to integrate the broader deployment life cycle concepts such as staging and production environments directly into the build tool, although sometimes at the cost of additional complexity.
There are also more low-level options, which can be used in conjunction with your CI server. One option is to use build tools such as Ant or Maven. Ant is generally more flexible for ad-hoc scripting of this sort. Another popular option is the old Makefile, or a Unix shell script. These have the disadvantage of being OS-specific, and can be hard to maintain for Java developers unfamiliar with the subtleties of shell scripting. A more Java-friendly alternative can be found in dynamic languages such as Groovy or Gant (a tool for Ant scripting using Groovy instead of XML). Groovy provides all the advantages of a light-weight, dynamic scripting language while remaining relatively familiar and readable for Java developers.
Conclusion
These are just a few ways that a modern Continuous Integration environment can help you hone your development process and empower your team. Much more than just a build scheduler, a Continuous Integration environment can be used to open up communication channels within your team, to help you keep your build process running smoothly and efficiently, to help you monitor code quality, and to automate the release and deployment process.