Going Live: The Role of Development and Operations in Preparing and Deploying Software Packages
Properly deploying your software application - its binaries and associated configuration - to your Development, Test, Acceptance or Production (DTAP) environments according to your release process is a complex task spanning multiple departments and teams. Unfortunately, in many organisations this critical process is still rather time-consuming and error-prone.
Here, we will discuss how Development, Operations and others can collaborate to prepare a good deployment package. A good package reduces the potential for error and improves clarity whilst still allowing for environment-specific customization.
Furthermore, we will examine how a well-structured deployment package makes it easier to successfully transition to deployment automation, which increases throughput and reliability while minimizing errors and waiting time throughout the development and maintenance lifecycles of the application.
Separation of Concerns in the Deployment Process: What vs. How
Obviously, your deployment package needs to contain all the components that belong to your application. That means not just your binaries - EARs, WARs etc. - as they would typically be produced by your integration builds, but also static content, properties files, shared libraries, proxy configurations etc. It especially includes configurations and settings for the application server, such as queues, connection factories, data sources or classpath entries. In fact, your package should contain all items that share the lifecycle of the application, i.e. that are deployed, upgraded and undeployed together with it.
Ensuring that your package really represents a "fully deployable unit" is critical to achieving reliable deployments, especially in large-scale environments. Hoping that the target middleware stack has been correctly set up with all the "right stuff" is a common cause of failed deployments, and frequently causes time-consuming and frustrating hunts for seemingly undetectable differences between environments.
It is not at all uncommon to see deployment packages that contain only configurations, for instance shared "services" such as queues or data sources used by multiple applications. These can and indeed should also be treated as versionable, deployable objects that are released using the same rigorous process as for "normal" applications. In fact, in the case of shared services it is often more critical that deployments run smoothly and reliably.
Apart from ensuring that your packages represent the most complete description possible of what makes up the specific version of your application or configuration, packages should also indicate what other things they require in order to run. In other words, deployment packages need to unambiguously define their prerequisites and dependencies so that deployments can only be carried out to target environments that match.
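As an illustration, the following Python sketch shows what such a check might look like, assuming a package declares its prerequisites as name/minimum-version pairs and each target environment advertises what it provides in the same form. The function names and the capability names `java-ee` and `jms` are hypothetical; a deployment tool would refuse to proceed whenever the returned list is non-empty.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.1.0' into a comparable tuple."""
    parts = [int(part) for part in text.split(".")]
    # Drop trailing zeros so that '1.1' and '1.1.0' compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_prerequisites(required: Dict[str, str], provided: Dict[str, str]) -> List[str]:
    """Return a list of mismatches; an empty list means the target environment qualifies."""
    problems = []
    for name, minimum in sorted(required.items()):
        if name not in provided:
            problems.append(f"missing prerequisite: {name}")
        elif parse_version(provided[name]) < parse_version(minimum):
            problems.append(f"{name} {provided[name]} is older than required {minimum}")
    return problems


# Hypothetical prerequisites declared by a package and capabilities of two environments.
required = {"java-ee": "5", "jms": "1.1"}
test_env = {"java-ee": "5", "jms": "1.1.0"}
legacy_env = {"java-ee": "1.4"}

print(check_prerequisites(required, test_env))    # []
print(check_prerequisites(required, legacy_env))  # two problems: old Java EE, no JMS
```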
Precisely which parties contribute items to a deployment package depends strongly on the organisational set-up. Whilst it is frequently the responsibility of the development team, which is often able to automate at least part of this process using continuous build tools, your release process may also involve other teams. For instance, it is not uncommon to see SQL scripts being signed off by the database administrators before being approved for release, or middleware configuration settings being checked by the middleware Center of Competence.
No Delta Releases Please
Often, one of the goals of the deployment set-up is to minimize the impact on the target environment. Obviously, if you're running in an "always-on" environment or have multiple applications sharing a cluster, unnecessary server restarts are something to be avoided as much as possible.
One common approach to achieve this is to require development to deliver "delta" release packages that contain only the new or modified components of the system, and to only deploy these. Experience shows, however, that this is a risky strategy and should be avoided for a number of reasons:
- Preparing a delta release is almost always a manual task that relies on the developer's knowledge of what has or has not changed. This is a time-consuming, error-prone process that is seldom reproducible.
- The deployment package for an application version is no longer sufficient to actually deploy that version of the application - in the worst case, all previous versions may be required. Deployments to "fresh" environments (imagine quickly firing up a couple of clean images in order to reproduce a pressing issue) become much more complicated, and the risk of failure increases because all the previous package versions must still be present and in good order. This is, of course, exactly the problem familiar from incremental database backups, where a restore needs the last full backup plus every increment since.
- Unless every version of the application is deployed on every environment in the same order, there can be a proliferation of delta packages that upgrade to the same version, but from a different base version. This quickly leads to confusion and errors when the wrong delta package is chosen.
- A release repository that does not contain complete packages, but only fragments, is not a very good candidate for a Definitive Software Library.
None of these points invalidate the aim of having minimal impact deployments, of course. But the deployment package is not the right place to attempt to implement them. Instead, the deployment process should address this requirement by deducing at deployment time which components have been added, modified and removed, and should act accordingly.
Carrying this out manually in the time-constrained context of a deployment is very difficult (with delta packages, developers are effectively asked to do this in advance), but with a good deployment automation system this is easy to achieve. In fact, this can be one of the main benefits of introducing deployment automation.
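A minimal sketch of this deduction, assuming each item of a complete package can be identified by name and compared by a SHA-256 digest of its content (the helper names and items are hypothetical). Because both inputs are full packages, the same new version can be compared against any deployed base version, including an empty, freshly created environment.

```python
import hashlib
from typing import Dict, List


def fingerprint(items: Dict[str, bytes]) -> Dict[str, str]:
    """Map each item of a complete package to the SHA-256 digest of its content."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in items.items()}


def plan_changes(deployed: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify items by comparing the currently deployed package with the new one."""
    common = set(new) & set(deployed)
    return {
        "add": sorted(set(new) - set(deployed)),
        "remove": sorted(set(deployed) - set(new)),
        "modify": sorted(name for name in common if new[name] != deployed[name]),
        "unchanged": sorted(name for name in common if new[name] == deployed[name]),
    }


# Two complete (hypothetical) package versions; nothing is hand-picked as a "delta".
v1 = fingerprint({"myapp.ear": b"ear v1", "static.zip": b"images", "queues.xml": b"<q/>"})
v2 = fingerprint({"myapp.ear": b"ear v2", "static.zip": b"images", "datasources.xml": b"<ds/>"})

print(plan_changes(v1, v2))
print(plan_changes({}, v2))  # a fresh environment simply receives everything
```

Only items classified as unchanged can safely be skipped; whether a modified item actually requires a server restart is a separate decision that the deployment tool would make per item type.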
Where multiple teams contribute to a package in this way, there is usually a dedicated Release Management organisation responsible for coordinating the various deliverables, and it is important that the deployment package is structured so as to allow the appropriate people to update, modify and approve its contents.
Where constructing a deployment package is a multi-step workflow that may involve multiple parties and take a significant amount of time, there will also be "incomplete" packages still in preparation. In such cases, it is important that the release management systems do not allow such "draft" packages to be exported.
As an additional safeguard, it is recommended that approved, released packages are digitally signed by the appropriate parties, and that all packages are verified before deployment.
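The following sketch illustrates both safeguards under stated assumptions: a hypothetical two-state lifecycle (`DRAFT`, `APPROVED`) in which only approved packages can be sealed, and verification immediately before deployment. It uses an HMAC from Python's standard library as a stand-in: an HMAC relies on a shared secret, so unlike a real public-key digital signature it does not prove which party approved the package; it only shows the seal-then-verify flow.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Package:
    name: str
    content: bytes
    status: str = "DRAFT"  # hypothetical lifecycle: DRAFT, then APPROVED


def seal(package: Package, key: bytes) -> str:
    """Seal an approved package; drafts are refused so they can never be released."""
    if package.status != "APPROVED":
        raise ValueError(f"{package.name} is {package.status} and cannot be released")
    return hmac.new(key, package.content, hashlib.sha256).hexdigest()


def verify(package: Package, seal_value: str, key: bytes) -> bool:
    """Recompute the seal just before deployment and compare it in constant time."""
    expected = hmac.new(key, package.content, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, seal_value)


key = b"release-management-secret"  # hypothetical shared secret
package = Package("myapp-1.2.zip", b"...package bytes...")
package.status = "APPROVED"
signature = seal(package, key)
tampered = Package(package.name, package.content + b"!", "APPROVED")
print(verify(package, signature, key), verify(tampered, signature, key))  # True False
```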
Down to Bits and Bytes
Choosing a format for your deployment packages is mainly a matter of convenience. Of course, packages should be:
- easy and convenient to move around - preferably a single file
- easy to inspect
- compressible, since text files such as SQL or properties are highly redundant
- protected against corruption by some form of error detection or correction
- portable across platforms (e.g. developers working on Windows vs. UNIX machines used in production)
- digitally sealable and/or signable, in order to be able to verify with certainty that the package being deployed is the approved one
So generally, an archive file format such as ZIP, TGZ or RPM is chosen, but which one you pick is entirely up to you.
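For example, with ZIP several of the properties listed above come almost for free. This sketch uses Python's standard `zipfile` module to build a single compressed package in memory and to inspect it; the item names are hypothetical. Note that ZIP's per-entry CRC-32 checksums detect corruption but cannot repair it, and that ZIP on its own offers no signing.

```python
import io
import zipfile
from typing import Dict, List


def build_package(items: Dict[str, bytes]) -> bytes:
    """Write all package items into a single DEFLATE-compressed ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in items.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def inspect_package(data: bytes) -> List[str]:
    """List the package contents after checking every entry's CRC-32."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        corrupt = archive.testzip()  # name of the first bad entry, or None
        if corrupt is not None:
            raise ValueError(f"corrupt entry in package: {corrupt}")
        return sorted(archive.namelist())


properties = b"connection.timeout=30\n" * 200  # redundant text compresses well
package = build_package({"myapp.ear": b"binary placeholder", "config/app.properties": properties})
print(inspect_package(package))
print(len(properties), "bytes of properties ->", len(package), "bytes of package")
```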
There isn't much to say about the formats of the build artefacts since there isn't any choice - a WAR is a WAR is a WAR. In the case of configurations, however, things are different. Currently, if queues and data source definitions are described in the package at all (it's not uncommon to come across them only in an email to one of the Ops team), they are generally described only in human-readable documents such as release notes or Word templates that are ambiguous and not easily checked for correctness. This may be convenient for purely manual deployment processes, but it means that configurations cannot be automatically detected, validated and, with deployment automation, executed in an easy and reliable way.
For these reasons, configurations should also be defined in a machine-readable format such as XML.
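To illustrate the benefit, the sketch below defines a JMS queue and a datasource in XML and validates them automatically. The element and attribute names are purely hypothetical, since no standard vocabulary exists; the point is that a missing attribute or an unknown resource type is caught by a program rather than by an operator reading release notes.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, vendor-neutral vocabulary; no standard vocabulary exists yet.
CONFIG_XML = """
<configuration>
  <jms-queue name="OrderQueue" jndi-name="jms/OrderQueue"/>
  <datasource name="OrdersDS" jndi-name="jdbc/OrdersDS"
              driver="org.h2.Driver" url="jdbc:h2:mem:orders"/>
</configuration>
"""

REQUIRED_ATTRIBUTES = {
    "jms-queue": {"name", "jndi-name"},
    "datasource": {"name", "jndi-name", "driver", "url"},
}


def validate(xml_text: str) -> List[Tuple[str, str]]:
    """Check every definition and return (type, name) pairs, raising on the first error."""
    definitions = []
    for element in ET.fromstring(xml_text.strip()):
        required = REQUIRED_ATTRIBUTES.get(element.tag)
        if required is None:
            raise ValueError(f"unknown configuration type: {element.tag}")
        missing = required - set(element.attrib)
        if missing:
            raise ValueError(f"{element.tag} is missing {sorted(missing)}")
        definitions.append((element.tag, element.get("name")))
    return definitions


print(validate(CONFIG_XML))
```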
Unfortunately, there aren't as yet any standards in this area to recommend. In fact, this is an initiative on which we really want to collaborate with users and vendors in this space.
It is preferable, though, to avoid formats that are derived from a specific middleware stack, e.g. sections of WebSphere's config XML. Whilst it might appear convenient to be able to prepare configuration definitions simply by extracting them from an existing file, it introduces a type of vendor lock-in that can make it harder to deploy applications to different server stacks. Often, this lock-in is entirely avoidable, because the applications aren't using any features specific to that middleware stack and could just as easily define "generic" JMS queues, datasources etc.
Finally, a manifest or bill of materials (BOM) should describe the contents of the package. This is the place to list package dependencies, a description for operators and any other information that is not easily linked to items in the package, such as the responsible developers' contact details, the project name etc. The manifest also needs to specify the content type of package items where this is not determinable from the file name or contents: an EAR file is obviously an EAR but a ZIP file may contain static content for an HTTP server, application configuration files etc.
Again, the format should be machine-readable and -verifiable and easy to produce automatically (e.g. from a build system).
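As a sketch of such automatic production, the function below could be called from a build to generate a manifest. It assumes a hypothetical set of content types, infers the type from the file extension where that is unambiguous and insists on an explicit declaration otherwise (for a generic ZIP, for instance). JSON is used purely for brevity; XML or any other machine-readable format would serve equally well.

```python
import json
import os
from typing import Dict, Optional

# Types that can be inferred from the file name; everything else must be declared.
KNOWN_EXTENSIONS = {".ear": "ear", ".war": "war", ".jar": "jar", ".sql": "sql-script"}


def build_manifest(application: str, version: str,
                   files: Dict[str, Optional[str]], dependencies: Dict[str, str]) -> str:
    """Produce a JSON manifest; `files` maps each path to an explicit type or None."""
    items = []
    for path, declared in sorted(files.items()):
        extension = os.path.splitext(path)[1].lower()
        content_type = declared or KNOWN_EXTENSIONS.get(extension)
        if content_type is None:
            raise ValueError(f"cannot infer the content type of {path}; declare it explicitly")
        items.append({"path": path, "type": content_type})
    manifest = {"application": application, "version": version,
                "dependencies": dependencies, "items": items}
    return json.dumps(manifest, indent=2, sort_keys=True)


print(build_manifest(
    "myapp", "1.2",
    {"myapp.ear": None, "db/upgrade.sql": None, "static/site.zip": "http-static-content"},
    {"jms": "1.1"},
))
```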
We now have a convenient, portable, automatable deployment package that is the most complete description possible of what should be deployed for this application version. So far, however, we have said nothing about how to deploy it. Where are the release notes? The deployment and rollback plans?
Ideally, there shouldn't be any. More precisely, your standard Java EE deployment process should, on the basis of the contents of the package, be sufficient to carry out a correct deployment. Application-specific deployment steps mean that something special needs to be done for this application that usually only the developers understand well. It is almost inevitable that, when an emergency deployment needs to be carried out at 2am and the developer can't be reached, the Operations staff will make mistakes. In short, developers should specify what needs to be deployed; they should neither need to worry about, nor be able to specify, how.
Of course, all this presupposes the existence of a standard Java EE deployment process within your organisation, or perhaps a small number if you have radically different families of applications. This process will need to be complex enough to cover the components and configurations used across your development landscape, but it is also important to limit the spectrum of allowed options in order to preserve a maintainable, testable and reliable process.
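The following sketch shows the idea of a single standard process driven only by package contents. The step names, item types and ordering are hypothetical assumptions; what matters is that the package supplies only typed items (the what), the steps (the how) are defined once for the whole organisation, and item types outside the allowed spectrum are rejected rather than handled ad hoc.

```python
from typing import Dict, List

# One organisation-wide process: fixed steps per supported item type (hypothetical).
STANDARD_STEPS: Dict[str, List[str]] = {
    "datasource": ["create datasource"],
    "jms-queue": ["create JMS queue"],
    "sql-script": ["run SQL script"],
    "ear": ["deploy EAR", "start application"],
}
# Resources are created before the code and scripts that depend on them.
ORDER = ["datasource", "jms-queue", "sql-script", "ear"]


def plan_deployment(items: List[Dict[str, str]]) -> List[str]:
    """Derive the deployment steps purely from the typed contents of the package."""
    unsupported = sorted({item["type"] for item in items} - set(STANDARD_STEPS))
    if unsupported:
        raise ValueError(f"not covered by the standard process: {unsupported}")
    steps = []
    for item in sorted(items, key=lambda item: ORDER.index(item["type"])):
        steps.extend(f"{step}: {item['path']}" for step in STANDARD_STEPS[item["type"]])
    return steps


package_items = [{"path": "myapp.ear", "type": "ear"}, {"path": "OrdersDS", "type": "datasource"}]
for step in plan_deployment(package_items):
    print(step)
```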
This will necessarily impose some constraints on development technologies and on how they are configured. However, given that maintenance cost is by far the largest overhead of running an application, the benefit of a less error-prone, more reliable deployment process outweighs that of maximum development flexibility. Of course, this is a trade-off that each organisation will need to consider in the context of its own requirements.
A standardized deployment process is not only easier to maintain. It is also easier for Operations staff to execute reliably under stress than a collection of release notes that differ from application to application and version to version. It is also simpler to automate, freeing up Operations for less repetitive tasks and, in combination with suitable access controls, enabling developers to carry out their own deployments. With suitable integration points, it is even possible to connect to build and release systems and realize continuous deployment for true end-to-end automation.
Moreover, it is easier to reliably extend automated deployment processes to support more complex scenarios, giving development teams more flexibility whilst avoiding ad-hoc processes. Since automated processes also become harder to implement and test as they grow more complex, this is a decision that needs to be taken with care. But the significant advantage over manual processes is that, once tested and verified, you can be sure that the steps of the process will be executed predictably and in the same fashion every time. This is hard for human beings to achieve consistently when carrying out complex, multi-step processes.
From Deployment Package to Running Application: Customizing Packages to Target Environments
So far, we have talked about putting together the one definitive package that fully describes your application version, in accordance with good release management practices. In doing so, we've tacitly assumed that this one package can be deployed to different environments "as is". In today's Development, Test, Acceptance and Production (DTAP) landscapes, however, it is almost always necessary to "tweak" the application and associated configuration and resources to match the target environment - think endpoints, properties files or datasource usernames and passwords.
As some enterprises move further down the road of virtualization and virtual appliances, this will change, but currently the deployment packaging and process should cater for simple, reliable and transparent customization of the application components and configuration at deployment time.
In the absence of any established standards or even guidelines in this area, many different solutions to this problem have been employed, from fairly elegant approaches such as JMX to crude string search-and-replace.
Furthermore, different types of middleware platforms have varying degrees of support for customizations: typically, portals, ESBs and process servers offer some "native" solution to the problem, whereas application servers tend to leave users to fend for themselves.
More often than not, the result is a chaotic mix of customization approaches across projects, target platforms and departments. As ever, this simply increases the maintenance overhead and potential for confusion and errors.
So what should a "standard" solution to customization offer? When comparing different approaches, there are a number of things to bear in mind:
- Convenience: The customization procedure should be easy to set up and quick to carry out. This is mainly a concern for developers that might have to execute a customization whenever the application is deployed to the development environment.
- Visibility: It should be easy for authorized users to view both the customization points (i.e. which parts of the application can be customized) and the values assigned to them in a given environment.
- Fail-safety: If an application is deployed with missing or invalid values for customization points (e.g. forgetting to set a timeout or using the test endpoint for the production environment) this should be detected quickly. Preferably, missing or incorrect values would be detected at deployment-time, rather than becoming apparent only due to spurious runtime behaviour.
- Revisioning and access control: Only appropriate users should be able to view and edit the values assigned to an application's customization points. Preferably, a history of changes to these values would be maintained. This aids the comparison of the values for an application across versions, as well as allowing users to compare the values for the same version of a deployed application across target environments.
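The revisioning and access control requirements can be sketched as a small data structure. This is an illustrative assumption, not a description of any specific product: values are stored per environment, every change is recorded in order, and only users granted access to an environment can read or modify its values. A real store would also need persistence, authentication and encryption of sensitive values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ValueStore:
    """Customization values per environment, with a change history and an access check."""
    access: Dict[str, Set[str]]  # environment -> users allowed to view and edit its values
    values: Dict[Tuple[str, str], str] = field(default_factory=dict)
    history: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def _check(self, user: str, env: str) -> None:
        if user not in self.access.get(env, set()):
            raise PermissionError(f"{user} may not access values for {env}")

    def assign(self, user: str, env: str, key: str, value: str) -> None:
        self._check(user, env)
        self.values[(env, key)] = value
        self.history.append((user, env, key, value))  # who changed what, in order

    def lookup(self, user: str, env: str, key: str) -> str:
        self._check(user, env)
        return self.values[(env, key)]

    def changes(self, user: str, env: str, key: str) -> List[str]:
        self._check(user, env)
        return [v for _, e, k, v in self.history if (e, k) == (env, key)]


store = ValueStore(access={"test": {"dev1", "ops1"}, "prod": {"ops1"}})
store.assign("dev1", "test", "db.user", "app_test")
store.assign("ops1", "prod", "db.user", "app")
store.assign("ops1", "prod", "db.user", "app_prod")
print(store.lookup("ops1", "prod", "db.user"), store.changes("ops1", "prod", "db.user"))
```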
Irrespective of the precise technology involved, customization approaches break down into two main families: Pointer-based and token-based customization.
Pointer-based replacement is just a fancy way of referring to "search-and-replace" and its slightly more advanced cousin, XPath-based replacement, commonly found in portal or ESB environments. This is fragile: because the deployment package already contains valid values (typically those of the development environment), a replacement that misses its target, for instance after the file structure has changed, goes unnoticed and the application silently runs with the wrong value. Further, the customization points of the package are essentially invisible. Ironically, because it requires no preparation of the package, this approach can be useful to "patch" packages that were not written in a customizable way.
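The silent-failure risk is easy to demonstrate. This sketch performs XPath-style replacement with Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath; the configuration file and endpoint URLs are hypothetical. A pointer that no longer matches anything simply changes nothing, and the development value survives unnoticed.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def replace_by_path(xml_text: str, path: str, attribute: str, value: str) -> Tuple[str, int]:
    """Set `attribute` on every element matching `path`; return the new XML and match count."""
    root = ET.fromstring(xml_text)
    matches = root.findall(path)  # ElementTree supports only a subset of XPath
    for element in matches:
        element.set(attribute, value)
    return ET.tostring(root, encoding="unicode"), len(matches)


CONFIG = '<config><endpoint name="orders" url="http://dev-host/orders"/></config>'
PROD_URL = "https://prod-host/orders"

patched, count = replace_by_path(CONFIG, "./endpoint[@name='orders']", "url", PROD_URL)
# A slightly wrong pointer (e.g. after a rename) matches nothing and raises no error:
unpatched, missed = replace_by_path(CONFIG, "./endpoint[@name='order']", "url", PROD_URL)
print(count, missed)  # 1 0
print(unpatched)      # still points at the development host
```

A tool could of course treat a match count of zero as an error, but nothing in the pointer-based approach itself forces it to.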
It is generally true that it is much easier to prepare a deployment package for pointer-based customization: it is simply a matter of exporting, or making a snapshot of, the settings and artifacts of the development or other "authoring" environment. Of course, such an export can be "tokenized" (by replacing the values that need to be customizable with tokens), but this is error-prone and can involve prohibitive manual overhead, which is especially a problem during development, when such exports are made frequently.
Obviously, pointer-based customization is only possible if the artifacts or definitions that need to be customized are structured in some way; otherwise, it is not possible to construct a pointer to refer to the item to be modified. XML and resource definitions (e.g. properties files) fall into this category, but plain text files, for example, do not.
Token-based replacement, on the other hand, basically means placeholders – the deployment package contains (in artifacts, resource definitions etc.) special symbols, and at deployment time these symbols are replaced by values supplied for them. Token-based customization requires the provider, usually developers, to prepare the deployment artifacts specifically.
This means that, firstly, all the tokens that need to be replaced are known at the time of delivery, so the application has a well-defined set of customization points. Secondly, the special syntax of tokens makes it possible to verify that values for all the tokens have been supplied.
Even if this verification is skipped or misses a token, the application will usually break with a syntax error, since tokens are usually not valid values, which provides an extra "fail-safe" mechanism.
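A minimal sketch of token replacement with the verification described above, assuming a hypothetical `{{name}}` token syntax; real tools use a variety of syntaxes. Any token without a supplied value stops the deployment before anything is written.

```python
import re
from typing import Dict

# Hypothetical token syntax: {{name}}, which is not a valid value in most files.
TOKEN = re.compile(r"\{\{([A-Za-z0-9_.-]+)\}\}")


def replace_tokens(text: str, values: Dict[str, str]) -> str:
    """Replace every token, refusing to proceed if any token lacks a value."""
    missing = sorted(set(TOKEN.findall(text)) - set(values))
    if missing:
        raise ValueError(f"no values supplied for tokens: {missing}")
    # A function as replacement avoids re's backslash handling in the values.
    return TOKEN.sub(lambda match: values[match.group(1)], text)


template = "db.url={{db.url}}\ndb.user={{db.user}}\n"
print(replace_tokens(template, {"db.url": "jdbc:h2:mem:test", "db.user": "app"}))
```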
Of course, from a developer's perspective this fail-safe mechanism can be a nuisance, too. Since the application doesn't work until the tokens have been replaced, build processes have to be set up to carry out the replacement, and to do so quickly, especially in development environments.
Whilst pointer-based replacement is convenient and can be used even with applications that were not designed to be customized, token-based replacement offers significant advantages in terms of fail-safety and visibility. In addition, token-based replacement makes it easy to separate knowledge of where an application needs to be customized (which developers then should know and specify) from the customization values for a specific environment (think production database passwords) that only Operations staff should have access to.
Since deployments are often mission-critical, these are substantial benefits and should lead to tokens being preferred wherever possible.
In fact, an application's customization points should be listed in the package's manifest. Given the special syntax of the tokens, it is usually possible (and advisable) to derive this list by inspecting the package contents. At deployment time, the list can also be verified in a similar manner, providing an additional safeguard.
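Deriving the list could look like the following sketch, which assumes a hypothetical `{{name}}` token syntax and treats the package as a map of text items (binary items such as EARs would first need to be unpacked). Comparing the result with the list declared in the manifest flags both undeclared tokens and stale manifest entries.

```python
import re
from typing import Dict, Set

# Hypothetical token syntax: {{name}}.
TOKEN = re.compile(r"\{\{([A-Za-z0-9_.-]+)\}\}")


def customization_points(files: Dict[str, str]) -> Set[str]:
    """Collect every token occurring in the package's (text) items."""
    return {name for text in files.values() for name in TOKEN.findall(text)}


def compare_with_manifest(declared: Set[str], files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Report tokens missing from the manifest and manifest entries no file uses."""
    found = customization_points(files)
    return {"undeclared": found - declared, "unused": declared - found}


files = {
    "config/app.properties": "timeout={{timeout}}\n",
    "config/datasources.xml": '<datasource user="{{db.user}}" password="{{db.password}}"/>',
}
print(sorted(customization_points(files)))  # ['db.password', 'db.user', 'timeout']
print(compare_with_manifest({"timeout", "db.user", "db.url"}, files))
```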
No DTAP Environment Specific Packages Please
Anyone who has attempted, mid-deployment, to follow vague instructions on what to replace with what in which configuration files in order to get the application to run has probably wondered why this cannot be done in a less error-prone manner. If a suitable deployment automation solution is not in place, a common alternative is to try to carry out customization as part of the build and release process.
From a technical perspective, this can certainly look like an easy option: there are many good continuous build and release products out there, open-source and commercial, and most organizations already have one in place. Many of the tools support the notion of "profiles" or similar mechanisms for varying build details, and if not they all offer hooks to add in your own search-and-replace functionality.
However, there are critical disadvantages that should lead to this approach being regarded as unsuitable for proper customization:
- Environment-specific details need to be accessible during the build process. Not only should this feel "wrong"; some of this information can be highly sensitive (think passwords for the production database), making this solution infeasible from a security perspective.
- Continuous build tools generally do not provide suitably versioned, secured repositories for the environment-specific values, and env.properties files are notoriously prone to copy-paste and other errors.
- With environment-specific builds, it is almost inevitable that at some point you will have the test build of your application running, by accident, on the production environment.
A smooth and reliable deployment process starts with the first step: putting together and delivering a structured, complete deployment package that defines all the components, configurations and dependencies of the application version in an automatically inspectable, verifiable manner. This dramatically reduces errors due to invalid or missing settings, components or required services.
Machine-readable packages are also the first step to introducing deployment automation that not only inspects and verifies, but also deploys these packages according to your defined standard process, applying environment-specific customizations along the way. Apart from ensuring consistency and improving reliability, this dramatically improves throughput and can remove one of the most common bottlenecks in the development and maintenance lifecycles.
About the Author
An early believer in the ability of Java to deliver "enterprise-grade" software, Andrew Phillips quickly focused on the development of high-throughput, resilient and scalable J2EE applications. Specializing in concurrency and high performance development, Andrew gained substantial experience of the intricacies, complexity and challenges of enterprise application environments while working for a succession of multinationals. Continuously focused on effectively integrating promising new developments in the Java space into corporate software development, Andrew joined XebiaLabs in March 2009, where he is a member of the development team of their deployment automation product Deployit. Amongst others, he also contributes to Multiverse, an open-source Java STM implementation, and jclouds, a leading Java cloud library.