InfoQ Homepage Articles Why and How Database Changes Should Be Included in the Deployment Pipeline

DevOps

Why and How Database Changes Should Be Included in the Deployment Pipeline

Jan 30, 2018 10 min read

Follow us on

Youtube232K Followers

Linkedin26K Followers

Key Takeaways

Databases, although different from applications, can and should be included in the same development process as applications. We call this database shift left.
When all database changes are described with scripts, source control can be applied and the database development process can include continuous integration and continuous delivery activities, namely taking part in the deployment pipeline.
Automation is the special ingredient that powers up the deployment pipeline making it repeatable, reliable and fast, reducing fear for database changes.
Migrations-based and state-based are two different approaches to describing a database change. Independently of the choice, small batches are always a good choice.
The deployment pipeline is a technical and cultural tool where DevOps values and practices should be reflected according to the needs of each organization.

Why and how database changes should be included in the deployment pipeline

I’ve spoken several times on how databases and applications should coexist in the same deployment pipeline. In other words, they should share a single development lifecycle. Database development should not be handled differently from application development.

A database change should be subject to the same activities (source control, continuous integration and continuous delivery) as application changes:

Disclaimer: both Continuous Integration (CI) and Continuous Delivery (CD) are sets of principles and practices that go beyond build, test and deploy. The use of these terms throughout this article is an assumed oversimplification with the purpose of seeing CI and CD as goals to be achieved making use of the deployment pipeline (but not restricted to that alone).

What's so special about databases?

The answer is simple, right? Data! Data must be persisted before, during and after the deployment. Rolling back a database is more risky than rolling back an application. If your application is down or buggy you will end up annoying some of your customers and possibly lose money. But if you lose data, you will also lose your reputation and customer's trust.

The fear of database changes is the main reason that today database and application. Changes are handled differently in most organizations. Leading to two common scenarios:

Database changes are not included in the deployment pipeline
Database changes have a different deployment process

By not including the database in the pipeline most of the work related to database changes ends up being manual, with the associated costs and risks. On top of that this:

Results in a lack of traceability of database changes (changes history);
Prevents applying Continuous Integration (CI) and Continuous Delivery (CD) good practices to a full extent;
Promotes fear of changes.

Sometimes database and application have independent pipelines which means that the development team and the DBA team need to keep synchronizing. This requires a lot of communication which leads to misunderstandings and mistakes.

The end result is that databases become a bottleneck in an agile delivery process. To fix this problem database changes need to become trivial, normal and comprehensible. This is done by storing changes in source control, automating deployment steps, and sharing knowledge and responsibility between the development teams and the DBA team. DBAs should not be the last stronghold of the database but the first person to be consulted when architecting a database change. Developers should understand that processing data is not the same as storing data and get used to reason about both to justify the changes they make to the database.

The value of automation

Automation enables control over database development by making the deployment process repeatable, reliable and consistent. Another perspective is that automation enables failing (because failures will inevitably happen) in a controlled and known way. So, instead of recurrently having to (re)learn manual work, we better improve and rely on automation.

Additionally, automation provides the following bonus:

Removes/reduces human intervention;
Increases speed of response to change.

Where to start with automation? Any place is good to start, since this decision depends highly on your chosen approach. However, it’s a good idea to start from left to right of the deployment pipeline and take baby steps:

The first step is to describe the database change using a script. Normally this step is done by a human.
The next step is to build and validate an artifact (a Zip or NuGet package for example) containing the database changes. This step should be fully automated. In others words, we need to use scripts for building the package, running unit/integration tests, and sending the package to an artifactory management system and/or to a deployment server/service. Normally, build servers already include these steps by default.
The last step would be to automate the deployment of the database changes in the target server/database. As in the previous step, you can use the deployment server built-in options or build your own customized scripts.

Source control

Every database change can be described with an SQL script, meaning that we can store them in source control, for example in a git repository.

CREATE TABLE [dbo].[Member](
	[MemberId] [INT] IDENTITY(1,1) NOT NULL,
	[MemberFirstName] [NVARCHAR](50) NULL,
	[MemberLastName] [NVARCHAR](50) NULL,
 CONSTRAINT [PK_Member] PRIMARY KEY CLUSTERED 
(
	[MemberId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Source control is the first stage of the deployment pipeline and provides the following capabilities:

Traceability through change history
- Allows answering questions such as: what has changed? When? By who? Why?
SQL as documentation
- SQL scripts are the best way to document the changes made in the database.
Shared code-base and shared process
- By using a centralized or distributed mechanism for sharing the database code (SQL scripts) across the organization, the process for developing and deploying database changes is also being shared. This supports sharing and learning.
Enforceable standards to reduce conflicts
- Since the organization is using a shared process to deploy database changes, it becomes easier to apply standards and conventions that reduce conflicts and promote best practices.

Note: this article focus on SQL databases but NoSQL databases still face the same change management challenges (even if the approaches differ as schema validation is not possible on NoSQL). The common approach for NoSQL databases is to make the application (and development team) responsible for managing change, possibly at runtime rather than at deploy time.

After version controlling database changes, another question arises: should we use a state based or a migrations based approach to describe database changes?

State based approach

The script represents how the database should be (from scratch), not the commands needed to migrate from current database state. This is known as a declarative or desired state approach.

CREATE TABLE [dbo].[Member](
	[MemberId] [INT] IDENTITY(1,1) NOT NULL,
	[MemberFirstName] [NVARCHAR](50) NULL,
	[MemberLastName] [NVARCHAR](50) NULL,
	[MemberEmail] [NVARCHAR](50) NULL,
 CONSTRAINT [PK_Member] PRIMARY KEY CLUSTERED 
(
	[MemberId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Migrations based approach

The script represents how the database should change (from its current state). This is known as an imperative approach.

ALTER TABLE [dbo].[Member]
ADD[MemberEmail] [NVARCHAR](50) NULL

Both state and migrations approaches have advantages and disadvantages, and have different implications on the CI and CD stages of the pipeline.

In the state approach the change script is auto generated by a state comparator (looks at two different states and generates the delta script). In the migrations approach the change script is created by a human.

The state approach is usually faster in terms of change script creation/generation and the scripts are automatically created following best practices. However, the delta script generation is done at delivery time, as it needs to compare the state of the target database with the desired state script coming from source control. This means there should be a manual step for reviewing the generated change script.

In the migrations approach the quality of the change script fully depends on the SQL skills of the script creator. However, this approach is more powerful because the script creator knows the context of the change, and since the change is described in an imperative way it can be peer reviewed early in the process.

State and migration approaches are not necessarily mutually exclusive. In fact, both approaches are useful and necessary at differents moments in time. However, combining both approaches in the same deployment pipeline can become very hard to manage.

Scripting guidelines

Independently of the approach, the golden rule is to use small batches, i.e. keep changes as small as possible.

This list is from a real world project using a migrations approach. Migrations were managed by Flyway and we sticked to a simple rule: one script, one operation type, one object. This helped enforce small migration scripts which significantly reduced the number and complexity of merge conflicts.

Continuous Integration and Continuous Delivery

After achieving source control for database changes, the second and third stages of the deployment pipeline are unlocked. We can now apply CI and CD practices as well.

Continuous Integration

This is when database changes are integrated and validated through testing:

What should be tested? (unit, integration)
When should they be tested? (before, during and after deployment)

Once again, by using small batches the risk associated with the changes is greatly reduced.

This stage is highly dependent on the way business logic is managed between application and database.

Continuous Delivery

This stage focuses on delivering database changes into the target environment.

Metrics to take into account:

Deployment downtime
- Time that the system needs to be offline (for example to make a backup)
Time to recover
- Time to recover from a failed deployment (for example time to restore a backup)
Affected applications
- When sharing a database between different applications we are increasing the blast radiu on the system when the database becomes unavailable.

Once again, the risks associated to database delivery can be reduced with small batches.

Deployment pipeline scenarios

Scenario 1 - Fully independent pipelines

This is a common scenario for organizations starting with the process of database automation/integration with application development. The first instinct is to build a separate pipeline dedicated to database, in order to avoid impacting the existing application pipeline.

Database and application have completely independent deployment pipelines: different code repositories, different CI process, different CD process. There is a high need for synchronization between the development team and the DBA team.

In this scenario no knowledge is shared between the different teams involved in the process, instead we see silos being promoted.

It’s possible, however, that a database (or a part of it) lives independently from the application. In other words, the database is not connected to any application and users can establish a direct connection to the database for example. Although this use case is not common, having an independent pipeline for the database would be appropriate then.

Scenario 2 - Delivering application and database together

Taking into account that the main goal is to keep database and application changes in sync, this scenario is the logical next step.

Database and application code live in different repositories and have independent CI processes (deployment packages get created separately), but share the same CD process. Delivery is the contact point between database and application.

Despite the synchronization improvement when comparing to scenario 1, this does not yet ensure synchronization between the correct versions of the database and the application. It only guarantees the latest versions of both get delivered together.

Scenario 3 - Building and delivering application and database together

This is an “exotic” scenario that normally represents a short learning step towards scenario 4.

Database and application code still live in different repositories but the CI process is shared to some extent (for example, the application CI process only starts after the database CI process finishes successfully). The success of each part is related/connected.

Scenario 4 - Fully integrated pipeline

Database and application code live in the same repository, and share the same CI and CD processes.

One might see it as this scenario not requiring synchronization, but in fact what takes place is a continuous synchronization process that starts in source control and continues with code reviews of the database scripts.

Once database and application share the same deployment pipeline every stage/step of the pipeline can be seen as a learning opportunity for all organization members.

It’s worthwhile to mention that achieving synchronization between database and application does not prevent database maintenance and/or optimization operations occurring. The deployment pipeline should support deployments where either only the database or only the application needs to change.

Deployment pipeline & DevOps

The deployment pipeline is a cultural and technical tool for managing changes in the software development process and should include databases, applications and infrastructure.

Following the same deployment process for databases as well as applications aligns well with the Gene Kim’s “three ways”, the principles underpinning DevOps, because it:

Increases flow visibility
- More visibility over the system, leading to a better understanding of how to safely and quickly change the system.
Increases feedback
- More learning opportunities at database and application levels together.
Increases knowledge
- Sharing knowledge through the pipeline allows the organization to build a learning system, avoiding extensive documentation (often a source of waste) and reducing the truck factor (relying in a system instead of relying on heroes).

The deployment pipeline is your software factory, the better your factory, the better your produced software.

About the Author

Eduardo Piairo is a deployment pipeline craftsman always ready to learn about source control, continuous integration and continuous delivery for databases , applications and infrastructure. The deployment pipeline is his favorite technical and cultural tool. Works at Basecone as Operations Engineer with automation, collaboration and communication as priorities.

InfoQ Software Architects' Newsletter