Key Takeaways
- Databases, although different from applications, can and should be included in the same development process as applications. We call this database shift left.
- When every database change is described with a script, source control can be applied and the database development process can include continuous integration and continuous delivery activities, i.e. take part in the deployment pipeline.
- Automation is the special ingredient that powers up the deployment pipeline, making it repeatable, reliable and fast, and reducing the fear of database changes.
- Migrations-based and state-based are two different approaches to describing a database change. Whichever you choose, small batches are always a good idea.
- The deployment pipeline is a technical and cultural tool where DevOps values and practices should be reflected according to the needs of each organization.
Why and how database changes should be included in the deployment pipeline
I’ve spoken several times about how databases and applications should coexist in the same deployment pipeline; in other words, they should share a single development lifecycle. Database development should not be handled differently from application development.
A database change should be subject to the same activities (source control, continuous integration and continuous delivery) as an application change.
Disclaimer: both Continuous Integration (CI) and Continuous Delivery (CD) are sets of principles and practices that go beyond building, testing and deploying. The use of these terms throughout this article is a deliberate simplification: CI and CD are treated as goals to be achieved with the help of the deployment pipeline (but not restricted to it).
What's so special about databases?
The answer is simple, right? Data! Data must be persisted before, during and after the deployment. Rolling back a database is riskier than rolling back an application. If your application is down or buggy you will end up annoying some of your customers and possibly losing money. But if you lose data, you will also lose your reputation and your customers' trust.
The fear of database changes is the main reason why database and application changes are handled differently in most organizations today, leading to two common scenarios:
- Database changes are not included in the deployment pipeline
- Database changes have a different deployment process
By not including the database in the pipeline, most of the work related to database changes ends up being manual, with the associated costs and risks. On top of that, this:
- Results in a lack of traceability of database changes (change history);
- Prevents applying Continuous Integration (CI) and Continuous Delivery (CD) good practices to a full extent;
- Promotes fear of changes.
Sometimes database and application have independent pipelines, which means the development team and the DBA team need to keep synchronizing. This requires a lot of communication, which leads to misunderstandings and mistakes.
The end result is that databases become a bottleneck in an agile delivery process. To fix this problem, database changes need to become trivial, normal and comprehensible. This is done by storing changes in source control, automating deployment steps, and sharing knowledge and responsibility between the development teams and the DBA team. DBAs should not be the last stronghold of the database but the first people consulted when architecting a database change. Developers should understand that processing data is not the same as storing data and get used to reasoning about both to justify the changes they make to the database.
The value of automation
Automation enables control over database development by making the deployment process repeatable, reliable and consistent. Another perspective is that automation enables failing (because failures will inevitably happen) in a controlled and known way. So, instead of repeatedly having to (re)learn manual work, we are better off improving and relying on automation.
Additionally, automation provides the following bonuses:
- Removes/reduces human intervention;
- Increases speed of response to change.
Where to start with automation? Any place is a good place to start, since this decision depends highly on your chosen approach. However, it's a good idea to work from left to right along the deployment pipeline and take baby steps:
- The first step is to describe the database change using a script. Normally this step is done by a human.
- The next step is to build and validate an artifact (a Zip or NuGet package, for example) containing the database changes. This step should be fully automated. In other words, we need to use scripts for building the package, running unit/integration tests, and sending the package to an artifact management system and/or to a deployment server/service. Build servers normally include these steps by default.
- The last step is to automate the deployment of the database changes to the target server/database. As in the previous step, you can use the deployment server's built-in options or write your own customized scripts.
Source control
Every database change can be described with an SQL script, which means we can store these scripts in source control, for example in a git repository:
CREATE TABLE [dbo].[Member](
    [MemberId] [INT] IDENTITY(1,1) NOT NULL,
    [MemberFirstName] [NVARCHAR](50) NULL,
    [MemberLastName] [NVARCHAR](50) NULL,
    CONSTRAINT [PK_Member] PRIMARY KEY CLUSTERED
    (
        [MemberId] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Source control is the first stage of the deployment pipeline and provides the following capabilities:
- Traceability through change history: allows answering questions such as: what changed? When? By whom? Why?
- SQL as documentation: SQL scripts are the best way to document the changes made to the database.
- Shared code-base and shared process: by using a centralized or distributed mechanism for sharing the database code (SQL scripts) across the organization, the process for developing and deploying database changes is also shared. This supports sharing and learning.
- Enforceable standards to reduce conflicts: since the organization uses a shared process to deploy database changes, it becomes easier to apply standards and conventions that reduce conflicts and promote good practices.
Note: this article focuses on SQL databases, but NoSQL databases face the same change management challenges (even if the approaches differ, since schema validation is not possible in NoSQL). The common approach for NoSQL databases is to make the application (and the development team) responsible for managing change, possibly at runtime rather than at deploy time.
After version controlling database changes, another question arises: should we use a state-based or a migrations-based approach to describe database changes?
State-based approach
The script represents how the database should be (from scratch), not the commands needed to migrate it from its current state. This is known as a declarative or desired-state approach.
CREATE TABLE [dbo].[Member](
    [MemberId] [INT] IDENTITY(1,1) NOT NULL,
    [MemberFirstName] [NVARCHAR](50) NULL,
    [MemberLastName] [NVARCHAR](50) NULL,
    [MemberEmail] [NVARCHAR](50) NULL,
    CONSTRAINT [PK_Member] PRIMARY KEY CLUSTERED
    (
        [MemberId] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Migrations-based approach
The script represents how the database should change (from its current state). This is known as an imperative approach.
ALTER TABLE [dbo].[Member]
    ADD [MemberEmail] [NVARCHAR](50) NULL
Both state and migrations approaches have advantages and disadvantages, and have different implications on the CI and CD stages of the pipeline.
In the state approach the change script is auto-generated by a state comparator (which looks at two different states and generates the delta script). In the migrations approach the change script is created by a human.
The state approach is usually faster in terms of change script creation/generation and the scripts are automatically created following best practices. However, the delta script generation is done at delivery time, as it needs to compare the state of the target database with the desired state script coming from source control. This means there should be a manual step for reviewing the generated change script.
In the migrations approach the quality of the change script fully depends on the SQL skills of the script creator. However, this approach is more powerful because the script creator knows the context of the change, and since the change is described in an imperative way it can be peer reviewed early in the process.
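For illustration, here is the kind of defensive, context-aware migration a skilled script creator might write and a reviewer would expect to see. This is only a sketch (assuming SQL Server and reusing the table and column from the earlier examples) that guards the change so the script can be re-run safely:
IF NOT EXISTS (
    SELECT 1 FROM sys.columns
    WHERE object_id = OBJECT_ID(N'dbo.Member') AND name = N'MemberEmail'
)
BEGIN
    -- The column is missing: add it. Re-running the script is harmless.
    ALTER TABLE [dbo].[Member]
        ADD [MemberEmail] [NVARCHAR](50) NULL;
END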
State and migrations approaches are not necessarily mutually exclusive. In fact, both approaches are useful and necessary at different moments in time. However, combining both approaches in the same deployment pipeline can become very hard to manage.
Scripting guidelines
Independently of the approach, the golden rule is to use small batches, i.e. keep changes as small as possible.
These guidelines come from a real-world project using a migrations approach. Migrations were managed by Flyway and we stuck to a simple rule: one script, one operation type, one object. This helped enforce small migration scripts, which significantly reduced the number and complexity of merge conflicts.
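As a sketch of what this rule can look like in practice, Flyway's default versioned naming convention is V<version>__<description>.sql; the version numbers and file names below are illustrative, with each file holding exactly one operation type on one object:
-- V001__create_table_member.sql
CREATE TABLE [dbo].[Member](
    [MemberId] [INT] IDENTITY(1,1) NOT NULL,
    [MemberFirstName] [NVARCHAR](50) NULL,
    [MemberLastName] [NVARCHAR](50) NULL,
    CONSTRAINT [PK_Member] PRIMARY KEY CLUSTERED ([MemberId] ASC)
);

-- V002__alter_table_member_add_email.sql
ALTER TABLE [dbo].[Member]
    ADD [MemberEmail] [NVARCHAR](50) NULL;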
Continuous Integration and Continuous Delivery
After achieving source control for database changes, the second and third stages of the deployment pipeline are unlocked. We can now apply CI and CD practices as well.
Continuous Integration
This is when database changes are integrated and validated through testing:
- What should be tested? (unit and integration tests)
- When should tests run? (before, during and after deployment)
Once again, by using small batches the risk associated with the changes is greatly reduced.
This stage is highly dependent on the way business logic is managed between application and database.
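As a sketch of what a database unit test can look like, here is a minimal test written with tSQLt (a framework the author also mentions in the discussion below); the test class, test name and asserted values are illustrative:
-- Create a test class (a schema that groups related tests)
EXEC tSQLt.NewTestClass 'MemberTests';
GO
CREATE PROCEDURE MemberTests.[test that a new member can be stored with first and last name]
AS
BEGIN
    -- Isolate the table under test: FakeTable swaps dbo.Member for an empty, constraint-free copy
    EXEC tSQLt.FakeTable 'dbo.Member';

    -- Act: insert a row the way the application would
    INSERT INTO dbo.Member (MemberFirstName, MemberLastName)
    VALUES (N'Ada', N'Lovelace');

    -- Assert: exactly one row was stored
    DECLARE @actual INT = (SELECT COUNT(*) FROM dbo.Member);
    EXEC tSQLt.AssertEquals @Expected = 1, @Actual = @actual;
END;
GO
-- Run every test in the class (typically triggered by the build server)
EXEC tSQLt.Run 'MemberTests';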
Continuous Delivery
This stage focuses on delivering database changes into the target environment.
Metrics to take into account:
- Deployment downtime: the time the system needs to be offline (for example, to take a backup).
- Time to recover: the time needed to recover from a failed deployment (for example, the time to restore a backup).
- Affected applications: when a database is shared between different applications, the blast radius increases if the database becomes unavailable.
Once again, the risks associated with database delivery can be reduced with small batches.
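For example, "time to recover" is often dominated by how long it takes to restore a backup taken just before the deployment. A minimal T-SQL sketch (the database name and backup path are illustrative):
-- Taken immediately before the deployment starts
BACKUP DATABASE [Members]
    TO DISK = N'D:\Backups\Members_pre_deploy.bak'
    WITH INIT, CHECKSUM;

-- If the deployment fails, time to recover is roughly the duration of this restore
RESTORE DATABASE [Members]
    FROM DISK = N'D:\Backups\Members_pre_deploy.bak'
    WITH REPLACE;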
Deployment pipeline scenarios
Scenario 1 - Fully independent pipelines
This is a common scenario for organizations starting with the process of database automation/integration with application development. The first instinct is to build a separate pipeline dedicated to the database, in order to avoid impacting the existing application pipeline.
Database and application have completely independent deployment pipelines: different code repositories, different CI processes, different CD processes. There is a high need for synchronization between the development team and the DBA team.
In this scenario no knowledge is shared between the different teams involved in the process; instead, silos are reinforced.
It’s possible, however, that a database (or a part of it) lives independently from any application; for example, the database is not connected to any application and users establish a direct connection to it. Although this use case is not common, having an independent pipeline for the database would then be appropriate.
Scenario 2 - Delivering application and database together
Taking into account that the main goal is to keep database and application changes in sync, this scenario is the logical next step.
Database and application code live in different repositories and have independent CI processes (deployment packages get created separately), but share the same CD process. Delivery is the contact point between database and application.
Despite the synchronization improvement compared to scenario 1, this does not yet ensure synchronization between the correct versions of the database and the application. It only guarantees that the latest versions of both get delivered together.
Scenario 3 - Building and delivering application and database together
This is an “exotic” scenario that normally represents a short learning step towards scenario 4.
Database and application code still live in different repositories, but the CI process is shared to some extent (for example, the application CI process only starts after the database CI process finishes successfully). The success of each part is tied to the other.
Scenario 4 - Fully integrated pipeline
Database and application code live in the same repository, and share the same CI and CD processes.
One might think this scenario requires no synchronization, but in fact what takes place is a continuous synchronization process that starts in source control and continues with code reviews of the database scripts.
Once database and application share the same deployment pipeline, every stage/step of the pipeline can be seen as a learning opportunity for all organization members.
It’s worth mentioning that achieving synchronization between database and application does not prevent database maintenance and/or optimization operations from occurring. The deployment pipeline should support deployments where only the database, or only the application, needs to change.
Deployment pipeline & DevOps
The deployment pipeline is a cultural and technical tool for managing changes in the software development process and should include databases, applications and infrastructure.
Following the same deployment process for databases as for applications aligns well with Gene Kim’s “three ways”, the principles underpinning DevOps, because it:
- Increases flow visibility: more visibility over the system leads to a better understanding of how to change it safely and quickly.
- Increases feedback: more learning opportunities at the database and application levels together.
- Increases knowledge: sharing knowledge through the pipeline allows the organization to build a learning system, avoiding extensive documentation (often a source of waste) and reducing the truck factor (relying on a system instead of relying on heroes).
The deployment pipeline is your software factory: the better the factory, the better the software it produces.
About the Author
Eduardo Piairo is a deployment pipeline craftsman always ready to learn about source control, continuous integration and continuous delivery for databases, applications and infrastructure. The deployment pipeline is his favorite technical and cultural tool. He works at Basecone as an Operations Engineer, with automation, collaboration and communication as priorities.
Community comments
About Data Migration
by Alvaro Camillo Neto,
Thanks for this great article!
I have a question.
How can I migrate the data with the state-based approach?
I mean, I have an application with an old database and a lot of data. How can I use the state-based approach in this case?
Thanks!
Re: About Data Migration
by Eduardo Piairo,
Hello Alvaro,
From the state-based tools that I know, you can control "static data" (data that does not change frequently; that data needs to be in a table with a valid primary key), since the state approach is based on comparing two different states and generating the difference between them (that's why you need a key as a reference). In other words, this approach (and the related tools) does not have data migration as its main purpose.
As an alternative, you could use database backups (check dbatools, it can help with this) for a full migration, or data scripts (Management Studio can generate these) for a partial migration.
Regards.
Re: About Data Migration
by Alvaro Camillo Neto,
Thanks!
State based DB upgrade is not really an option
by Mileta Cekovic,
The state-based approach is IMHO for simple databases, hello world applications and DevOps courses.
Any serious enterprise application will need a migrations-based approach, combined with full installation scripts.
Re: State based DB upgrade is not really an option
by Richard Clayton,
I agree - IDK of any tool that implements the declarative approach that would work with a complex database.
Incremental changes
by Will Hartung,
We (and we're not alone in this) simply have a file containing incremental db changes. Each time the application is started, it runs through the changes and applies any that the current database has not seen before (there's a simple table in the db which records what changes were seen when).
Philosophically, once a change has been made to this list and committed, we are not allowed to change it in situ. If there was a mistake made, we have to create another incremental change to correct the problem. Simply, once you commit, you don't know who else may have built and deployed the application and applied the change.
We also have the facility to specify a Java class that will be passed a DB connection and can "do whatever it wants". We use those for special cases, one use case was to use it to update LDAP data stores.
For stored procedures (which don't "incremental" change well, at least from a source level), it's typically ok to "delete the old one" and add in a new one, with a different tag. These mostly just stomp on the old one anyway. If there's some dependency on a SP earlier in the change list then, you know, "don't do that".
This has worked for 99% of the use cases, and we use it in production. If we have a significant enough change that warrants special handling, we'll just handle those on a case by case basis.
What about testing?
by Don Stinchfield,
A key part of our CICD process is testing. Where do you see unit and integration testing fitting into the automated process?
Re: Incremental changes
by Eduardo Piairo,
Thank you for sharing Will.
That's the standard approach for migrations-based solutions: you have a "list of changes" and a metadata table in the database with information about those changes. Flyway does the migrations management work for me: it makes sure that each migration runs only once, in a specific order (plus some extra bonuses: migrations are encapsulated in a transaction, ...).
I also started with a simple file containing all database changes, but when you have a database shared by different applications, in other words different teams changing the database, the merging process can be a nightmare. So I switched to small migration scripts (as small as possible), trying to reduce the conflict surface and improve the merging process.
The curious thing about starting to make small migrations was that, because in our case we made a lot of database changes on a daily basis, we started to see patterns in the way we changed the database, so some database changes (migration script creation) were automated.
Re: What about testing?
by Eduardo Piairo,
Hello Don,
In 99% of the discussions about databases the correct answer is: it depends.
Should the database be tested? 99% yes. If your database has business logic or influences the application's behavior, definitely yes.
But the question is more about what should be tested and when. Unit and integration tests can be included in the build process using frameworks like tSQLt.
I see database tests as a contract between database and applications - both parts trying not to break the contract.
Re: Incremental changes
by Will Hartung,
Yea we don't (normally) have different applications sharing the same database, so that certainly eliminates a source of contention. (We do actually have this issue, in a very limited sense, but we manage those edges manually).
But the key point, for us, is that this is a 95% solution. It works most of the time, but it's tempting to pile on complexity for the edge cases. For us, we're trying to smooth out the update and installation process, along with the developers' lives. But there's still room for special cases and moments of handholding.
For example, when we originally did the work, we had the capability of "undoing" things, etc. We never used it in anger. It's not worth it. We can "reverse engineer" a database. We never use that either. Similarly, we considered using some "meta" language from which we created the DDL scripts, but since most of the time we were single database, that was a lot of work for no gain. Instead of converting everything into, say, XML tags, we replaced it with some simple "macro" arguments to handle things like sequence and key definitions. We have, I guess, 4 of these special cases that are handled.
Simply, we added several "not going to need it" features that, in the end, amazingly, we didn't need.
Now, for the majority of the use cases, we can simply cut and paste the SQL into the file, add a bit of XML wrapper, and be done with it.
Oh, and the fact that DDL is wrapped in a transaction in Postgres is a wonderful thing for tools like this.
Which of the pipeline options described is the best in your opinion?
by Miguel Domingos,
Option 1, 2, 3 or 4 and why?
Handling state-based and incremental changes together
by Shant Stepanian,
Good overview of the database deploy progress and the various issues to overcome with it!
Wanted to mention that I wrote about a similar topic on InfoQ a few months back, specifically on a database deployment tool Obevo that we open-sourced.
We recognized the difficulty of managing database objects, particularly between stateful objects like tables and stateless objects that can be recreated, like views and procedures, and we feel that our tool has an approach that works well for both cases and comes very close to the state-based model that you described. We've battle-tested this in our enterprise setting, with many systems having hundreds or thousands of objects to manage. In this case, we felt that managing a file per migration can lead to a very unwieldy code base, whereas our object-based file maintenance approach is a more maintainable methodology that can support both incremental patches and full deployments.
Would like to hear what you think
Re: Incremental changes
by Shant Stepanian,
Hi Will, am interested to hear about your use case regarding stored procedures. We have hit that as well, notably in a couple cases: 1) needing to deploy hundreds of stored procedures (in the right order) 2) having to modify a couple stored procedures that were previously deployed (in the right order).
We described our approach in my InfoQ article (also linked in another comment here; apologies for the spam). This was a problem that we faced quite often (many such use cases in my company), and so we aimed to always solve this without any manual intervention.
Would be interested in your thoughts on this, and how that would compare to the use cases you've seen.
Re: Which of the pipeline options described is the best in your opinion?
by Eduardo Piairo,
It all depends on the use case and the purpose. If your purpose is to improve the synchronization between database and application, scenario 4 will help you in this mission by reducing the need for communication outside the deployment pipeline. Since most of the problems are related to communication/collaboration (sharing information between teams, for example), scenario 4 is my favorite approach.
Re: Handling state-based and incremental changes together
by Eduardo Piairo,
The object-based script file can be interesting in terms of visualizing the data model. However, the ordering and dependency management that results from this approach can be extremely complex, especially in shared databases.
I agree that the migrations-based approach can lead to a big code base, but you can mitigate this by tuning the granularity of the migrations (not my favorite approach; I like migration scripts to be as small as possible, as this reduces the possible friction surface) or by setting up a rebase policy.
Re: Incremental changes
by Enrique Illán,
Interesting, but what happens when you have load balancing? How do you manage which node has to make the change? And what about bad script handling? Developers could, by mistake, badly update something in the database, or maybe worse. How are you going to handle that?
Elevated DB privileges needed for DDL
by Brad Allan,
Hi Eduardo, thanks for the article. One question arising for me is about the elevated privileges the deployment pipeline needs to perform DDL on a database.
I guess the pipeline should:
- use a secrets vault to retrieve credentials of a db-owner
- check the db-owner user is still valid on the database and fail the pipeline if not
Thoughts?
Old and new software working on database under migration
by Hans Kappert,
Thank you for this article. Suppose you have a web application running in a swarm of docker containers and you want to have zero downtime during the migration to a new version (V2) of the web application: a migration that also involves a breaking change in the database. I mean that as soon as the database is changed, containers running V1 of the website will break. You cannot swap to the new containers in a split second, so you'll always have a moment where V1 and V2 containers are handling requests, and both versions must be able to deal with the old and new version of the database during migration.
What would be the desired approach?
Kind regards,
Hans
Re: Old and new software working on database under migration
by daniel carter,
You can't have breaking changes if you want zero downtime. For a breaking change you need to have a period where neither old nor new code is running while you transition the DB structure. Ergo, breaking changes imply downtime.
So for zero downtime you need to be backward compatible. But that backward compatibility only needs to be for one release for the duration of the rollout. Not backwards compatible as in, we are going to support v1.2 of the client for 3 years after release.