InfoQ Homepage Articles The Definitive Guide to Database Version Control

The Definitive Guide to Database Version Control

Dec 04, 2014 13 min read

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.Get in touch

In the brave new world of big data and business intelligence, the only technology constant is change, especially when it comes to the database. Almost daily changes in business needs due to demographics, increased service delivery demand and the regulatory environment drive database change. So when it comes to database change, agility through automation - the ability to do more with less more rapidly to accelerate delivery – is what differentiates highly competitive, world-class enterprises from the rest of the crowd.

If your competitor can deliver relevant features faster, with better quality, you will lose market share. Agile development arose from the need to move more quickly to deal with ever-changing requirements, ensuring optimum quality despite resource constraints.

The big release approach is obsolete – waiting six months until the next rollout or release is self-defeating. Agile development reduces release scope to complete each change faster, minimizing the impact of each change. Agility is expected from tech companies and IT divisions to support changing business needs.

The next logical step is to link development with operations – to adopt DevOps.

To master Agile sprint deployments effectively with DevOps, you need the ability to implement deployment and process automation internally within development and QA, or to production. Otherwise, deployments and releases will require manual steps and processes, which are prone to human error and cannot be frequently replicated.

The automation required relies on a version control repository that manages all software assets ready to be built and deployed to the next environment.

The build process starts by cleaning the working space and getting the relevant files from the version control repository. This critical phase prevents out-of-process changes. These changes can still happen though they can be avoided if developers save their changes directly in the build server working space, instead of checking-in the changes to the version control repository. This example may sound absurd because developers know if they do so, their changes will be lost, as the technology enforces the process. This phase also prevents the build phase from accepting work-in-progress changes by referring to only those changes that were submitted to the version control repository in a check-in process. The version control repository acts as the single source of truth.

The Database is a Key Component

Most IT applications today have many components using different technologies; mobile, ASP, PHP, application servers, Citrix, databases, etc. These must all be in sync for the application to work. If, for example, a new column was added to a table, or a new parameter was added to a stored procedure, all other application components must be synchronized to the structure change to function correctly. If this synchronization breaks, the application can fail by calling the wrong parameters to the stored procedure, or by trying to insert data without the new column.

The unique properties of the database component differentiate it from other components:

A database is more than just SQL scripts. It has a table structure, code written in the database language within stored procedures, content saved in reference tables or configuration tables, and dependencies between objects.
A database is a central resource. Several developers can work on the same object, so their work must be synchronized to prevent code overrides.
Deploying database changes is not as simple as copying and replacing old binaries. Database deployment transforms version A into version B while keeping business data and transferring it to the new structure.
Database code exists in any database, and can be directly modified in any environment. This is in contrast to other components, where everything starts from a clean workspace in the build server.

Must-Have Requirements

Several challenges must be addressed when managing database changes. You must:

Ensure all database code is covered (structure, code, reference content, grants)
Ensure the version control repository acts as the single source of truth
Ensure the deployment script being executed knows environment status when the script is executing
Ensure the deployment script handles and merges conflicts
Generate a deployment script for only relevant changes
Ensure the deployment script knows the database’s dependencies

There are four common approaches to managing database changes in development and deploying them internally (Dev, QA) or to the production environment.

Utilizing SQL alter scripts generated during development
Utilizing a changelog activities tracking system
Utilizing simple compare & sync
Utilizing a database enforced change management solution

Utilizing SQL Alter Scripts Generated During Development

The most basic method for managing database changes is to save the alter command in a script or set of scripts, and manage them in the exiting file-based version control. This guarantees a single repository that stores all the application component assets. Developers have the same functionality when checking-in changes for the database as they do when they check-in changes for .NET or Java, such as linking the change to the reason (CR, defect#, user story, etc.). Almost any modern file-based version control solution has a merge notification when several developers change the same file.

But let's see if this solution actually overcomes the challenges for the database, and avoids the potential pitfalls:

Ensures all database code is covered – since the developer or DBA writes the script, they can ensure it will handle all database code.
Ensures the version control repository acts as the single source of truth - not really, as the developer/DBA can login directly to the database (in any environment) and make changes directly in the database.

Manually-Written SQL Scripts

Changes made to the deployment scripts as part of scope changes, branch merges, or re-works are done manually and require additional testing.

Two sets of scripts must be maintained - the create script and the alter script for the specific change for the release. Having two sets of scripts for the same change is a recipe for disaster.

Ensures the deployment script being executed knows the environment status when the script is executing – this depends on the developer and how the script is written. If the script just contains the relevant alter command, then it is not aware of the environment status when it is executed. This means it may try to add the column although it already exists. Writing scripts that will be aware of the environment status at execution time significantly complicates script development.
Ensures the deployment script handles conflicts and merges them – although the file-based version control provides the ability to merge conflicts, this is irrelevant to the database as the version control repository is not 100% accurate and cannot be the single source of truth. The script might override a hot fix performed by another team, leaving no evidence that something went wrong.
Generates deployment scripts for only relevant changes – scripts are generated as part of development. Ensuring the scripts include only relevant and authorized changes – based on the tasks being approved - requires changing the script(s), which creates more risk to the deployment and wastes time.
Ensures the deployment script knows the database dependencies – developers must be aware of database dependencies during the development of the script. If a single script is being used, then the change is usually being appended. This can result in many changes to the same objects. If many scripts are being used, then the order of the scripts is critical and is maintained manually.

Bottom line: not only does this basic approach fail to solve the database challenges, it's also error-prone, time consuming, and requires an additional system to keep track of the scripts being executed.

Utilizing a Changelog Activities Tracking System

Another common approach is to use XML files, which use an abstract language for the change and keep track of the execution. The most common open source solution for this is Liquibase.

With XML files, Liquibase separates the logical change from the physical change, and allows the developer to write the change without knowing the database-specific command. At execution time, it converts the XML to the specific RDBMS language to perform the change. Changes are grouped into a changelog, and they can be in a single XML file or many XML files referred by a major XML file which contains the order of the changes.

The XML files can be saved using the existing file-based version control, which offers the same benefits as the basic approach. In addition, based on the Liquibase execution track, it knows which changelog(s) have already been deployed and shouldn't run again, and which were not yet deployed and should be deployed.

Let's see if Liquibase answers the challenges:

Ensures all database code is covered – managing changes to reference content is not supported in the XML files used by Liquibase, and must be handled as an external addition, which can result in changes being forgotten.
Ensures the version control repository acts as the single source of truth – Liquibase doesn't have any version control functionality. It depends on third-party version control tools to manage XML files, so you have the same challenges in making sure the file-based version control repository reflects the version that was tested. The process that will ensure the version control repository can be the single source of truth requires developers to check-in changes in order to test them. This can result in work-in-progress changes being deployed to next environment.
Ensures the deployment script being executed knows the environment status when the script is executing – Liquibase knows which changelogs have been deployed and will not execute them again. However, if the logical change is to add a date column that exists in varchar format, the deployment will fail. Also, overrides of out-of-process changes cannot be prevented.
Ensures the deployment script handles conflicts and merges them – any change being made to the database outside of Liquibase can cause a conflict, which will not be handled by Liquibase.

Out of process changes are not handled

Generates deployment scripts for only relevant changes – changes can be skipped at the changelog level, but breaking a changelog into several changelogs requires writing a new XML file, thus requiring more tests.
Ensures the deployment script knows the database dependencies – the order of the changes is maintained manually during the development of the changelog XML.

Bottom line: using a system that tracks change execution does not address the challenges associated with database development and, as a result, does not meet the deployment requirements.

Utilizing Simple Compare & Sync

Another common approach is to generate the database change script automatically by comparing the source environment (development) to the target environment (test, UAT, Production, etc.). This saves the developers and DBAs time because they don't have to manually maintain the script if it is a create script or an alter script for the release. Scripts can be generated when needed, and refer to the current structure of the target environment.

Let's review the challenges and see if this approach overcomes them:

Ensures all database code is covered – most compare & sync tools know how to handle the different database objects, but only a few have the functionality to handle compare & sync of the reference data.
Ensures the version control repository acts as the single source of truth - simple compare & sync does not utilize the version control repository when performing the compare and generating the merge script.
Ensures the deployment script being executed knows the environment status when the script is executing – the best practice is to generate the script just before executing it so it will refer the current environment status.
Ensures the deployment script handles conflicts and merges them – simple compare & sync tools compare A to B (source to target). Based on the simple table at the right, the tool then generates a script to "upgrade" the target to match the source. Without knowing the nature of the change, the wrong script can be generated. For example, there is an index in the target that was created from a different branch or critical fix. If this index does not exist in the source environment, what should the tool do? Drop the index? If there is an index in development, but not in production, was it added in development? Dropped in production? Using such a solution requires deep knowledge of each change to make sure they are handled properly.
Generates deployment scripts for only relevant changes – the compare & sync tools compare the entire schema and show the differences. They are unaware of the reason behind the changes, as this information is stored in the ALM, CMS, or version control repository, which is external to the compare & sync tool. You might get a lot of background noise, making it difficult to determine what you actually need to deal with.
Ensures the deployment script knows the database dependencies – compare & sync tools are aware of database dependencies and generate the relevant DDLs, DCLs, and DMLs in the correct order. Not all compare & sync tools support generating a script that contains objects from several schemas.

Bottom line: compare & sync tools satisfy some of the must-have requirements, but fail to deal with others. Scripts must be manually reviewed, and cannot be trusted in an automated process.

Utilizing a Database Enforced Change Management Solution

Database enforced change management combines enforcement of version control processes on database objects with generation of the deployment script when required, based on the version control repository and the structure of the environment at that time.

This approach uses “build and deploy on-demand,” meaning the deploy script is built (generated) when needed, not as part of development. This allows for efficient handling of conflicts, merges, and out-of-process changes.

Build & Deploy On-Demand

How does database enforced change management handle the challenges?

Ensures all database code is covered – structure, business logic written in the database language, reference content, database permissions and more are managed.
Ensures the version control repository acts as the single source of truth – the enforced change policy prevents anyone using any IDE (even command line) from modifying database objects that were not checked-out before and checked-in after the change. This guarantees the version control repository will always be in sync with the definition of the object at check-in time.

Single Process Enforcing Version Control

Ensures the deployment script being executed knows the environment status when the script is executing – building (generating) the deployment script when needed (just before executing) guarantees it knows the current environment status.
Ensures the deployment script handles conflicts and merges them – by using baselines in the analysis, the nature of the change is known and the correct decision whether to promote the change, protect the target (ignore the change), or merge a conflict is easy.

Baseline Aware Analysis

Generates deployment scripts for only relevant changes – the integration with application lifecycle management (ALM) and change management systems (CMS) enables you to assign a reason to the change, as is done in the file-based version control or task management system.
Ensures the deployment script knows the database dependencies – the sophisticated analysis and script generation algorithm ensures the DDLs, DCLs, and DMLs will be executed in the correct order based on the database dependencies, including inter-schema dependencies.

In addition to the must-have requirements, there are other requirements, such as supporting parallel development, merging branches, integrating with the database IDE, and supporting changes originating from data modeling tools. You must verify that these requirements will be addressed by whichever method you choose.

Bottom Line

The database component has special requirements, and therefore creates a real challenge for automation processes. In the old days when there were only a few releases per year, it was common and understandable to invest time manually reviewing and maintaining the database deployment scripts. Today, with the growing need to be agile and provide releases faster, the database must be part of the automation process. Developing SQL or XML scripts, or using simple compare & sync are either inefficient and/or risky approaches when it comes to automation. The most effective method is to implement database enforce change management.

About The Author

Uri Margalit is the Director of Product Management at DBmaestro, an Enterprise Software Development Company focusing on database development and deployment technologies. Uri has over 15 years’ experience in enterprise software and systems management and has held senior Product Management and R&D Management positions at a number of leading software companies.

InfoQ Software Architects' Newsletter

The Definitive Guide to Database Version Control

Write for InfoQ

The Database is a Key Component

Must-Have Requirements

Utilizing SQL Alter Scripts Generated During Development

Utilizing a Changelog Activities Tracking System

Utilizing Simple Compare & Sync

Utilizing a Database Enforced Change Management Solution

Bottom Line

About The Author

Rate this Article

This content is in the Database topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter