# Database Versioning and Delivery with Upgrade Scripts


## 1. Introduction

A database upgrade is usually one of the “last mile” delivery tasks, frequently left until the end of the project or the sprint before the release. This is far from ideal because:

• During each software deployment to a test environment the database is often recreated, which means that testers lose their test data every time
• If the project lasts long enough, upgrade scripts are sometimes written months after the initial database change was made, when the knowledge of how to migrate the data might be lost or impaired
• There is a significant risk of failure, as the upgrade scripts have not been tested thoroughly and repeatedly prior to the production release
• It is hard to estimate how long it will take to write the upgrade scripts, adding even more risk to the delivery date and budget

To avoid such problems, experience has taught me that a good database versioning and delivery strategy is fundamental for most enterprise projects. Here is how we deal with it at Objectivity.

## 2. The Agile Perspective

Our projects are run in an agile manner, meaning applications are developed incrementally and iteratively, and the database is included in this software development process. It all starts with the “Definition of Done” (DoD), which is crucial to every high-functioning team. The User Story-level DoD should contain a “Deployable” condition, meaning that we only consider a story done when it can be deployed automatically via scripts. Of course, there are many other conditions in a DoD (writing database upgrade scripts among them), but that is a good topic for an article in itself.

Formulated in this way, the DoD also has an impact on sprint planning and estimation. It is used as a checklist to validate whether all major tasks are accounted for. Regarding the database, each team member needs to know the project-adopted rules for writing upgrade scripts: What is the format? Are we using a template? Where do we store the files? Do the files need to follow a naming convention? Etc.

During development, changes to code and database are done by developers in parallel. On top of modifications to the database project, team members write upgrade scripts, which are checked in together with the rest of the code and used to test User Stories (US) on a dedicated environment.

At the end of the sprint, if the decision has been made that the software goes to production, the scripts, together with the other artefacts, are used to perform the installation.

## 3. Versioning Approaches

The implementation details of database versioning vary from project to project, but the key elements are always present:

• The database is under version control – an obvious starting point. If we could not identify database changes, how could we write upgrade scripts for them? We have successfully used Visual Studio 2010 database projects and RedGate SQL Source Control to manage the structure of the database, both against a TFS repository. This area is widely supported by tools.
• The database version is stored in the database itself – to be able to identify which version of the DB schema is installed in a given environment, the database itself needs to be marked. There are many ways to achieve that (a user-defined function, extended properties, a specifically named object, etc.), but at Objectivity we consistently use a dedicated table, usually called DbVersion. The advantage of this option is that tables are standard database objects, widely known and used by developers and administrators, and easily accessible from code. The database may store either the current version only or the entire version history. A sample table definition can be found in Listing 1.
• The database version is verified by applications on start-up – the application has database version checks embedded in its code and verifies the version on initialization. If the expected condition is not met, the application displays an appropriate error message and stops. This practice protects the development team from major deployment errors and minimizes the risk of wasted testing effort.
• Upgrade scripts are written in parallel with development – SQL scripts altering the database are prepared while the database schema is being modified by developers. We use a template for those scripts (see the sample in Listing 2). The header lines of the template check for the expected database version, and a transaction is started if the version is correct. After the custom (i.e. developer-written) database amendments are executed, the template updates the database version table, commits the transaction and displays a success message. This practice addresses the second and fourth drawbacks listed in the introduction.

## 4. A Hybrid Solution

Sometimes, when a database is large in terms of objects (not data), upgrade scripts may grow big, especially if we use stored procedures (SPs) or user-defined functions (UDFs). As a countermeasure, it is possible to limit upgrade scripts to only certain object types, usually those which actually store data (i.e. tables), and to always reinstall the rest of the objects at the end of the upgrade process. Such a hybrid solution is highly recommended when a team is new to the database upgrade process and extensive business logic is kept in the database.

## 5. What about the Data?

Data can be generally divided into two groups:

• Initial data, required to run and/or start up an application, such as reference data, dictionaries, etc.
• Business data, created from application’s UI, imported from external sources or pre-generated samples so that developers/testers have something to work on.

It is recommended to keep these groups separated from the beginning of a project in order to avoid another “last mile” problem.

We initialize the database with data either through scripts / CSV files kept in separate folders for each group, or by embedding the initial data in the upgrade scripts (for ease of deployment in smaller systems). When keeping the data in a separate folder, it is best to write the scripts in such a way that they can be run multiple times. Another problem to deal with is the table insertion order: in complex database schemas (e.g. with circular dependencies) it is impossible to order the tables, so the best practice is to disable foreign keys before the insertions and re-enable them once the data is in.
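The disable/insert/re-enable sequence can be generated mechanically. The sketch below is a hypothetical Python helper (the function name is an assumption, not part of the article's tooling) that emits T-SQL statements in the required order:

```python
def fk_safe_batch(tables, insert_statements):
    """Wrap INSERT statements so that table order does not matter:
    disable every foreign key first, run the inserts, then re-enable.

    Uses T-SQL syntax; WITH CHECK re-validates the inserted rows,
    so referentially broken data still fails at the end of the batch.
    """
    disable = [f"ALTER TABLE [{t}] NOCHECK CONSTRAINT ALL" for t in tables]
    enable = [f"ALTER TABLE [{t}] WITH CHECK CHECK CONSTRAINT ALL" for t in tables]
    return disable + list(insert_statements) + enable
```

Running the statements in this order sidesteps circular dependencies entirely, at the cost of deferring referential validation to the final step.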

## 6. Best Versioning Practices

The following practices are not required, but I have found them really useful, and you should at least consider using them in your new projects.

### Use a three-element version string

We have found the following format for the database version string to be the most flexible:

<major version>.<minor version>.<revision>

The first element changes on major releases / phases of a system, e.g. every couple of months. The two subsequent elements are controlled by developers. The minor version changes when breaking amendments (e.g. a new mandatory field) are introduced to the database, making the “old” application incompatible with the new database schema. The revision is incremented when non-breaking changes are applied to the database (e.g. a new index, a new table, a new optional field).
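These rules translate into a simple bump-and-compare policy. The following hypothetical Python sketch illustrates it (the assumption that a minor bump resets the revision to zero is mine, not the article's):

```python
def bump(version: str, breaking: bool) -> str:
    """Increment a <major>.<minor>.<revision> string: a breaking change
    bumps the minor version (resetting the revision), a non-breaking
    change bumps only the revision."""
    major, minor, revision = (int(p) for p in version.split("."))
    if breaking:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{revision + 1}"

def app_compatible(app_expects: str, db_version: str) -> bool:
    """An application built against one schema version still works with
    any database that differs only in the revision component."""
    return app_expects.split(".")[:2] == db_version.split(".")[:2]
```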

### Write environment-independent scripts

In theory, upgrade scripts should be written in such a way that they can be run without modification in any environment. This means they should not contain any paths, names of database instances, SQL user names or linked server configurations. In the Microsoft SQL Server world, SQLCMD variables can be used to achieve this.

### If multiple teams are working on a database, split the whole database into several schemas

The development of large databases by multiple parallel teams becomes more efficient when the database is split into a number of schemas. Every schema has its own version and its own upgrade scripts, which minimizes merge conflicts and simplifies their resolution. Of course, the DbVersion table has to be modified (with a new column) to store a version per schema. We can distinguish two types of schemas: shared and dedicated. When a team plans to change a shared schema, the other teams need to be consulted to ensure that the structure of the shared objects is modified in the best way. A dedicated schema is under one team’s full control.

Alternatively, if the database is a legacy one and/or we cannot introduce schemas, we can split the database objects into several virtual areas and version each of them separately.

### Never change upgrade scripts once they have been checked in

When a database reports a certain version, you trust it; as a developer, you do not routinely compare the actual schema against the original scripts, so it is hard to detect situations in which different versions of the same upgrade script have been applied to different database instances. If you have made a database modification by mistake in an upgrade script, write another script reverting the change – do not modify the original script, because it may already have been applied to some environment.

### Reserve version ranges when working on multiple releases to simplify merges

When multiple teams work in parallel on successive releases of the same system / database, it is best to agree upfront on the version range used by each team, to avoid merging problems.

For example: Team A, working on release 1, could use the 2.x.x range for the Shared schema and 1.x.x for the Ordering schema, while Team B, working on release 2, would use the 3.x.x range for the Shared schema and 1.x.x for the Reporting schema.

### Automate the upgrade process

The drawback of writing upgrade scripts during development is that there are a lot of them. Consequently, automation is highly desirable, as it saves a lot of time for developers, release managers, etc. Moreover, it speeds up the delivery process and makes it more resilient. Furthermore, with automation the upgrade process can easily be plugged into the CI process.

At Objectivity we use the PSake module (PowerShell) to automate the process. PowerShell is Microsoft’s task automation framework, consisting of a scripting language built on top of the .NET Framework. PSake, in turn, is a domain-specific language written in PowerShell for creating builds using a dependency pattern similar to Rake or MSBuild. A PSake build script consists of Tasks; each Task is a function that can declare dependencies on other Task functions. Our upgrade script is written as a single PSake task.

Here is our database upgrade algorithm:

1. Check the existing version of the database
2. Search for an upgrade script corresponding to the current version (this step relies on the file naming convention being aligned with the DB version)
3. If a file has been found, execute it and verify the output; quit on error
4. If no script has been found, quit
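Stripped of the database plumbing, the loop is easy to see in isolation. This hypothetical Python sketch models the scripts as a mapping from current version to target version (mirroring the Upgrade_&lt;current&gt;_&lt;target&gt;.sql naming convention) and applies them until no matching script remains:

```python
def run_upgrades(db_version, scripts):
    """scripts maps a current version to the target version its upgrade
    script produces, e.g. {"1.0.0": "1.0.1"}. Returns the final version
    and the ordered list of (simulated) script names that were applied."""
    applied = []
    while db_version in scripts:                              # step 2: find a matching script
        target = scripts[db_version]
        applied.append(f"Upgrade_{db_version}_{target}.sql")  # step 3: execute it
        db_version = target                                   # DbVersion now reports target
    return db_version, applied                                # step 4: no script found, stop
```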

A sample implementation can be found in Listing 3.

At Objectivity we have often found that developers new to the database upgrade process accidentally break the project-adopted rules for writing upgrade scripts. It is therefore good to verify the consistency of your upgrade scripts after every commit on the Continuous Integration (CI) server by checking:

• The file naming convention – we use the following format for file names: <prefix>_<current_version_from_db_version_table>_<target_version>_<optional_info_about_upgrade>.sql. When multiple schemas are used, the prefix contains the name of the schema.
• The file content – the header and footer of a script can be checked to ensure that the template has been used; in addition, the versions in the content can be validated against those in the file name.

The validation may even happen before the actual code is built. Any breaches should fail the build.
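A convention check like this is a few lines of code on the CI server. The sketch below is hypothetical Python (the exact regular expression is my assumption, derived from the format above); it extracts the versions from a file name, returning None for non-conforming names:

```python
import re

# <prefix>_<current_version>_<target_version>_<optional_info>.sql
_NAME_RE = re.compile(
    r"^(?P<prefix>\w+?)"
    r"_(?P<current>\d+\.\d+\.\d+)"
    r"_(?P<target>\d+\.\d+\.\d+)"
    r"(?:_(?P<info>\w+))?\.sql$"
)

def parse_script_name(filename):
    """Return (current_version, target_version), or None if the file
    name violates the convention; the CI build should fail on None."""
    m = _NAME_RE.match(filename)
    return (m.group("current"), m.group("target")) if m else None
```

The extracted version pair can then be cross-checked against the @expectedVersion / @newVersion values inside the script body, as the second bullet suggests.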

I also strongly recommend verifying that the upgrade scripts lead to the same database structure as the development database projects. We do this by setting up two database instances during CI:

• The first is a restored, depersonalized copy from production, to which the upgrade scripts have been applied
• The second is created from the database project(s)

The two instances are then compared; any differences should fail the build.

### Backup the database before upgrade

Although the upgrade scripts are written in a transactional way, there is no guarantee that they will succeed, so it is best to perform a backup before each upgrade, just in case. This step should be automated.

### Record a history of applied upgrade scripts

When database-related problems arise during testing, it is useful to have a history of changes applied to a given database. If your upgrade process is automated, it is not difficult to record all executed upgrade scripts in a dedicated history table for debugging needs. Listing 4 shows a sample DbHistory table definition.
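If the runner already loops over the scripts, logging them is one extra statement per file. A hypothetical Python helper (parameterized to avoid quoting issues; the column names follow the DbHistory definition in Listing 4):

```python
def history_insert(filename: str, content: str):
    """Build a parameterized T-SQL INSERT recording an executed upgrade
    script in the DbHistory table; returns (sql, params) for the driver."""
    sql = ("INSERT INTO DbHistory([Filename],[Content],[RunOn]) "
           "VALUES (?, ?, GETDATE())")
    return sql, (filename, content)
```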

## Listing 1 – DbVersion definition

| Column name | Column type |
| --- | --- |
| Version | nvarchar(50) NOT NULL |
| UpdatedBy | nvarchar(50) NOT NULL |
| UpdatedOn | datetime NOT NULL |
| Reason | nvarchar(1000) NOT NULL |

## Listing 2 – Upgrade script template

```sql
DECLARE @currentVersion [nvarchar](50)
DECLARE @expectedVersion [nvarchar](50)
DECLARE @newVersion [nvarchar](50)
DECLARE @author [nvarchar](50)
DECLARE @textcomment [nvarchar](1000)

SET @expectedVersion = '10.0.217'
SET @newVersion = '10.0.218'
SET @author = 'klukasik'
SET @textcomment = 'Sample description of database changes'

SELECT @currentVersion = (SELECT TOP 1 [Version] FROM DbVersion ORDER BY Id DESC)

IF @currentVersion = @expectedVersion
BEGIN TRY
    BEGIN TRAN
    -- ############################## BEGIN OF SCRIPT ##############################

    -- custom database modifications

    -- ############################### END OF SCRIPT ###############################
    INSERT INTO DbVersion([Version],[UpdatedBy],[UpdatedOn],[Reason])
        VALUES(@newVersion, @author, getdate(), @textcomment)
    COMMIT TRAN
    PRINT 'Database has been updated successfully to ' + @newVersion
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
    BEGIN
        ROLLBACK TRANSACTION
    END
    SELECT
        ERROR_NUMBER() AS ErrorNumber,
        ERROR_SEVERITY() AS ErrorSeverity,
        ERROR_STATE() AS ErrorState,
        ERROR_PROCEDURE() AS ErrorProcedure,
        ERROR_LINE() AS ErrorLine,
        ERROR_MESSAGE() AS ErrorMessage;

    DECLARE @ErrorMessage NVARCHAR(max),
            @ErrorSeverity INT,
            @ErrorState INT;
    SET @ErrorMessage = ERROR_MESSAGE();
    SET @ErrorSeverity = ERROR_SEVERITY();
    SET @ErrorState = ERROR_STATE();

    RAISERROR(@ErrorMessage, @ErrorSeverity, @ErrorState);
    RETURN;
END CATCH;
ELSE
BEGIN
    PRINT 'Invalid database version - expecting: ' + @expectedVersion + ' currently: ' + @currentVersion
END
```

## Listing 3 – Database upgrade task (PSake)

```powershell
Task UpgradeDatabase -depends Initialize -description "Upgrades db with SQL scripts" {
    $logFile = "$log_dir\DatabaseUpgrade.log"
    if (Test-Path $logFile) {
        Remove-Item $logFile
    }
    $connectionString = $script:tokens["@@ConnectionString@@"]
    $getVersionQuery = "SELECT TOP 1 Version FROM dbo.DbVersion ORDER BY [Id] DESC"
    $dbConnectionStringBuilder = New-Object System.Data.SqlClient.SqlConnectionStringBuilder
    $dbConnectionStringBuilder.set_ConnectionString($connectionString)

    $dbVersion = Get-DbVersion $dbConnectionStringBuilder $getVersionQuery
    Write-Output ("Initial db version is {0}" -f $dbVersion)

    while ($true) {
        $files = Get-ChildItem ("$database_upgrade_scripts_dir\Upgrade_{0}_*.sql" -f $dbVersion)
        if ($files -ne $null) {
            $upgraded = $true
            foreach ($file in $files) {
                Write-Output ("[$($dbConnectionStringBuilder.DataSource) / $($dbConnectionStringBuilder.InitialCatalog)] Upgrading with {0}..." -f $file.Name)
                $sqlMessage = Run-Sql $file $dbConnectionStringBuilder $true
                $nl = [Environment]::NewLine
                Write-Output ("Executing $file.$nl$sqlMessage") | Out-File $logFile -append
                if (!($sqlMessage -like "*Database has been updated successfully to*")) {
                    throw "Something went wrong. See $logFile"
                }
            }
            $dbVersion = Get-DbVersion $dbConnectionStringBuilder $getVersionQuery
            if ($upgraded) {
                Write-Output ("Db version is {0}" -f $dbVersion)
            }
        } else {
            break
        }
    }
}

function Run-Sql($inputFile, $dbConnectionStringBuilder, [bool]$isFile) {
    $database = $dbConnectionStringBuilder.InitialCatalog
    $ps = [PowerShell]::Create()
    $e = New-Object System.Management.Automation.Runspaces.PSSnapInException
    $ps.Runspace.RunspaceConfiguration.AddPSSnapIn("SqlServerCmdletSnapin100", [ref]$e) | Out-Null
    $param = $ps.AddCommand("Invoke-Sqlcmd").AddParameter("database", $dbConnectionStringBuilder.InitialCatalog).AddParameter("serverinstance", $dbConnectionStringBuilder.DataSource).AddParameter("Verbose").AddParameter("QueryTimeout", 120)
    if ($isFile) {
        $param = $ps.AddParameter("InputFile", $inputFile)
    } else {
        $param = $ps.AddParameter("Query", $inputFile)
    }
    if (!$dbConnectionStringBuilder.ContainsKey("Integrated Security") -or [System.Convert]::ToBoolean($dbConnectionStringBuilder."Integrated Security") -eq $false) {
        $param = $param.AddParameter("username", $dbConnectionStringBuilder."User ID").AddParameter("password", $dbConnectionStringBuilder.Password)
    }
    try {
        $ps.Invoke() | Out-Null
    } catch {
        Write-Output $ps.Streams
        throw
    }
    $sqlMessage = ""
    $nl = [Environment]::NewLine
    foreach ($verbose in $ps.Streams.Verbose) {
        $sqlMessage += $verbose.ToString() + $nl
    }
    foreach ($error in $ps.Streams.Error) {
        $sqlMessage += $error.ToString() + $nl
    }
    return $sqlMessage
}

function Invoke-SqlCmdSnapin($dbConnectionStringBuilder, $query) {
    if (!$dbConnectionStringBuilder.ContainsKey("Integrated Security") -or [System.Convert]::ToBoolean($dbConnectionStringBuilder."Integrated Security") -eq $false) {
        Invoke-SqlCmd -query $query -database $dbConnectionStringBuilder.InitialCatalog -serverinstance $dbConnectionStringBuilder.DataSource -username $dbConnectionStringBuilder."User ID" -password $dbConnectionStringBuilder.Password
    } else {
        Invoke-SqlCmd -query $query -database $dbConnectionStringBuilder.InitialCatalog -serverinstance $dbConnectionStringBuilder.DataSource
    }
}
```

## Listing 4 – DbHistory definition

| Column name | Column type |
| --- | --- |
| Filename | nvarchar(50) NOT NULL |
| Content | nvarchar(max) NOT NULL |
| RunOn | datetime NOT NULL |

## 7. Final Thoughts

Konrad Lukasik is an enthusiast of Microsoft technologies in general and .NET in particular, a professional with almost 10 years of commercial experience. He currently works as a Technical Architect at Objectivity, helping teams deliver high-quality software, and continuously works on making things “as simple as possible, but not simpler”.

## Comments

• ##### Or you can use liquibase...
by Youri Bonnaffé

Or you can use liquibase...

• ##### Re: Or you can use liquibase...
by Øystein Gisnås

...or Flyway (flywaydb.org/)

• ##### Why isn't this stuff just built into DB's
by Nick Portelli

That never made sense to me.

• ##### Change impact analysis
by peter lin

I didn't come up with the idea, and it's quite old, but what you need is to version your model and use impact analysis. Change impact analysis tells you exactly which parts of the database are affected. IBM's Ascential group had impact analysis as part of their old Metadata server. From the impact analysis, it could automate the update to the database to make life easier. Not all changes were supported, but most types of simple changes, like adding columns, splitting a table, adding a table, renaming a table and renaming a column, were handled out of the box.
Even today, most database modeling products don't provide this. The system would make a new database and move the data over with the appropriate changes applied. Once it was done, you'd reconfigure the application to point to the new one.

• ##### Use Nextep
by Charly CLAIRMONT

Hi!

Based on the agile database concept, Nextep allows you to build and deliver a database as code! Try it: www.nextep-softwares.com/

Cheers

• ##### Lots of options
by Rafael Ponte

Great article, congrats!

Nowadays we have a lot of options for migration tools, mainly in the Java world – tools like Flyway, MyBatis Migrations, Liquibase and others. All of them are platform independent.

I'd like to know why you guys decided to implement your own tool instead of adopting an open-source and mature one.

• ##### Re: Lots of options
by Konrad Lukasik

Thanks for the comments. I understand that there are many tools available to support the process, but they may not always be the best option.
For sure, with a new tool there is an additional cost of learning it. Often it is better to first try it in a small project before using it in a large endeavour for your key customer.
Sometimes you might have other, more important issues in a project, so it would be unwise to turn your DB approach upside down.
Some tools, although platform independent, require Java to run, which may not be acceptable for your Microsoft-oriented customer, assuming you use their environment for development.
Or you might already be a few sprints into a project when you find out that the DB schema changes are not handled properly, so it might be too late to completely redesign the approach.
It is a pity that DB engines do not support this aspect of the SDLC. The Microsoft world is definitely behind Java, and there is still a place for a good tool nicely integrated with Visual Studio and TFS.
Finally, I hope that most of the described practices are universal and can be of use no matter what tool you are using.

• ##### Re: Lots of options
by Conor MacMahon

Good article; I agree it is a good description of how to write a DB migration tool that includes versioning.
However, as others have pointed out, there are numerous tools in the marketplace that do this for you for free (e.g. I have used Liquibase to great effect, but there are others).
I take your point that some clients that are 'Microsoft houses' may be a little biased against the JVM. However, it's just infrastructure to be called, not using the JDK, so the average client would treat that as less risky.

I suppose it comes down to: do you want your software solution to include DB migration logic or not? Rolling your own means a higher cost to support, but greater freedom. However, it is such a typical solution to outsource to 3rd-party tools like Liquibase that I would think it's cheaper/faster to delegate to them. That leaves you to concentrate on your actual application.

Each to their own, Konrad, thanks.

Kind regards,
Conor

• ##### Flyway is much simpler but more or less equally powerful
by Hans-Peter Störr

We are happily using Flyway, which does most of what is discussed in this article without you having to use script templates and whatnot. The only thing you have to do is follow a naming convention, and Flyway takes care of the execution order, maintaining an execution history in the database and ensuring scripts get executed exactly once. Flyway is a Java program, though (standalone, if needed).

• ##### RedGate Automation Pack for CI and Deployment
by Steven Koh

Hi Lukasik,
I am looking at RedGate's DB products and, since "RedGate SQL Source Control" is already used, why not use their CI products instead? www.red-gate.com/products/sql-development/sql-a...

From a product evaluation perspective, is there anything lacking in their product that made you choose to roll your own?

• ##### Thanking you for sharing EXPERIENCE
by Franck MIKULECZ

Clearly, a few commenters have not appreciated the most important but stealth part of your article: experience.
I read between the lines about numerous hardships and learned remedies, something that no automation tool will replace.
Yes, we do/plan to use Liquibase, but the concepts you describe will provide a great springboard to understand "why" things are done this way and help avoid many mistakes or shortcuts in tool usage.

I cannot thank you enough...

• ##### Re: RedGate Automation Pack for CI and Deployment
by Lea Richards

For me personally, database versioning is not about saving the current state of the database into SVN/Git. You need to know HOW you got to this state and have the ability to reproduce the changes on other servers. Also, you have to be able to do a reasonable rollback.

RedGate's tools all work ex-post: they do not store the changes, just the current state of the database.

Imagine you rename column A to B. Tools working ex-post will do "DROP COLUMN A; CREATE COLUMN B;". All data from column A is gone once you try to apply this change to another server or try to do a rollback.

More advanced tools track individual changes when they happen. Then they can do "RENAME COLUMN A TO B;". See how Databazoo Dev Modeler does that (www.devmodeler.com).