Database Versioning and Delivery with Upgrade Scripts

Posted by Konrad Lukasik on Aug 23, 2013

1. Introduction

Database upgrade is usually one of the “last mile” delivery tasks, frequently left until the end of the project or the sprint before the release. This is far from ideal because:

  • During each software deployment on a test environment the database is often recreated, which means that testers lose their test data every time
  • If the project lasts long enough, upgrade scripts are sometimes written months after the initial database change was made, when the knowledge of how to migrate the data might be lost or impaired
  • There is a significant risk of failure, as the upgrade scripts have not been tested thoroughly and repeatedly prior to the production release
  • It is hard to estimate how long it will take to write the upgrade scripts, adding even more risk to the delivery date and budget

In order to avoid such problems, experience has taught me that a good database versioning and delivery strategy is fundamental to most enterprise projects. Here is how we deal with it at Objectivity.

2. The Agile Perspective

Our projects are run in an agile manner, meaning applications are developed incrementally and iteratively, and the database is included in this software development process. It all starts with the “Definition of Done” (DoD), which is crucial to every highly functioning team. The User Story-level DoD should contain a “Deployable” condition, which means that we only consider a story done when it can be deployed automatically via scripts. Of course there are many other conditions in a DoD (writing database upgrade scripts among them), but that is a good topic for a separate article.

Formulated in that way, DoD also has an impact on sprint planning and estimation. It is used as a checklist to validate whether all major tasks are accounted for. Regarding the database, each team member needs to know project-adopted rules on writing upgrade scripts: What is the format? Are we using any template? Where do we store files? Do the files need to follow any naming convention? Etc.

During development, changes to code and database are done by developers in parallel. On top of modifications to the database project, team members write upgrade scripts, which are checked in together with the rest of the code and used to test User Stories (US) on a dedicated environment.

At the end of a sprint, if the decision has been made that the software goes to production, the scripts, together with other artefacts, are used to perform the installation.

3. Versioning Approaches

The database versioning implementation details vary from project to project, but key elements are always present. Here they are:

  • The database is under version control – an obvious starting point. If we could not identify database changes, how could we write upgrade scripts for them? We have successfully used Visual Studio 2010 database projects and RedGate SQL Source Control to manage the structure of the database, both against a TFS repository. This area is widely supported by tools.
  • The database version is stored in the database itself – to be able to identify what version of the DB schema is installed in a given environment, the database itself needs to be marked. There are many ways to achieve that (a user-defined function, extended properties, a specifically named object, etc.), but at Objectivity we consistently use a dedicated table, usually called DbVersion. The advantage of this option is that tables are standard database objects, widely known and used by developers and administrators and easily accessible from code. The database may store either the current version or the entire version history. A sample table definition can be found in Listing 1.
  • The database version is verified by applications on start-up – the application has database version checks embedded in its code and verifies them on initialization. If a certain condition is not met, the application displays an appropriate error message and stops. This best practice protects the development team from major deployment errors and minimizes the risk of wasting testing effort.
  • Upgrade scripts are written in parallel with development – SQL scripts altering the database are prepared as the database schema is being modified by developers. We use a template for those scripts (see the sample in Listing 2). The header lines of the template check for the expected database version, and a transaction is started if the version is correct. After the custom (i.e. developer-written) database amendments are executed, the template updates the database version table, commits the transaction and displays a success message. This practice addresses the second and fourth drawbacks from the list in the introduction.
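As a minimal illustration of the start-up check described above, here is a sketch in Python; the `fetch_db_version` callable and the expected-version constant are assumptions standing in for real data access against the DbVersion table, not part of any project of ours:

```python
# Hypothetical sketch of an application start-up version check.
# `fetch_db_version` stands in for real data access that reads the
# latest row from the DbVersion table (see Listing 1).

EXPECTED_DB_VERSION = "10.0.218"  # illustrative value

def verify_db_version(fetch_db_version, expected=EXPECTED_DB_VERSION):
    """Return the database version if it matches the expected one;
    otherwise raise so the application stops with a clear message."""
    actual = fetch_db_version()
    if actual != expected:
        raise RuntimeError(
            "Database version mismatch: expected %s, found %s. "
            "Apply the pending upgrade scripts before starting the application."
            % (expected, actual))
    return actual
```

Raising on mismatch (rather than logging and continuing) is the point: a failed check should stop the application before any testing effort is wasted on a wrong schema.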

4. A Hybrid Solution

Sometimes, when a database is large in terms of objects (not data), upgrade scripts may grow big, especially if we use stored procedures (SPs) or user-defined functions (UDFs). As a countermeasure, it is possible to limit upgrade scripts to only certain object types, usually those that actually store data (i.e. tables), and always reinstall the rest of the objects at the end of the upgrade process. Such a hybrid solution is highly recommended when a team is new to the database upgrade process and extensive business logic is kept in the database.

5. What about the Data?

Data can be generally divided into two groups:

  • Initial data, required to run and/or start up an application, such as reference data, dictionaries, etc.
  • Business data, created through the application’s UI, imported from external sources, or pre-generated as samples so that developers/testers have something to work with.

It is recommended to keep these groups separated from the beginning of a project in order to avoid another “last mile” problem.

We initialize the database with data either through scripts / CSVs kept in separate folders for each group, or we embed initial data in upgrade scripts (for ease of deployment in smaller systems). When keeping the data in a separate folder, it is best to write the scripts in such a way that they can be run multiple times. Another problem you have to deal with is the table insertion order. In complex database schemas (e.g. with circular dependencies), it is impossible to order the tables, therefore the best practice is to disable foreign keys before insertion and re-enable them once the data is in.
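To make the disable/insert/re-enable sequence concrete, here is a small Python sketch that generates such a T-SQL batch; the helper name and batch layout are illustrative assumptions, while the `NOCHECK CONSTRAINT ALL` / `WITH CHECK CHECK CONSTRAINT ALL` statements are standard T-SQL:

```python
# Hypothetical helper that wraps data-load statements in a batch which
# disables all foreign keys first and re-enables (and re-validates)
# them afterwards, so insertion order no longer matters.

def wrap_inserts_fk_safe(tables, insert_statements):
    disable = ["ALTER TABLE [%s] NOCHECK CONSTRAINT ALL" % t for t in tables]
    enable = ["ALTER TABLE [%s] WITH CHECK CHECK CONSTRAINT ALL" % t
              for t in tables]
    return ";\n".join(disable + list(insert_statements) + enable) + ";"
```

Note that re-enabling with `WITH CHECK` makes SQL Server re-validate the loaded data against the constraints, so any rows that break referential integrity surface at load time rather than later.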

6. Best Versioning Practices

The following practices are not required, but I have found them really useful, and you should at least consider using them in your new projects.

Use a three-elements version string

We found that the following format for the database version string is the most flexible:

<major version>.<minor version>.<revision>

The first element changes on major releases / phases of the system, e.g. every couple of months. The two subsequent elements are controlled by developers. The minor version changes when breaking amendments (e.g. a new mandatory field) are introduced to the database, making the “old” application incompatible with the new database schema. The revision is incremented when non-breaking changes are applied to the database (e.g. a new index, a new table, a new optional field).
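The bump rules above can be captured in a few lines; this Python sketch is purely illustrative (the function name and change labels are my assumptions, not a project convention):

```python
# Hypothetical helper deciding which element of a
# <major>.<minor>.<revision> version string to increment.

def bump_version(version, change):
    major, minor, revision = (int(p) for p in version.split("."))
    if change == "major":          # new release / phase of the system
        return "%d.0.0" % (major + 1)
    if change == "breaking":       # e.g. a new mandatory field
        return "%d.%d.0" % (major, minor + 1)
    if change == "non-breaking":   # e.g. a new index or optional field
        return "%d.%d.%d" % (major, minor, revision + 1)
    raise ValueError("unknown change type: %s" % change)
```

Resetting the lower elements on a higher-element bump keeps the version strings totally ordered, which matters once upgrade scripts are chained by version.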

Write environment-independent scripts

In theory, upgrade scripts should be written in such a way that they can be run without modification in any environment. This means that they should not contain any paths, names of database instances, SQL user names or linked server configurations. In the Microsoft SQL Server world, SQLCMD variables can be used to achieve this.

If multiple teams are working on a database, split the whole database into several schemas

The development of large databases by multiple parallel teams becomes more efficient when the database has been split into a number of schemas. Every schema has its own version and upgrade scripts. This minimizes conflicts during code merges and simplifies their resolution. Of course, the DbVersion table has to be modified to allow storing the schema version (a new column). We can distinguish two types of schemas: shared and dedicated. When a team plans to change a shared schema, the other teams need to be consulted to ensure that the structure of the shared objects is modified in the best way. A dedicated schema is under the team’s full control.

Alternatively, if the database is a legacy one and/or we cannot introduce schemas, we can split the database objects into several virtual areas and version each of them separately.

Never change upgrade scripts once they have been checked in

When a database reports a certain version, you trust it; as a developer, you do not routinely compare the actual structure against the original scripts, so it is hard to detect situations where different versions of the same upgrade script have been applied to a database instance. If you have made a database modification by mistake in your upgrade scripts, write another script reverting the change; do not modify the original script, because it may already have been applied to some environment.

Reserve version ranges when working on multiple releases to simplify merges

When multiple teams are working in parallel on successive releases of the same system / database, it is best to agree upfront on the versioning range used by each team, to avoid merging problems.

For example, Team A, working on release 1, could use the 2.x.x range for the Shared schema and 1.x.x for the Ordering schema, while Team B, working on release 2, would use the 3.x.x range for the Shared schema and 1.x.x for the Reporting schema.

Automate the upgrade process

The drawback of writing upgrade scripts during development is that there are a lot of them. Consequently, automation is much desired, as it saves a lot of time for developers, release managers, etc. Moreover, it speeds up the delivery process and makes it more resilient. Furthermore, with automation the upgrade process can easily be plugged into the CI process.

At Objectivity we use the PSake module (PowerShell) to automate the process. PowerShell is Microsoft’s task automation framework, consisting of a scripting language built on top of the .NET Framework. PSake, in turn, is a domain-specific language written in PowerShell for creating builds using a dependency pattern similar to Rake or MSBuild. A PSake build script consists of Tasks; each Task is a function that can define dependencies on other Task functions. Our upgrade script is written as a single PSake task.

Here is our database upgrade algorithm:

  1. Check the existing version of the database
  2. Search for the upgrade script corresponding to the current version (this step relies on the file naming convention being aligned with the db version)
  3. If the file has been found, then execute it and verify the output; quit on error
  4. If no scripts have been found, then quit
  5. Return to step 1

A sample implementation can be found in Listing 3.
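As a language-neutral illustration, the same algorithm can be sketched in Python, with the scripts on disk modeled as a hypothetical mapping from a script’s source version to its target version:

```python
# Illustrative model of the upgrade loop: `scripts` maps the version a
# script upgrades FROM to the version it upgrades TO, standing in for
# files named Upgrade_<from>_<to>_*.sql found on disk.

def run_upgrade(current_version, scripts):
    applied = []
    while current_version in scripts:
        target = scripts[current_version]
        # a real implementation would execute the script here,
        # verify its output, and quit on error (steps 3-4)
        applied.append((current_version, target))
        current_version = target
    return current_version, applied
```

The loop terminates naturally when no script matches the current version, which is exactly how the algorithm distinguishes “fully upgraded” from “more work to do”.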

Validate upgrade scripts on your CI

We have often found at Objectivity that developers new to the database upgrade process accidentally break the project-adopted rules for writing upgrade scripts. So it is good to verify the consistency of your upgrade scripts after every commit on the Continuous Integration (CI) server by checking:

  • file naming convention – we use the following format for file names: <prefix>_<current_version_from_db_version_table>_<target_version>_<optional_info_about_upgrade>.sql
    for example: Upgrade_1.0.1_1.0.2_rename_column.sql
    When multiple schemas are used, the prefix contains the name of the schema.
  • file content – the header and footer of a script can be checked to ensure that the template has been used; in addition, the versions in the content can be validated against those in the file name.

The validation may even happen before the actual code is built. Any breaches should fail the build.
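A file-name check of this kind is easy to automate; the sketch below is an assumed Python implementation (the regular expression mirrors the convention described above, but the exact character classes are my guesses):

```python
import re

# Matches <prefix>_<current_version>_<target_version>_<optional_info>.sql
SCRIPT_NAME_RE = re.compile(
    r"^(?P<prefix>[A-Za-z]+)_"
    r"(?P<from>\d+\.\d+\.\d+)_"
    r"(?P<to>\d+\.\d+\.\d+)"
    r"(?:_(?P<info>\w+))?\.sql$")

def parse_script_name(filename):
    """Return (from_version, to_version), or None if the name is invalid."""
    m = SCRIPT_NAME_RE.match(filename)
    if m is None:
        return None
    return m.group("from"), m.group("to")
```

On CI, a `None` result (or a mismatch between these versions and the ones inside the file) would fail the build.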

I also strongly recommend verifying that the upgrade scripts lead to the same database structure as the development database projects. We do this by setting up two database instances during CI and comparing them:

  • The first is a restored, depersonalized copy from production, with the upgrade scripts applied
  • The second is created from the database project(s)

Any differences should fail the build.
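The comparison step can be illustrated with a toy model in Python, where each schema snapshot is reduced to a hypothetical mapping of object names to their definitions (a real setup would use a schema-compare tool):

```python
# Toy schema comparison: each snapshot maps object names to their
# definitions. Any non-empty bucket should fail the CI build.

def diff_schemas(upgraded, from_project):
    missing = sorted(set(from_project) - set(upgraded))
    extra = sorted(set(upgraded) - set(from_project))
    changed = sorted(name for name in set(upgraded) & set(from_project)
                     if upgraded[name] != from_project[name])
    return {"missing": missing, "extra": extra, "changed": changed}
```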

Back up the database before upgrade

Although upgrade scripts are written in a transactional way, there is no guarantee that they will succeed, therefore it is best to perform a backup before the upgrade, just in case. This step should be automated.

Record a history of applied upgrade scripts

When database-related problems arise during testing, it is useful to have a history of changes applied to a given database. If your upgrade process is automated, it is not difficult to record all executed upgrade scripts in a dedicated history table for debugging needs. Listing 4 shows a sample DbHistory table definition.

Listing 1 – DbVersion definition

Column name   Column type      Constraint
-----------   --------------   ----------
Version       Nvarchar(50)     Not null
UpdatedBy     Nvarchar(50)     Not null
UpdatedOn     DateTime         Not null
Reason        Nvarchar(1000)   Not null

Listing 2 – Upgrade script template

DECLARE @currentVersion [nvarchar](50)
DECLARE @expectedVersion [nvarchar](50)
DECLARE @newVersion [nvarchar](50)
DECLARE @author [nvarchar](50)
DECLARE @textcomment [nvarchar](1000)

SET @expectedVersion = '10.0.217'
SET @newVersion = '10.0.218'
SET @author = 'klukasik'
SET @textcomment = 'Sample description of database changes'

SELECT @currentVersion = (SELECT TOP 1 [Version] FROM DbVersion ORDER BY Id DESC)
IF @currentVersion = @expectedVersion
BEGIN TRY
    BEGIN TRAN

    -- ############################# BEGIN OF SCRIPT #############################

    -- custom database modifications

    -- ############################## END OF SCRIPT ##############################

    INSERT INTO DbVersion([Version],[UpdatedBy],[UpdatedOn],[Reason])
    VALUES(@newVersion, @author, GETDATE(), @textcomment)
    COMMIT TRAN
    PRINT 'Database has been updated successfully to ' + @newVersion
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
    BEGIN
        ROLLBACK TRANSACTION
    END
    SELECT
        ERROR_NUMBER() AS ErrorNumber,
        ERROR_SEVERITY() AS ErrorSeverity,
        ERROR_STATE() AS ErrorState,
        ERROR_PROCEDURE() AS ErrorProcedure,
        ERROR_LINE() AS ErrorLine,
        ERROR_MESSAGE() AS ErrorMessage;
    DECLARE @ErrorMessage NVARCHAR(max),
            @ErrorSeverity INT,
            @ErrorState INT;
    SET @ErrorMessage = ERROR_MESSAGE();
    SET @ErrorSeverity = ERROR_SEVERITY();
    SET @ErrorState = ERROR_STATE();
    RAISERROR(@ErrorMessage, @ErrorSeverity, @ErrorState);
    RETURN;
END CATCH
ELSE
BEGIN
    PRINT 'Invalid database version - expecting: ' + @expectedVersion + ', currently: ' + @currentVersion
END

Listing 3 – PSake UpgradeDatabase task and PowerShell helpers

Task UpgradeDatabase -depends Initialize -description "Upgrades db with SQL scripts" {
    $logFile = "$log_dir\DatabaseUpgrade.log"
    if (Test-Path $logFile)
    {
        Remove-Item $logFile
    }
    $connectionString = $script:tokens["@@ConnectionString@@"]
    $getVersionQuery = "SELECT TOP 1 Version FROM dbo.DbVersion ORDER BY [Id] DESC"
    $dbConnectionStringBuilder = New-Object System.Data.SqlClient.SqlConnectionStringBuilder
    $dbConnectionStringBuilder.set_ConnectionString($connectionString)
    $dbVersion = Get-DbVersion $dbConnectionStringBuilder $getVersionQuery
    Write-Output ("Initial db version is {0}" -f $dbVersion)
    while ($true)
    {
        $files = Get-ChildItem ("$database_upgrade_scripts_dir\Upgrade_{0}_*.sql" -f $dbVersion)
        if ($files -ne $null)
        {
            $upgraded = $true
            foreach ($file in $files)
            {
                Write-Output ("[$($dbConnectionStringBuilder.DataSource) / $($dbConnectionStringBuilder.InitialCatalog)] Upgrading with {0}..." -f $file.Name)
                $sqlMessage = Run-Sql $file $dbConnectionStringBuilder $true
                $nl = [Environment]::NewLine
                Write-Output ("Executing $file.$nl$sqlMessage") | Out-File $logFile -Append
                if (!($sqlMessage -like "*Database has been updated successfully to*"))
                {
                    throw "Something went wrong. See $logFile"
                }
            }
            $dbVersion = Get-DbVersion $dbConnectionStringBuilder $getVersionQuery
            if ($upgraded)
            {
                Write-Output ("Db version is {0}" -f $dbVersion)
            }
        }
        else
        {
            break
        }
    }
}

function Run-Sql($inputFile, $dbConnectionStringBuilder, [bool]$isFile) {
    $database = $dbConnectionStringBuilder.InitialCatalog
    $ps = [PowerShell]::Create()
    $e = New-Object System.Management.Automation.Runspaces.PSSnapInException
    $ps.Runspace.RunspaceConfiguration.AddPSSnapIn("SqlServerCmdletSnapin100", [ref]$e) | Out-Null
    $param = $ps.AddCommand("Invoke-Sqlcmd").AddParameter("database", $dbConnectionStringBuilder.InitialCatalog).AddParameter("serverinstance", $dbConnectionStringBuilder.DataSource).AddParameter("Verbose").AddParameter("QueryTimeout", 120)
    if ($isFile) {
        $param = $ps.AddParameter("InputFile", $inputFile)
    } else {
        $param = $ps.AddParameter("Query", $inputFile)
    }
    if (!$dbConnectionStringBuilder.ContainsKey("Integrated Security") -or [System.Convert]::ToBoolean($dbConnectionStringBuilder."Integrated Security") -eq $false) {
        $param = $param.AddParameter("username", $dbConnectionStringBuilder."User ID").AddParameter("password", $dbConnectionStringBuilder.Password)
    }
    try {
        $ps.Invoke() | Out-Null
    } catch {
        Write-Output $ps.Streams
        throw
    }
    $sqlMessage = ""
    $nl = [Environment]::NewLine
    foreach ($verbose in $ps.Streams.Verbose) {
        $sqlMessage += $verbose.ToString() + $nl
    }
    foreach ($err in $ps.Streams.Error) {
        $sqlMessage += $err.ToString() + $nl
    }
    return $sqlMessage
}

function Invoke-SqlCmdSnapin($dbConnectionStringBuilder, $query) {
    if (!$dbConnectionStringBuilder.ContainsKey("Integrated Security") -or [System.Convert]::ToBoolean($dbConnectionStringBuilder."Integrated Security") -eq $false) {
        Invoke-SqlCmd -query $query `
            -database $dbConnectionStringBuilder.InitialCatalog `
            -serverinstance $dbConnectionStringBuilder.DataSource `
            -username $dbConnectionStringBuilder."User ID" `
            -password $dbConnectionStringBuilder.Password
    } else {
        Invoke-SqlCmd -query $query `
            -database $dbConnectionStringBuilder.InitialCatalog `
            -serverinstance $dbConnectionStringBuilder.DataSource
    }
}

Listing 4 – DbHistory definition

Column name   Column type     Constraint
-----------   -------------   ----------
Filename      Nvarchar(50)    Not null
Content       Nvarchar(max)   Not null
RunOn         DateTime        Not null

7. Final Thoughts

A database versioning and delivery strategy is key for most enterprise projects. Using this article as a map, you can review and improve your existing solution and practices, or build a new one from scratch. Perhaps not all the rules apply to your case, but at least they will help you objectively assess your database upgrade tactics. If you need any further clarification, want to give feedback on the advice I have provided, or have other tips you consider important, please send me your questions or comments and I will try to respond quickly. You can reach me via e-mail at klukasik@objectivity.co.uk

About the Author

Konrad Lukasik is an enthusiast of Microsoft technologies in general and .NET in particular, and a professional with almost 10 years of commercial experience. He currently works as a Technical Architect at Objectivity, helping teams deliver high-quality software, and continuously works on making things “as simple as possible, but not simpler”.

Comments
Or you can use liquibase... by Youri Bonnaffé

Re: Or you can use liquibase... by Øystein Gisnås

...or Flyway (flywaydb.org/)

Why isn't this stuff just built into DB's by Nick Portelli

That never made sense to me.

Change impact analysis by peter lin

I didn't come up with the idea and it's quite old, but what you need is to version your model and use impact analysis. Change impact analysis tells you exactly which parts of the database are affected. IBM's Ascential group had impact analysis as part of their old Metadata server. From the impact analysis, it could automate the update to the database to make life easier. Not all changes were supported, but most types of simple changes like adding columns, splitting a table, adding table, renaming a table and renaming a column were handled out of the box.
Even today, most database modeling products don't provide this. The system would make a new database and move the data over with the appropriate changes applied. Once it was done, you'd reconfigure the application to point to the new one.

Use Nextep by Charly CLAIRMONT

Hi!

Based on agile database concepts, Nextep allows you to build and deliver the database as code! Try it: www.nextep-softwares.com/

Cheers

Lots of options by Rafael Ponte

Great article, congrats!

Nowadays we have a lot of options for migration tools, mainly in the Java world. Tools like Flyway, MyBatis Migrations, Liquibase and others. All of them are platform independent.

I'd like to know why you guys decided to implement your own tool instead of adopting an opensource and mature one.

Few answers... by Konrad Lukasik

Thanks for the comments. I understand that there are many tools available to support the process, but they may not always be the best option.
With a new tool there is certainly the additional cost of learning it. It is often better to try it first in a small project before using it in a large endeavour for a key customer.
Sometimes you might have other, more important issues in a project, so it would be unwise to turn your DB approach upside down.
Some tools, although platform independent, require Java to run, which may not be acceptable for a Microsoft-oriented customer, assuming you use their environment for development.
Or you might already be a few sprints into a project when you find out that DB schema changes are not handled properly, so it might be too late to completely redesign the approach.
It is a pity that DB engines do not support this aspect of the SDLC. The Microsoft world is definitely behind Java here, and there is still a place for a good tool nicely integrated with Visual Studio and TFS.
Finally, I hope that most of the described practices are universal and can be of use no matter what tool you are using.

Re: Few answers... by Conor MacMahon

Hi Konrad,

Good article, I agree it is a good description on how to write a db migration tool, that includes versioning.
However, as others have pointed out, there are numerous tools in the market place that do this for you for free (i.e. I have used liquibase to great effect, but there are others).
I take your point that some clients that are 'Microsoft houses' may be a little biased against the JVM. However, it's just infrastructure, to be called, not using the JDK, thus the average client would treat that as less risky.

I suppose it comes down to do you want your software solution to include db migration logic or not? Rolling your own means higher cost to support, but greater freedom. However, it is such a typical solution to out-source to 3rd party tools like liquibase, I would think it's cheaper/faster to delegate out to them. That leaves you to concentrate on your actual application.

Each to their own Konrad, thanks.

Kind regards,
Conor

Flyway is much simpler but more or less equally powerful by Hans-Peter Störr

Thanks for your interesting article!


We are happily using Flyway, which does most of what is discussed in this article without you having to use script templates and whatnot. The only thing you have to do is follow a naming convention, and Flyway takes care of the execution order, maintaining an execution history in the database and ensuring scripts get executed exactly once. Flyway is a Java program, though (standalone, if needed).

RedGate Automation Pack for CI and Deployment by Steven Koh

Hi Lukasik,
I am looking at RedGate's DB products and since "RedGate SQL Source Control" is already used, why not use their CI products instead? www.red-gate.com/products/sql-development/sql-a...

From a product evaluation perspective, is there anything lacking in their product that made you choose to roll your own?

Thanking you for sharing EXPERIENCE by Franck MIKULECZ

Clearly a few commenters have not appreciated the most important but stealth part of your article: Experience.
I read between the lines about numerous hardships and learned remedies, something that no automation tool will replace.
Yes we do/plan to use Liquibase, but the concepts you describe will provide a great springboard to understand "why" things are done this way and help avoid many mistakes or shortcuts in tool usage.

I cannot thank you enough...
