Database Versioning and Delivery with Upgrade Scripts

Posted by Konrad Lukasik on Aug 23, 2013

1. Introduction

Database upgrade is usually one of the “last mile” delivery tasks, frequently left until the end of the project or the sprint before the release. This is far from ideal because:

  • During each software deployment on a test environment the database is often recreated, which means that testers lose their test data every time
  • If the project lasts long enough, upgrade scripts are sometimes written months after the initial database change was made, when the knowledge of how to migrate the data might be lost or impaired
  • There is a significant risk of failure, as the upgrade scripts have not been tested thoroughly and repeatedly prior to the production release
  • It is hard to estimate how long it will take to write the upgrade scripts, adding even more risk to the delivery date and budget

In order to avoid such problems, experience has taught me that a good database versioning and delivery strategy is fundamental to most enterprise projects. Here is how we deal with it at Objectivity.

2. The Agile Perspective

Our projects are run in an agile manner, meaning applications are developed incrementally and iteratively, and the database is included in this software development process. It all starts with the “Definition of Done” (DoD), which is crucial to every highly functioning team. The User Story-level DoD should contain a “Deployable” condition, which means that we only consider a story done when it can be deployed automatically via scripts. Of course there are many other conditions in a DoD (writing database upgrade scripts among them), but that is a good topic for a separate article.

Formulated in that way, DoD also has an impact on sprint planning and estimation. It is used as a checklist to validate whether all major tasks are accounted for. Regarding the database, each team member needs to know project-adopted rules on writing upgrade scripts: What is the format? Are we using any template? Where do we store files? Do the files need to follow any naming convention? Etc.

During development, changes to code and database are done by developers in parallel. On top of modifications to the database project, team members write upgrade scripts, which are checked in together with the rest of the code and used to test User Stories (US) on a dedicated environment.

At the end of a sprint, if the decision has been made that the software goes to production, the scripts, together with other artefacts, are used to perform the installation.

3. Versioning Approaches

The database versioning implementation details vary from project to project, but key elements are always present. Here they are:

  • The database is under version control – an obvious starting point. If we could not identify database changes, how could we write upgrade scripts for them? We have successfully used Visual Studio 2010 database projects and RedGate SQL Source Control to manage the structure of the database, both against a TFS repository. This area is widely supported by tools.
  • The database version is stored in the database itself – to be able to identify what version of the DB schema is installed in a given environment, the database itself needs to be marked. There are many ways to achieve that (a user-defined function, extended properties, a specifically named object, etc.), but at Objectivity we consistently use a dedicated table, usually called DbVersion. The advantage of this option is that tables are standard database objects, widely known and used by developers and administrators and easily accessible from code. The database may store either the current version or the entire version history. A sample table definition can be found in Listing 1.
  • The database version is verified by applications on start-up – the application has database version checks embedded in its code and verifies them on initialization. If a certain condition is not met, the application displays an appropriate error message and stops. This best practice protects the development team from major deployment errors and minimizes the risk of wasting testing effort.
  • Upgrade scripts are written in parallel with development – SQL scripts altering the database are prepared as the database schema is being modified by developers. We use a template for those scripts (see the sample in Listing 2). The header lines of the template check for the expected database version, and a transaction is started if the version is correct. After the custom (i.e. developer-written) database amendments are executed, the template updates the database version table, commits the transaction and displays a success message. This practice addresses the second and fourth drawbacks from the list in the introduction.
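As a minimal illustration of the start-up check described above, here is a sketch in Python; the `fetch_db_version` callable and the expected-version constant are assumptions standing in for real data access against the DbVersion table, not part of any project of ours:

```python
# Hypothetical sketch of an application start-up version check.
# `fetch_db_version` stands in for real data access that reads the
# latest row from the DbVersion table (see Listing 1).

EXPECTED_DB_VERSION = "10.0.218"  # illustrative value

def verify_db_version(fetch_db_version, expected=EXPECTED_DB_VERSION):
    """Return the database version if it matches the expected one;
    otherwise raise so the application stops with a clear message."""
    actual = fetch_db_version()
    if actual != expected:
        raise RuntimeError(
            "Database version mismatch: expected %s, found %s. "
            "Apply the pending upgrade scripts before starting the application."
            % (expected, actual))
    return actual
```

Raising on mismatch (rather than logging and continuing) is the point: a failed check should stop the application before any testing effort is wasted on a wrong schema.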

4. A Hybrid Solution

Sometimes, when a database is large in terms of objects (not data), upgrade scripts may grow big, especially if we use stored procedures (SPs) or user-defined functions (UDFs). As a countermeasure, it is possible to limit upgrade scripts to only certain object types, usually those that actually store data (i.e. tables), and always reinstall the rest of the objects at the end of the upgrade process. Such a hybrid solution is highly recommended when a team is new to the database upgrade process and extensive business logic is kept in the database.

5. What about the Data?

Data can be generally divided into two groups:

  • Initial data, required to run and/or start up an application, such as reference data, dictionaries, etc.
  • Business data, created through the application’s UI, imported from external sources, or pre-generated as samples so that developers/testers have something to work with.

It is recommended to keep these groups separated from the beginning of a project in order to avoid another “last mile” problem.

We initialize the database with data either through scripts / CSVs kept in separate folders for each group, or we embed initial data in upgrade scripts (for ease of deployment in smaller systems). When keeping the data in a separate folder, it is best to write the scripts in such a way that they can be run multiple times. Another problem you have to deal with is the table insertion order. In complex database schemas (e.g. with circular dependencies), it is impossible to order the tables, therefore the best practice is to disable foreign keys before insertion and re-enable them once the data is in.
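To make the disable/insert/re-enable sequence concrete, here is a small Python sketch that generates such a T-SQL batch; the helper name and batch layout are illustrative assumptions, while the `NOCHECK CONSTRAINT ALL` / `WITH CHECK CHECK CONSTRAINT ALL` statements are standard T-SQL:

```python
# Hypothetical helper that wraps data-load statements in a batch which
# disables all foreign keys first and re-enables (and re-validates)
# them afterwards, so insertion order no longer matters.

def wrap_inserts_fk_safe(tables, insert_statements):
    disable = ["ALTER TABLE [%s] NOCHECK CONSTRAINT ALL" % t for t in tables]
    enable = ["ALTER TABLE [%s] WITH CHECK CHECK CONSTRAINT ALL" % t
              for t in tables]
    return ";\n".join(disable + list(insert_statements) + enable) + ";"
```

Note that re-enabling with `WITH CHECK` makes SQL Server re-validate the loaded data against the constraints, so any rows that break referential integrity surface at load time rather than later.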

6. Best Versioning Practices

The following practices are not required, but I have found them really useful, and you should at least consider using them in your new projects.

Use a three-elements version string

We found that the following format for the database version string is the most flexible:

<major version>.<minor version>.<revision>

The first element changes on major releases / phases of the system, e.g. every couple of months. The two subsequent elements are controlled by developers. The minor version changes when breaking amendments (e.g. a new mandatory field) are introduced to the database, making the “old” application incompatible with the new database schema. The revision is incremented when non-breaking changes are applied to the database (e.g. a new index, a new table, a new optional field).
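The bump rules above can be captured in a few lines; this Python sketch is purely illustrative (the function name and change labels are my assumptions, not a project convention):

```python
# Hypothetical helper deciding which element of a
# <major>.<minor>.<revision> version string to increment.

def bump_version(version, change):
    major, minor, revision = (int(p) for p in version.split("."))
    if change == "major":          # new release / phase of the system
        return "%d.0.0" % (major + 1)
    if change == "breaking":       # e.g. a new mandatory field
        return "%d.%d.0" % (major, minor + 1)
    if change == "non-breaking":   # e.g. a new index or optional field
        return "%d.%d.%d" % (major, minor, revision + 1)
    raise ValueError("unknown change type: %s" % change)
```

Resetting the lower elements on a higher-element bump keeps the version strings totally ordered, which matters once upgrade scripts are chained by version.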

Write environment-independent scripts

In theory, upgrade scripts should be written in such a way that they can be run without modification in any environment. This means that they should not contain any paths, names of database instances, SQL user names or linked server configurations. In the Microsoft SQL Server world, SQLCMD variables can be used to achieve this.

If multiple teams are working on a database, split the whole database into several schemas

The development of large databases by multiple parallel teams becomes more efficient when the database has been split into a number of schemas. Every schema has its own version and upgrade scripts. This minimizes conflicts during code merges and simplifies their resolution. Of course, the DbVersion table has to be modified to allow storing the schema version (a new column). We can distinguish two types of schemas: shared and dedicated. When a team plans to change a shared schema, the other teams need to be consulted to ensure that the structure of the shared objects is modified in the best way. A dedicated schema is under the team’s full control.

Alternatively, if the database is a legacy one and/or we cannot introduce schemas, we can split the database objects into several virtual areas and version each of them separately.

Never change upgrade scripts once they have been checked in

When a database reports a certain version, you trust it; as a developer, you do not routinely compare the actual structure against the original scripts, so it is hard to detect situations where different versions of the same upgrade script have been applied to a database instance. If you have made a database modification by mistake in your upgrade scripts, write another script reverting the change; do not modify the original script, because it may already have been applied to some environment.

Reserve version ranges when working on multiple releases to simplify merges

When multiple teams are working in parallel on successive releases of the same system / database, it is best to agree upfront on the versioning range used by each team, to avoid merging problems.

For example, Team A, working on release 1, could use the 2.x.x range for the Shared schema and 1.x.x for the Ordering schema, while Team B, working on release 2, would use the 3.x.x range for the Shared schema and 1.x.x for the Reporting schema.

Automate the upgrade process

The drawback of writing upgrade scripts during development is that there are a lot of them. Consequently, automation is much desired, as it saves a lot of time for developers, release managers, etc. Moreover, it speeds up the delivery process and makes it more resilient. Furthermore, with automation the upgrade process can easily be plugged into the CI process.

At Objectivity we use the PSake module (PowerShell) to automate the process. PowerShell is Microsoft’s task automation framework, consisting of a scripting language built on top of the .NET Framework. PSake, in turn, is a domain-specific language written in PowerShell for creating builds using a dependency pattern similar to Rake or MSBuild. A PSake build script consists of Tasks; each Task is a function that can define dependencies on other Task functions. Our upgrade script is written as a single PSake task.

Here is our database upgrade algorithm:

  1. Check the existing version of the database
  2. Search for the upgrade script corresponding to the current version (this step relies on the file naming convention being aligned with the db version)
  3. If the file has been found, then execute it and verify the output; quit on error
  4. If no scripts have been found, then quit
  5. Return to step 1

A sample implementation can be found in Listing 3.
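As a language-neutral illustration, the same algorithm can be sketched in Python, with the scripts on disk modeled as a hypothetical mapping from a script’s source version to its target version:

```python
# Illustrative model of the upgrade loop: `scripts` maps the version a
# script upgrades FROM to the version it upgrades TO, standing in for
# files named Upgrade_<from>_<to>_*.sql found on disk.

def run_upgrade(current_version, scripts):
    applied = []
    while current_version in scripts:
        target = scripts[current_version]
        # a real implementation would execute the script here,
        # verify its output, and quit on error (steps 3-4)
        applied.append((current_version, target))
        current_version = target
    return current_version, applied
```

The loop terminates naturally when no script matches the current version, which is exactly how the algorithm distinguishes “fully upgraded” from “more work to do”.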

Validate upgrade scripts on your CI

We have often found at Objectivity that developers new to the database upgrade process accidentally break the project-adopted rules for writing upgrade scripts. So it is good to verify the consistency of your upgrade scripts after every commit on the Continuous Integration (CI) server by checking:

  • file naming convention – we use the following format for file names: <prefix>_<current_version_from_db_version_table>_<target_version>_<optional_info_about_upgrade>.sql
    for example: Upgrade_1.0.1_1.0.2_rename_column.sql
    When multiple schemas are used, the prefix contains the name of the schema.
  • file content – the header and footer of a script can be checked to ensure that the template has been used; in addition, the versions in the content can be validated against those in the file name.

The validation may even happen before the actual code is built. Any breaches should fail the build.
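A file-name check of this kind is easy to automate; the sketch below is an assumed Python implementation (the regular expression mirrors the convention described above, but the exact character classes are my guesses):

```python
import re

# Matches <prefix>_<current_version>_<target_version>_<optional_info>.sql
SCRIPT_NAME_RE = re.compile(
    r"^(?P<prefix>[A-Za-z]+)_"
    r"(?P<from>\d+\.\d+\.\d+)_"
    r"(?P<to>\d+\.\d+\.\d+)"
    r"(?:_(?P<info>\w+))?\.sql$")

def parse_script_name(filename):
    """Return (from_version, to_version), or None if the name is invalid."""
    m = SCRIPT_NAME_RE.match(filename)
    if m is None:
        return None
    return m.group("from"), m.group("to")
```

On CI, a `None` result (or a mismatch between these versions and the ones inside the file) would fail the build.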

I also strongly recommend verifying that the upgrade scripts lead to the same database structure as the development database projects. We do this by setting up two database instances during CI and comparing them:

  • The first is a restored, depersonalized copy from production, with the upgrade scripts applied
  • The second is created from the database project(s)

Any differences should fail the build.
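The comparison step can be illustrated with a toy model in Python, where each schema snapshot is reduced to a hypothetical mapping of object names to their definitions (a real setup would use a schema-compare tool):

```python
# Toy schema comparison: each snapshot maps object names to their
# definitions. Any non-empty bucket should fail the CI build.

def diff_schemas(upgraded, from_project):
    missing = sorted(set(from_project) - set(upgraded))
    extra = sorted(set(upgraded) - set(from_project))
    changed = sorted(name for name in set(upgraded) & set(from_project)
                     if upgraded[name] != from_project[name])
    return {"missing": missing, "extra": extra, "changed": changed}
```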

Back up the database before upgrade

Although upgrade scripts are written in a transactional way, there is no guarantee that they will succeed, therefore it is best to perform a backup before the upgrade, just in case. This step should be automated.

Record a history of applied upgrade scripts

When database-related problems arise during testing, it is useful to have a history of changes applied to a given database. If your upgrade process is automated, it is not difficult to record all executed upgrade scripts in a dedicated history table for debugging needs. Listing 4 shows a sample DbHistory table definition.

Listing 1 – DbVersion definition

Column name   Column type      Constraint
-----------   --------------   ----------
Version       Nvarchar(50)     Not null
UpdatedBy     Nvarchar(50)     Not null
UpdatedOn     DateTime         Not null
Reason        Nvarchar(1000)   Not null

Listing 2 – Upgrade script template

DECLARE @currentVersion [nvarchar](50)
DECLARE @expectedVersion [nvarchar](50)
DECLARE @newVersion [nvarchar](50)
DECLARE @author [nvarchar](50)
DECLARE @textcomment [nvarchar](1000)

SET @expectedVersion = '10.0.217'
SET @newVersion = '10.0.218'
SET @author = 'klukasik'
SET @textcomment = 'Sample description of database changes'

SELECT @currentVersion = (SELECT TOP 1 [Version] FROM DbVersion ORDER BY Id DESC)
IF @currentVersion = @expectedVersion
BEGIN TRY
    BEGIN TRAN

    -- ############################# BEGIN OF SCRIPT #############################

    -- custom database modifications

    -- ############################## END OF SCRIPT ##############################

    INSERT INTO DbVersion([Version],[UpdatedBy],[UpdatedOn],[Reason])
    VALUES(@newVersion, @author, GETDATE(), @textcomment)
    COMMIT TRAN
    PRINT 'Database has been updated successfully to ' + @newVersion
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
    BEGIN
        ROLLBACK TRANSACTION
    END
    SELECT
        ERROR_NUMBER() AS ErrorNumber,
        ERROR_SEVERITY() AS ErrorSeverity,
        ERROR_STATE() AS ErrorState,
        ERROR_PROCEDURE() AS ErrorProcedure,
        ERROR_LINE() AS ErrorLine,
        ERROR_MESSAGE() AS ErrorMessage;
    DECLARE @ErrorMessage NVARCHAR(max),
            @ErrorSeverity INT,
            @ErrorState INT;
    SET @ErrorMessage = ERROR_MESSAGE();
    SET @ErrorSeverity = ERROR_SEVERITY();
    SET @ErrorState = ERROR_STATE();
    RAISERROR(@ErrorMessage, @ErrorSeverity, @ErrorState);
    RETURN;
END CATCH
ELSE
BEGIN
    PRINT 'Invalid database version - expecting: ' + @expectedVersion + ', currently: ' + @currentVersion
END

Listing 3 – PSake UpgradeDatabase task and PowerShell helpers

Task UpgradeDatabase -depends Initialize -description "Upgrades db with SQL scripts" {
    $logFile = "$log_dir\DatabaseUpgrade.log"
    if (Test-Path $logFile)
    {
        Remove-Item $logFile
    }
    $connectionString = $script:tokens["@@ConnectionString@@"]
    $getVersionQuery = "SELECT TOP 1 Version FROM dbo.DbVersion ORDER BY [Id] DESC"
    $dbConnectionStringBuilder = New-Object System.Data.SqlClient.SqlConnectionStringBuilder
    $dbConnectionStringBuilder.set_ConnectionString($connectionString)
    $dbVersion = Get-DbVersion $dbConnectionStringBuilder $getVersionQuery
    Write-Output ("Initial db version is {0}" -f $dbVersion)
    while ($true)
    {
        $files = Get-ChildItem ("$database_upgrade_scripts_dir\Upgrade_{0}_*.sql" -f $dbVersion)
        if ($files -ne $null)
        {
            $upgraded = $true
            foreach ($file in $files)
            {
                Write-Output ("[$($dbConnectionStringBuilder.DataSource) / $($dbConnectionStringBuilder.InitialCatalog)] Upgrading with {0}..." -f $file.Name)
                $sqlMessage = Run-Sql $file $dbConnectionStringBuilder $true
                $nl = [Environment]::NewLine
                Write-Output ("Executing $file.$nl$sqlMessage") | Out-File $logFile -Append
                if (!($sqlMessage -like "*Database has been updated successfully to*"))
                {
                    throw "Something went wrong. See $logFile"
                }
            }
            $dbVersion = Get-DbVersion $dbConnectionStringBuilder $getVersionQuery
            if ($upgraded)
            {
                Write-Output ("Db version is {0}" -f $dbVersion)
            }
        }
        else
        {
            break
        }
    }
}

function Run-Sql($inputFile, $dbConnectionStringBuilder, [bool]$isFile) {
    $database = $dbConnectionStringBuilder.InitialCatalog
    $ps = [PowerShell]::Create()
    $e = New-Object System.Management.Automation.Runspaces.PSSnapInException
    $ps.Runspace.RunspaceConfiguration.AddPSSnapIn("SqlServerCmdletSnapin100", [ref]$e) | Out-Null
    $param = $ps.AddCommand("Invoke-Sqlcmd").AddParameter("database", $dbConnectionStringBuilder.InitialCatalog).AddParameter("serverinstance", $dbConnectionStringBuilder.DataSource).AddParameter("Verbose").AddParameter("QueryTimeout", 120)
    if ($isFile) {
        $param = $ps.AddParameter("InputFile", $inputFile)
    } else {
        $param = $ps.AddParameter("Query", $inputFile)
    }
    if (!$dbConnectionStringBuilder.ContainsKey("Integrated Security") -or [System.Convert]::ToBoolean($dbConnectionStringBuilder."Integrated Security") -eq $false) {
        $param = $param.AddParameter("username", $dbConnectionStringBuilder."User ID").AddParameter("password", $dbConnectionStringBuilder.Password)
    }
    try {
        $ps.Invoke() | Out-Null
    } catch {
        Write-Output $ps.Streams
        throw
    }
    $sqlMessage = ""
    $nl = [Environment]::NewLine
    foreach ($verbose in $ps.Streams.Verbose) {
        $sqlMessage += $verbose.ToString() + $nl
    }
    foreach ($err in $ps.Streams.Error) {
        $sqlMessage += $err.ToString() + $nl
    }
    return $sqlMessage
}

function Invoke-SqlCmdSnapin($dbConnectionStringBuilder, $query) {
    if (!$dbConnectionStringBuilder.ContainsKey("Integrated Security") -or [System.Convert]::ToBoolean($dbConnectionStringBuilder."Integrated Security") -eq $false) {
        Invoke-SqlCmd -query $query `
            -database $dbConnectionStringBuilder.InitialCatalog `
            -serverinstance $dbConnectionStringBuilder.DataSource `
            -username $dbConnectionStringBuilder."User ID" `
            -password $dbConnectionStringBuilder.Password
    } else {
        Invoke-SqlCmd -query $query `
            -database $dbConnectionStringBuilder.InitialCatalog `
            -serverinstance $dbConnectionStringBuilder.DataSource
    }
}

Listing 4 – DbHistory definition

Column name   Column type     Constraint
-----------   -------------   ----------
Filename      Nvarchar(50)    Not null
Content       Nvarchar(max)   Not null
RunOn         DateTime        Not null

7. Final Thoughts

A database versioning and delivery strategy is key for most enterprise projects. Using this article as a map, you can review and improve your existing solution and practices, or build a new one from scratch. Perhaps not all the rules apply to your case, but at least they will help you objectively assess your database upgrade tactics. If you need any further clarification, want to give feedback on the advice I have provided, or have other tips you consider important, please send me your questions or comments and I will try to respond quickly. You can reach me via e-mail at klukasik@objectivity.co.uk

About the Author

Konrad Lukasik is an enthusiast of Microsoft technologies in general and .NET in particular, and a professional with almost 10 years of commercial experience. He currently works as a Technical Architect at Objectivity, helping teams deliver high-quality software, and continuously works on making things “as simple as possible, but not simpler”.

Comments
Or you can use liquibase... by Youri Bonnaffé

Re: Or you can use liquibase... by Øystein Gisnås

...or Flyway (flywaydb.org/)

Why isn't this stuff just built into DB's by Nick Portelli

That never made sense to me.

Change impact analysis by peter lin

I didn't come up with the idea and it's quite old, but what you need is to version your model and use impact analysis. Change impact analysis tells you exactly which parts of the database are affected. IBM's Ascential group had impact analysis as part of their old Metadata server. From the impact analysis, it could automate the update to the database to make life easier. Not all changes were supported, but most types of simple changes like adding columns, splitting a table, adding table, renaming a table and renaming a column were handled out of the box.
Even today, most database modeling products don't provide this. The system would make a new database and move the data over with the appropriate changes applied. Once it was done, you'd reconfigure the application to point to the new one.

Use Nextep by Charly CLAIRMONT

Hi!

Based on agile database concepts, Nextep allows you to build and deliver the database as code! Try it: www.nextep-softwares.com/

Cheers

Lots of options by Rafael Ponte

Great article, congrats!

Nowadays we have a lot of options for migration tools, mainly in the Java world. Tools like Flyway, MyBatis Migrations, Liquibase and others. All of them are platform independent.

I'd like to know why you guys decided to implement your own tool instead of adopting an opensource and mature one.

Few answers... by Konrad Lukasik

Thanks for the comments. I understand that there are many tools available to support the process, but they may not always be the best option.
With a new tool there is certainly the additional cost of learning it. It is often better to try it first in a small project before using it in a large endeavour for a key customer.
Sometimes you might have other, more important issues in a project, so it would be unwise to turn your DB approach upside down.
Some tools, although platform independent, require Java to run, which may not be acceptable for a Microsoft-oriented customer, assuming you use their environment for development.
Or you might already be a few sprints into a project when you find out that DB schema changes are not handled properly, so it might be too late to completely redesign the approach.
It is a pity that DB engines do not support this aspect of the SDLC. The Microsoft world is definitely behind Java here, and there is still a place for a good tool nicely integrated with Visual Studio and TFS.
Finally, I hope that most of the described practices are universal and can be of use no matter what tool you are using.

Re: Few answers... by Conor MacMahon

Hi Konrad,

Good article, I agree it is a good description on how to write a db migration tool, that includes versioning.
However, as others have pointed out, there are numerous tools in the market place that do this for you for free (i.e. I have used liquibase to great effect, but there are others).
I take your point that some clients that are 'Microsoft houses' may be a little biased against the JVM. However, it's just infrastructure, to be called, not using the JDK, thus the average client would treat that as less risky.

I suppose it comes down to do you want your software solution to include db migration logic or not? Rolling your own means higher cost to support, but greater freedom. However, it is such a typical solution to out-source to 3rd party tools like liquibase, I would think it's cheaper/faster to delegate out to them. That leaves you to concentrate on your actual application.

Each to their own Konrad, thanks.

Kind regards,
Conor

Flyway is much simpler but more or less equally powerful by Hans-Peter Störr

Thanks for your interesting article!


We are happily using Flyway, which does most of what is discussed in this article without you having to use script templates and whatnot. The only thing you have to do is follow a naming convention, and Flyway takes care of the execution order, maintaining an execution history in the database and ensuring scripts get executed exactly once. Flyway is a Java program, though (standalone, if needed).

RedGate Automation Pack for CI and Deployment by Steven Koh

Hi Lukasik,
I am looking at RedGate's DB products and since "RedGate SQL Source Control" is already used, why not use their CI products instead? www.red-gate.com/products/sql-development/sql-a...

From a product evaluation perspective, is there anything lacking in their product that made you choose to roll your own?

Thanking you for sharing EXPERIENCE by Franck MIKULECZ

Clearly a few commenters have not appreciated the most important but stealth part of your article: Experience.
I read between the lines about numerous hardships and learned remedies, something that no automation tool will replace.
Yes we do/plan to use Liquibase, but the concepts you describe will provide a great springboard to understand "why" things are done this way and help avoid many mistakes or shortcuts in tool usage.

I cannot thank you enough...
