The Secrets of Database Change Deployment Automation
A fast moving world; Agile & DevOps
As business needs are the most significant driver of change, doing better with less and delivering it sooner is what differentiates leading and successful companies from the rest.
When a competitor delivers relevant features, faster and with better quality than you, you’re eventually going to lose market share. Investing in sales and marketing campaigns to compensate for your product is expensive and not always reliable and you might find out that customers are moving to the superior product.
This is exactly why 'Agile Development' was born: the need to move quicker, deal with ever changing requirements (as our target market and competition are never standing still), with best quality that can be assured, usually with not enough resources. Agile is what is expected from technology companies and IT divisions.
The next natural step from Agile is finding a way to take Agile to production; linking development with operations. This has given rise to 'DevOps'.
Understanding the main goal of operations is to maintain stable and healthy applications, and the main goal of development is to continually innovate and provide applications that meet business and customer needs, is crucial to the development of DevOps. While there is no doubt that change is the greatest enemy of stability, understanding and reconciling this conflict should be the main goal of DevOps.
To effectively master Agile sprint deployments and to practice DevOps, one needs to be able to implement deployment automation. Otherwise deployments and releases will require manual steps and processes, which are not always accurately repeatable, prone to human errors, and cannot be handled with high frequency.
Dealing with database deployments is tricky; unlike other software components and code or compiled code, a database is not a collection of files. It is not something you can just copy from your development to testing and to production because the database is a container of our most valued asset – the business data, which must be preserved. It holds all application content, customer transactions, etc. In order to promote database changes, a transition code needs to be developed - scripts to handle database schema structure (table structure), database code (procedures, functions, etc.), and content used by the application (metadata, lookup content or parameters tables).
The challenges of database change deployment processes
One way of dealing with the database challenge is to force the database into the generic process: create scripts out of database objects and store them in the traditional version control.
That creates other challenges, namely:
- Scripts in the version control system are not connected to the database objects they represent as these are two separated systems. Coding and testing of the database code is done at the database side, disconnected from any of the coding best practices (check-in, check-out, labels etc.), and prone to all illnesses of the 'old days' such as:
- Code-overrides in the database are common as there is nothing to prevent it.
- Scripts are required to be taken from the version control before starting the code on the database, to prevent working on the wrong version; but there is nothing to enforce that.
- Scripts do not always find their way to the version control system, as it depends on the developer to remember to do so.
- Out of process updates go unnoticed, etc.
- Scripts are manually coded, prone to human error, syntax error etc.
- To have everything you might later need, you actually have to save two to three scripts for each object; the actual code of the object, the upgrade script, and a roll-back script.
- Scripts are hard to test in their entirely, as you make changes to a single object, while someone else makes a change to another single object, and running in any order would usually raise errors due to faulty dependencies between these scripts that need to run in a specific order.
- If a script is developed as a single script to represent the entire update, instead of a single change, it can deal with dependencies, but is much harder to deal with project scope changes. It is a big list of commands.
- And these scripts, unless super sophisticated, are unaware of changes made in the target environment during the time passed from their coding to the time they are run; potentially overriding production hot-fixes, or work done in parallel by another team.
- Content changes are very hard to manage. Metadata or lookup content does not practically fit into the version control. In most cases they are simply not managed.
(Click on the image to enlarge it)
Another concept that emerged in the last decade is using tools for dealing with creation of the transition code between environments. This way of operation is tagged as 'compare & sync', meaning a mechanical comparison examines database objects in a source environment, comparing it to the target environment, and if a difference is discovered, a script to change the target object to mimic the source object is automatically created. For a while it seemed like a good solution, until the holes became more obvious.
The comparison is often performed on a database at selected check-points, usually before deployment, after the development cycle has ended:
- The compare tool is unaware of any changes that occurred before the time it ran, or any changes that took place at the target environment. We had no information about change, no version control. Just the differences at the given time.
- Keeping object scripts in a traditional version control solution while using a compare & sync tool to deploy is as un-synergic as you can imagine. One is completely unaware of the other.
- Manual inspection and detailed knowledge regarding each change must be part of a deployment process. Otherwise, mishaps like overriding good and up-to-date update to production (like a hot fix supplied by one team of developers), with an out-of-date code or structure from a second team that is working on something else entirely.
- Merging of code between different teams is out of the question. If you need to merge – you need to write the code manually.
Manual processes using a 'compare & sync' tool are possible, but require proficiency and patience. Trying to automate deployment processes based on these tools encompasses a substantial risk to the database.
DBAs, being both well aware of database deployment pitfalls and bearers of the scars of the most inopportune break downs, tend to shy away from automation based on the above processes, as they are not confident in the accuracy of the automation script generators or the ability for pre-prepared manually-generated scripts to remain true any time after they were developed. In order to avoid conflicts, they often take things into their own hands. The path of carefully examining changes and manually creating change scripts as close to the deployment event as possible seems less frustrating by comparison.
Safe database deployment automation
Achieving automation by scripting database objects change-scripts into traditional version control is limited, inflexible, disconnected from the database itself, and may be untrue and prone to miss updates of target environment because of conflicting scripts. Using 'compare & sync' tools is a risky thing to automate. The two concepts do not play together, as one is unaware of the other. A better solution must be found.
In order to take a database into proper automation, you must factor in the following:
- Proper database version control, dealing with databases’ unique challenges, while enforcing a single work process. This prevents any out of process changes, code overrides, or incomplete updates.
- Leverage proven version control best practices for complete information about who was doing what, when, and why. Making sure changes are perfectly documented is the base for later deploying them. (Click on the image to enlarge it)
- Harmony with task based development enables correlating each version control change with a change request or a trouble ticket. This enables task based deployments, partial deployments, and last minute scope changes to be coordinated between code and database.
- Ensure configuration management & consistency so every development environment, branch, trunk, sand-box, and testing or production environment follows the same structure, and matching status; or any deviation and difference are well accounted for.
- Scriptable interfaces, to deal with automation of deployment processes, providing repeatable results every single time. Even the most sophisticated solution becomes cumbersome if you have to use the UI to do the same task over and over again.
- Provide reliable deployment scripts, which are capable of dealing with conflicts and merges of database code, and cross updates from other teams; while also ignoring wrong code overrides, and are fully integrated into the version control repository. (Click on the image to enlarge)
- Provide automatically generated development scripts on the fly to deal with deploying any combination of project scope, from multi-schema mega-updates, to a single task based change and its dependent objects.
- Leveraging 'Labels' (Tagging of database structure snapshots and relevant content) before and after deployment of changes, to act as a safety-net, so quick and easy roll-backs are always close at hand.
- Fully integratable to other systems (ALM, Change management / trouble tickets, build servers, and release managers).
Implementing a solution to deal with these challenges would enable a company to practice proper database automation. Database automation which would be easy to integrate with the rest of change and release processes, to achieve a complete end to end automation.
(Click on the image to enlarge)
The database sets up a real challenge for automation. Scripting database objects change-scripts into traditional version or using 'compare & sync' tools is either an inefficient or plain risky thing to automate, as the two concepts are unaware of the other. A better solution needs to be implemented in the shape of DevOps for database.
DevOps for database should follow the proven best practices of change management, enforcing a single change process over the database, and enable dealing with deployment conflicts to eliminate the risk of code overrides, cross updates and merges of code, while plugging into the rest of the release process.
About the Author
Yaniv Yehuda is the Co-Founder and CTO of DBmaestro, an Enterprise Software Development Company focusing on database development and deployment technologies. Yaniv is also the Co-Founder and the head of development for Extreme Technology, an IT service provider for the Israeli market. Yaniv was a captain in Mamram, the Israel Defense Forces computer centers where he served as a software engineering manager.
We do it differently
First, we have a custom tool, but it's pretty simple.
Then all of our DB changes are deltas. Some projects have a single, long file representing the life of their DB from day one. Others use an include file per major version. But the net result is the same -- a long list of deltas.
Next, we have a specific DB table dedicated to tracking the changes that have been run on the database.
Finally, we run this process every single time the application starts up. It simply runs through the deltas, finds those that have not been run before, and runs them in the order presented in the file. 99+% of the time, those delta are the changes made since the last time the application was run.
Our devs have their own database instances most of the time. And the hard and fast rule is that once you commit your delta to source code control, you can't change it. If you made a mistake, fix it with another delta. At commit time, it's an append only file(s). You could always make a plea to the group to not update, but even then you have to race the build servers and pre-empt them. Typically not worth it.
We also have the ability to add specific classes that can run for particularly complicated deltas, but this is quite rare. Our tool allows us to use a contrived SQL-ish syntax to handle the different databases we support.
This is simple, and it works. It works in development, it works in production, it works in deployment. To install the app, you create the empty database and deploy the application. By the time the login screen is up, the DB is stood up with sample data and default configurations. Hands free, and painless.
The tool was simple and has become more robust over time. We have undo features and all that, and we never use them.
If there are conflicts when folks try to check in changes, they can work that out with the other developers, but in truth it's never an issue.
This technique is far, far better than any schema merging/diff tool. Those are a disaster compared to this. If you want to see the schema, deploy the app and dump it from the DB in to any of a number of tools. The script gives you the entire history, not a snapshot.
Re: We do it differently
However, I've found that having a long run list of deltas doesn't scale well when dealing with larger databases. For example when someone regretted a change, and just adds a new delta to replace it, and running both the scripts means it will take longer to get to the target state than it would if you took a shorter route. E.g: add an index, drop it and add a different one instead or to add a column and then drop it later on. If you're also dealing with replicated environments it also takes longer.
In these cases it must be possible to rewrite history and consolidate multiple changes to make them more efficient.
One thing which does make stuff easier, but which some tools doesn't take into account, is that for most database objects (except usually table and index changes) it doesn't really matter what the deltas are, since they must be fully defined regardless (goes for triggers, views, packages, functions, objects). If every version was a delta script it would just be clutter as you only really need to keep the final version around. Also it would just make it harder to find the differences if you have to sort through a dozen delta scripts than to just look at your source control histroy.
Re: We do it differently
As for the stored procedure delta issue, this is absolutely a problem. We don't do a lot of a stored procedures, but have some projects that do. They have the benefit that they do not move as quickly as mainline code, and they also don't have the timeliness and dependency issues that other db changes do (add the field to a table, then added a foreign key, then add an index, etc.). SPs can also typically be run "at the end", no matter what the DB changes are. You can (mostly) safely "edit history" for stored procedures so you don't need to have 100 instances of the 1000 line SP for every little spelling typo that slipped in.
If we were a super heavy SP shop, we'd likely have a different and more formal process for them.
What worked for me
Jonathan Bar Sela
I actually used DBMaestro at my previous company, and it worked quite nicely for us.
As I was responsible for the development of sophisticated financial software which obviously relied on large and complex DB, I looked for a VCS solution.
There weren't many options for Oracle VCS then, and I don't think much has changed. We looked at RedGate and it had little to offer us.
As I have been working in the hi-tech industry for many years, it took me by surprise that such a vital part of your software wasn't versioned.
We chose DBMaestro because it enabled us to use a simple check-in / check-out and merge solution and it allowed us to easily create deployment scripts when moving between the environment life cycle (dev->QA,PreProd and Prod).
We also managed to version data for the Metadata tables (as these definitions are an essential part of the code, and the business needed to audit them).
We achieved SOX compliance as we were always able to show what data or schema has been changed by each version, and by whom.