Database Integration in your Build scripts
Imagine this scenario…
Jenn (Developer): “I’m experiencing a lot of problems testing on v1.3.1.b1 of the database using build 418”
Mike (DBA): “When using build 418, you should use v1.3.1.b2, but I also need to make a few changes to it first”
Jenn: “I just spent three hours for no reason”
Mike: “You should have asked me first”
Back in the early days of my career, I worked on quite a few projects that had a relatively efficient process for developing software. However, when it came to the database, we seemed to throw out all good judgment and retreat to the days without configuration management by using a “throw it over the wall” approach to managing the changes in the database. There was an understanding that the DBAs were doing “their thing” and we (the developers) were doing “our thing”. In those days, as developers, it was seldom that we could recreate the database on our machine using Data Definition Language (DDL) and Data Manipulation Language (DML) scripts because they either A) Did not create the DDL/DML scripts or B) Did not check them into the SCM repository. Instead, we would be at the whim of the DBA if we needed any kind of change to the database. The scenario above is based on my own experiences (the names have been changed to protect the no so innocent).
Fortunately, some of us have awoken to what we could have been doing all along: integrating our DDL and DML into our build process to ensure we are working off the same schema and test data as our fellow developers and DBAs.
To effectively integrate your database definition and manipulation into the build process, you must establish the following: DDL scripts, DML scripts, an SCM tool, a local database, and build script(s). Each is described in more detail below.
DDL
DDL is just a fancy acronym for SQL that creates your database schema. Typically, you will see code like this in a DDL script:
CREATE TABLE beer(name VARCHAR(50), brewer VARCHAR(50), date_received DATE);
DML
DML is another one of them highfalutin acronyms for SQL that means inserting, updating, and removing (etc.) data in your database. You’ll find this type of code in your DML:
INSERT INTO beer(name, brewer, date_received) VALUES ('Sam Adams Lager','Boston Beer Company','2005-12-09');
Just remember, we’re talking about test data here; you’re not going to be inserting millions of records into your local database. This is for another time and place.
SCM Tool
You need to manage your DDL and DML just like you would your source code, so you’ll need a source code management tool such as Subversion, ClearCase, Perforce or whatever you’re using at your shop. You will check in the DDL and DML scripts that will recreate your database schema and insert all of the test data.
Local Database
You need a local version of your database running so that your can run the DDL and DML scripts. If you are running this database on your developer workstation, you will probably want a lightweight version of your database so that you don’t degrade your machine’s performance. If you’re using Oracle, look into Oracle Express or McKoi as lightweight options. If you’re using SQL Server, then SQL Server Express is a viable option. If you’re using MySQL, you’re in luck, as you can use MySQL locally.
Build Scripts
Now that you’ve got your DDL/DML scripts, a local database, and you are managing your database scripts in your SCM tool, you’re ready for the final step: making your database definition/manipulation as a part of your build process. In makes no difference whether you are using Ant, NAnt, make or other build tool just as long as the script is run as part of a standard process of developing and building your software. An example in Ant and MySQL is listed below.
<target name="db:create-database" description="Create DB and Insert data">
<echo>Creating database...</echo>
<sql
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/brewery"
userid="sam"
password="@d@ms"
classpathref="db.lib.path"
delimiter=";">
<fileset file="${db.dir}/create-database.sql"/>
<fileset file="${db.dir}/insert-data.sql"/>
</sql>
</target>
By making it a part of your build script, you can run it often as part of a continuous integration process. You can add more advanced features such as incorporating database migration into the process to ensure that changes can easily be applied to an existing database. On the projects I have used this process, it encourages a more collaborative effort between the developers and DBAs and allows the developers to make simple or more complex changes to the database (depending on their experience) that may be used by others on the team. Just as you do with your source code builds, when the build breaks, it must become a priority to fix the broken build. If you’re not integrating your database scripts into your build process, I’d encourage you to try it out and see how it works for you.

September 19th, 2007 at 1:04 am
Very simple ,very precise and a very usefull guide to db ops using build scripts.
November 9th, 2007 at 8:41 pm
This is all well and good but it doesn’t address a problem — what happens if I want to also migrate a DB if a DDL changed in between the builds? Obviously I’ll need a migration script. Are there any patterns to it, especially when you may be dealing with source control branches, etc?
November 10th, 2007 at 5:38 am
>>Obviously I’ll need a migration script.
The approach I outline in this short post is targeted at putting the database into a known state for testing while in development. Yes, as you prepare for a release you’ll need to have a data migration script that includes ALTERs and the like. At a high level, you’d run the alter scripts in a similar manner as part of the build. However, you’ll need to account for the version you’re coming from and the version you’re applying to (e.g. you’re upgrading the database from 1.1 to 1.7, therefore needing to applying 1.2 to 1.7 alter changes). I will post something specific to data migration at some other point. In the meantime, have a look at http://databaserefactoring.com/
January 30th, 2008 at 3:50 am
Migration:
The way I’ve been tackling this for some time, is to always work in a file system which mirrors our source control repository layout - ie subdirs for tables, views, stored procedures etc. As I work on a release, I create/modify object schema scripts in the relevent subdirectories. However - if the object is new (ie needs to be created) I check in the completed schema script leaving the file, otherwise I check in the modified script removing the schema file. In the latter case I write a ‘delta’ ddl script which I store in the distribution root dir, which does any ddl or dml statements necessary to bring an existing database object (and the data it contains!) in line with the checked-in ddl script.
I control the whole process with Unix make, which calls Korn shell scripts which I wrote to handle each object type, in the order of dependencies. I catered for table population by having a subdirectory specifically for Sybase bcp (bulk copy, same as Oracle sql*loader) uploads.
February 21st, 2008 at 9:40 am
Does anyone know of a good, free, lightweight DB that supports all Oracle data types? I’ve tried McKoi, but it doesn’t like VARCHAR2 fields.
March 8th, 2008 at 8:29 pm
I like the idea of having the scripts and what to maintain a database solution. I also believe in keeping the DDL and core DML update ot date in source safe. However, I favor the idea of an updated master database backup. The idea here is that each build uses the latest master backup. Shipping product with a backup to me is a cleaner way compared to shipping ddl and dml scripts. The use of LINQ can be leveraged here to produce updated schemas.
Thoughts
June 4th, 2008 at 12:53 am
Jim try Firebird - firebirdsql.org
January 27th, 2010 at 5:33 pm
I came across this page when I was looking for Oracle database build scripts that could help with our agile development / database continuous integration initiative.
I have been looking for database build scripts that will take our ddl/dml source code (that we have checked into CVS) and use this to build multiple database environments (so each developer can have their own schema) I have found Schemation’s DEM (Database Environment Manager) it looks like the ideal tool to help us achieve database continuous integration but have not heard anything about it. Does anyone know anything about this tool? The ’sales pitch’ looks good and the features look great (http://www.schemation.co.uk/features.htm) but there is no documentation or trial download. I have tried contacting them but have not yet received a response (although I only sent a mail yesterday)
February 18th, 2010 at 12:26 pm
Wow, eternal different point of view (and process) between developers and DBA
I’m an experienced Oracle DBA (since r7.3.4) and also develop enough to know about developer way of doing and building things (java/django/etc).
I’m often disapointed that developers consider DBA work as a problem and a lack in agility, but we address sames architecture, but with different way of doing that and your fabulous tools to auto compile / build and deploy are like an utopy in my RDBMS world.
Theses days I look around such methology and tools that can make this 2 before distinct world more compatible, but thinking as a developer is not enough, from my DBA point of view.
Considerations are not the same, you think in term of development processus and evolving, I must think with in mind many environments / release running at the same time.
And there’ll always have difference between a development database and a production one. If I apply same storage clause in development that I do for production one (with test on staging platform), each development DB will be 1TB large (with only configuration data inside)…
And as Paul notice, there is 2 aspects of database schema with SCM :
- ability to build from scratch an empty DB (that what you need and suppose)
- ability to upgrade the database from a release to another.
–> any scm diff command, won’t produce the needed sql for such upgrade
For a java source (or other language), it’s not needed such 2nd need (patch is not your daily work)
These 2 aspects seem to be covered by a tool like Liquibase, but I’ll have to play with context and others derivative concepts in this tools, to fullfill my customizations for the differents environments. But because this tool can address this 2 orthogonal way of working, I’ll really take it into consideration (I never found another tool to address that).
There is not only one way of seeing things, it’s a recurrent point of divergence between developers and dba, that must be avoid (communicate and accept each others)
Even if DBA are often guys that made developers crazy (and vice-versa)