Source control is really important. Without source control, we would be working with single revisions of code where older changes are lost as soon as newer changes are made. Without source control, you couldn’t simply go back in time to before you attempted to implement your own version of a divide by zero function.
The term source control is a little ambiguous. Some people prefer the term version or revision control and I think that better describes the intended goal of these tools. The reason I say it's ambiguous is because a typical project is not simply a bunch of source code. What about the database for instance?
It's great to have the last 3 years of changes to your project tucked neatly away in a source code repository, until you decide to revert to a build from last year and realise that you can't because you have no idea what the database schema looked like back then.
The important thing to understand is that a project is more than just a bunch of executable code. Typically, the definition of a version includes more than the state of the files that make up the source code. Any one version also includes the state of the database schema and potentially other external component configurations. The state of these other components must also be part of the version control system so that at any point in time all the pieces of the puzzle are available for reconstruction.
Some people solve this problem by keeping a copy of all database schema modifications in .sql files. These are checked into the repository and ordered in a way that allows the database schema to be dynamically rebuilt as needed. While there's nothing wrong with this approach, it is still a very manual and error prone way of solving the problem.
Over time we've gradually moved away from raw data access via APIs such as ODBC and ADO and trended towards using tools to manage data access for us. Of course, I'm talking about ORM or Object Relational Mapping. ORM tools produce objects in the chosen programming language that represent entities in the database and often take us away from having to deal directly with SQL syntax.
Another benefit an ORM tool can offer is database version control. Due to the nature of how an ORM tool maps the database schema to an object collection, it retains a full model of the database. The model is usually some kind of file that accompanies the source code. This means that your database schema is stored alongside the source code during a check-in. As the schema changes, so does the file that holds the mapping and hence the database becomes version controlled just like any other source code file.
This means that you could check-out a version of your project from 3 weeks ago and you’ll have the ORM mapping file that modelled how the database schema looked at that point in time. Personally, I think this is much cleaner and easier than having a huge collection of .sql files.
You are probably wondering how exactly you get from this mapping file back to an instance of the database. In terms of ORM, my experience lies mostly with Microsoft's LINQ to SQL and the Entity Framework - both of which deal with this problem in a very similar way. Other ORM tools I'm sure have their own methods for going from mapping files back to database instances, but you'd have to look that up to be sure. Let's see how this works with LINQ to SQL.
In this example I have a database called BookStore which defines a simple data model for storing categories of books and their reviews. I’ve created the following LINQ to SQL mapping:
I now have a database.dbml file which maps out my database schema, including the relationships between the three tables. This file will be checked into my source code repository along with my code and can change just like source code changes.
Let’s now assume we notice a huge regression which has slipped through testing. We need to revert back to the last known working version of the application. All I have to do is check-out the particular version from the repository and instruct the ORM tool to build the database from the .dbml file.
LINQ to SQL provides what’s known as the data context class which is used for interacting with the database. In the case of restoring a database instance, the data context provides the following methods:
- bool DatabaseExists();
- void CreateDatabase();
- void DeleteDatabase();
Using these methods, we can quite easily create a project that is responsible for dynamically building the local environment ready for development. As you might expect, it’s as simple as:
class Program
{
static void Main(string[] args)
{
DataContext dataContext = new DataContext();
if (!dataContext.DatabaseExists())
{
dataContext.CreateDatabase();
}
}
}
Conclusion
Keeping your database under version control is just as important as keeping your source code under version control. If you are using an ORM tool, spend a little time investigating how it can help in terms of recreating database instances for you. You might also use this approach to have your database instance be built dynamically on an integration server prior to integration tests running – allowing you to constantly test the latest database schema changes automatically.
For anyone who hasn’t realised the implication of database version control until now, but has been using an ORM tool such as LINQ to SQL, well done - you just got database version control for free and without even knowing it.