May 17 2009

Using DBIx::Class to Version Your Schema

Category: Databases, Deployment, Schema Versioning | jgoulah @ 5:53 PM

Intro to DBIx::Class

In my opinion DBIx::Class is one of the best ORM solutions out there. Not only can it model your database, including mapping out any foreign key relationships, it can also serve as the canonical point of reference for your schema. This means we can use it not only as an application-layer interface to the database, but also to define a versioned database structure in code: when you add a field to a result class, you can generate a DDL file and a set of alter statements just by running a simple script. That set of alterations can later be applied to your stage and production environments.

If you have a pre-existing database, you can produce your schema classes with DBIx::Class::Schema::Loader, but this article will show how to build your database from the ground up, with the code defining the fields, tables, and relations. The relations map to foreign key constraints when the database definition files are generated.

Creating your Schema

This example will be a pretty simple schema, but enough to demonstrate generating a couple of tables with a foreign key constraint. Since a lot of people are doing social networking stuff these days, a common use case would be a user who has a bunch of photos. So we'll need to model a user table and a photo table. First things first, let's create a few basic things, assuming we'll call our application MySocialApp:

mkdir -p MySocialApp/lib/MySocialApp/Schema/Result
cd MySocialApp

We need to create a schema file at lib/MySocialApp/Schema.pm that inherits from the base DBIx::Class::Schema

package MySocialApp::Schema;

use strict;
use warnings;

use base qw/DBIx::Class::Schema/;

our $VERSION = '0.00001';

__PACKAGE__->load_namespaces();
__PACKAGE__->load_components(qw/Schema::Versioned/);
__PACKAGE__->upgrade_directory('sql/');

1;

So all we are doing here is extending the base Schema class and loading the Versioned component. We set upgrade_directory so that DDL and diff files are generated into the sql directory. We also invoke load_namespaces, which tells DBIx::Class to look in the MySocialApp::Schema::Result namespace for our result classes by default.

Creating the Result Classes

Here I'll define what the database is going to look like, starting with lib/MySocialApp/Schema/Result/User.pm:

package MySocialApp::Schema::Result::User;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

__PACKAGE__->load_components(qw/Core/);
__PACKAGE__->table('user');

__PACKAGE__->add_columns(
    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);

__PACKAGE__->set_primary_key('user_id');

__PACKAGE__->has_many(
    'photos',
    'MySocialApp::Schema::Result::Photo',
    { "foreign.fk_user_id" => "self.user_id" },
);

1;

This is relatively straightforward. I'm creating a user table that has an auto-incrementing primary key and a username field. I'm also defining a relationship that says a user can have many photos. Let's create the photo class, lib/MySocialApp/Schema/Result/Photo.pm:

package MySocialApp::Schema::Result::Photo;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

__PACKAGE__->load_components(qw/Core/);
__PACKAGE__->table('photo');

__PACKAGE__->add_columns(
    photo_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    url => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    fk_user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        extra => { unsigned => 1 },
    },
);

__PACKAGE__->set_primary_key('photo_id');

__PACKAGE__->belongs_to(
    'user',
    'MySocialApp::Schema::Result::User',
    { 'foreign.user_id' => 'self.fk_user_id' },
);


1;

Same basic thing here: a photo class with a photo_id primary key, a url field, and an fk_user_id field that keys into the user table. Each photo belongs to a user, and this relationship will define our foreign key constraint when the schema is generated.

Versioning the Database

Create the Versioning Scripts

We have the main DBIx::Class pieces in place to generate the database, but we’ll need a couple of scripts to support our versioned database. One script will generate the schema based on the version before it, introspecting which alterations have been made and producing a SQL diff file to alter the database. The other script will look at the database to see if it needs upgrading, and run the appropriate diff files to bring it up to the current version.

First, the schema and diff generation script, which we'll call script/gen_schema.pl:

use strict;
use warnings;

use FindBin;
use lib "$FindBin::Bin/../lib";

use Pod::Usage;
use Getopt::Long;
use MySocialApp::Schema;

my ( $preversion, $help );
GetOptions(
    'p|preversion:s' => \$preversion,
    'h|help'         => \$help,
) or pod2usage(2);
pod2usage(1) if $help;

my $schema = MySocialApp::Schema->connect(
    'dbi:mysql:dbname=mysocialapp;host=localhost',
    'mysocialapp', 'mysocialapp4u',
);
my $version = $schema->schema_version();

if ($version && $preversion) {
    print "creating diff between version $version and $preversion\n";
} elsif ($version && !$preversion) {
    print "creating full dump for version $version\n";
} elsif (!$version) {
    print "creating unversioned full dump\n";
}

my $sql_dir = './sql';
$schema->create_ddl_dir( 'MySQL', $version, $sql_dir, $preversion );

This script will be run any time we change something in the Result files. You give it the previous schema version, and it creates a diff between that and the new version. Before running it, update the $VERSION variable in lib/MySocialApp/Schema.pm so that DBIx::Class knows a change has been made.
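Since forgetting to bump $VERSION is a common slip, a tiny helper can rewrite the line for you. This is a hypothetical convenience script, not anything DBIx::Class provides; it assumes the x.yyyyy version format used in this article:

```perl
#!/usr/bin/perl
# bump_version.pl -- hypothetical helper; increments the minor part of a
# $VERSION line like: our $VERSION = '0.00001';
# Naive on purpose: it does not carry over into the major version.
use strict;
use warnings;

sub bump_version {
    my ($line) = @_;
    $line =~ s/(\$VERSION\s*=\s*')(\d+)\.(\d+)(')/
        sprintf("%s%d.%0*d%s", $1, $2, length($3), $3 + 1, $4)/e;
    return $line;
}

print bump_version("our \$VERSION = '0.00001';\n");
# prints: our $VERSION = '0.00002';
```

You would run it over lib/MySocialApp/Schema.pm before calling gen_schema.pl.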

The next script is the upgrade script, which we'll call script/upgrade_db.pl:

use strict;
use warnings;

use FindBin;
use lib "$FindBin::Bin/../lib";

use MySocialApp::Schema;

my $schema = MySocialApp::Schema->connect(
    'dbi:mysql:dbname=mysocialapp;host=localhost',
    'mysocialapp', 'mysocialapp4u',
);

if (!$schema->get_db_version()) {
    # schema is unversioned
    $schema->deploy();
} else {
    $schema->upgrade();
}

This script checks whether the version held by the database differs from the version in your Schema.pm, and applies any diffs needed to bring the database up to the current schema version.

Note that in these scripts I've hardcoded the DB connection info, which really should go into a configuration file.

Create a Database to Deploy Into

We need to create the database that our tables will live in, with the user and password used in the connect calls above. I'm using MySQL for the example, so this is done at the MySQL prompt:

mysql> create database mysocialapp;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on mysocialapp.* to mysocialapp@'localhost' identified by 'mysocialapp4u';
Query OK, 0 rows affected (0.01 sec)

Deploy the Initial Schema

Now it's time to deploy our initial schema into MySQL. For the first go-round we also have to create the initial DDL file; this way, when we make changes in the future, it can be compared against the Schema result classes to see what has changed. We do this by supplying a nonexistent previous version to our gen_schema.pl script:

$ perl script/gen_schema.pl -p 0.00000
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.
creating diff between version 0.00001 and 0.00000
No previous schema file found (sql/MySocialApp-Schema-0.00000-MySQL.sql) at /home/jgoulah/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1685.

And we can see the DDL file now exists

$ ls sql/
MySocialApp-Schema-0.00001-MySQL.sql

Then we need to deploy to MySQL for the first time, so we run the upgrade script:

$ perl script/upgrade_db.pl
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.

Now check out what it's created in MySQL:

mysql> use mysocialapp;
Database changed
mysql> show tables;
+----------------------------+
| Tables_in_mysocialapp      |
+----------------------------+
| dbix_class_schema_versions |
| photo                      |
| user                       |
+----------------------------+
3 rows in set (0.00 sec)

There is our photo table, our user table, and also a dbix_class_schema_versions table. This last table keeps track of which schema version the database is at. By selecting from it you can see that we are in sync with the Schema class, and when each version was installed.

mysql> select * from dbix_class_schema_versions;
+---------+---------------------+
| version | installed           |
+---------+---------------------+
| 0.00001 | 2009-05-17 21:59:57 |
+---------+---------------------+
1 row in set (0.00 sec)

The really great thing is that we have created the tables -and- constraints based on our schema result classes. Check out our photo table:

mysql> show create table photo\G
*************************** 1. row ***************************
       Table: photo
Create Table: CREATE TABLE `photo` (
  `photo_id` int(11) unsigned NOT NULL auto_increment,
  `url` varchar(255) NOT NULL,
  `fk_user_id` int(11) unsigned NOT NULL,
  PRIMARY KEY  (`photo_id`),
  KEY `photo_idx_fk_user_id` (`fk_user_id`),
  CONSTRAINT `photo_fk_fk_user_id` FOREIGN KEY (`fk_user_id`) REFERENCES `user` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Making Database Changes

Let's say we want to add a password field to our user table. I'd open up the lib/MySocialApp/Schema/Result/User.pm file and add a section for the password field to the add_columns definition, so it now looks like:

__PACKAGE__->add_columns(
    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    password => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);

Then we update lib/MySocialApp/Schema.pm, bumping to the next version so it looks like:

our $VERSION = '0.00002';

To create the DDL and diff for this version, we run the gen_schema script with the previous version as the argument:

$ perl script/gen_schema.pl -p 0.00001
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.
creating diff between version 0.00002 and 0.00001

If you look in the sql directory there are two new files. The DDL is named MySocialApp-Schema-0.00002-MySQL.sql, and the diff is called MySocialApp-Schema-0.00001-0.00002-MySQL.sql and contains the alter statement:

$ cat sql/MySocialApp-Schema-0.00001-0.00002-MySQL.sql
-- Convert schema 'sql/MySocialApp-Schema-0.00001-MySQL.sql' to 'MySocialApp::Schema v0.00002':;

BEGIN;

ALTER TABLE user ADD COLUMN password VARCHAR(255) NOT NULL;

COMMIT;
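The file names follow a predictable pattern: the schema class with :: replaced by dashes, the version (or version pair for a diff), and the database type. Here's a quick sketch of that convention in Perl, inferred from the output shown in this article rather than from DBIC's internal ddl_filename() method:

```perl
use strict;
use warnings;

# Build DDL/diff file names matching the pattern seen above.
# This mirrors the observed naming, not DBIC's own implementation.
sub ddl_filename {
    my ($schema_class, $db_type, $version, $preversion) = @_;
    (my $name = $schema_class) =~ s/::/-/g;
    my $vers = defined $preversion ? "$preversion-$version" : $version;
    return "$name-$vers-$db_type.sql";
}

print ddl_filename('MySocialApp::Schema', 'MySQL', '0.00002'), "\n";
# prints: MySocialApp-Schema-0.00002-MySQL.sql
print ddl_filename('MySocialApp::Schema', 'MySQL', '0.00002', '0.00001'), "\n";
# prints: MySocialApp-Schema-0.00001-0.00002-MySQL.sql
```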

Now we can apply that with our upgrade script

$ perl script/upgrade_db.pl
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.

DB version (0.00001) is lower than the schema version (0.00002). Attempting upgrade.

and see that the upgrade has been made in MySQL

mysql> describe user;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| user_id  | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| username | varchar(255)     | NO   |     |         |                |
| password | varchar(255)     | NO   |     |         |                |
+----------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

Conclusion

So now we've seen a really easy way to maintain our database schema from our ORM code. We have a versioned database that is under revision control, and we can keep our stage and production environments in sync with our development code. With this type of setup it's also easy to maintain branches with different database changes and merge them into the mainline, and it ensures that the database you are developing against is always in sync with the code.



Mar 14 2009

Code Deployment Techniques

Category: Deployment | jgoulah @ 7:09 PM

Introduction

Getting code out to your target production machines is a bit of an art form that many seem to take for granted. A lot of shops I talk to seem to think of the deployment process as an svn up on each node. Even if this works for you, it's really not a great way to deploy code, mainly because it does not scale well: you can end up with different versions of code on different machines at the same time. For example, if your load balancing algorithm made the logs on one machine fill up faster than another, you could run out of disk space partway through the svn update on a node. The correct process should be such that different releases of code can be deployed nearly instantaneously, with old copies left around for rollback. If you just svn up, your code has the potential to be out of sync far more often than if you deploy with an rsync and point your webserver at the current version of code through symbolic links. This article will show how this process can work in a scalable way.

Like most of my articles, this is targeted at shops running something along the lines of the typical LAMP stack. I've set up this deployment architecture for a PHP application under Apache, and also for a slightly more complicated Perl setup running mod_fastcgi instances behind Apache. In one of my most complex deployments, the database updates went out simultaneously with the code that depended on them, so there is a lot of flexibility in what you can do. I'll be talking at a very high level about how to accomplish this.

Code Environments

Deployment is the process of getting code from your development environment into a working production system. There are usually at least three environments: development, staging, and production. Some places may have a demo or test environment for their own specific purposes, but we will focus on the main three. Your development environment is active while code is being written and tested by the developers. Typically it points at a 'dev' database, or, if you are using something like MySQL Sandbox, perhaps each developer has his own database to work against. Your staging environment should be as close a replica of your production environment as possible. I typically deploy staging on the exact same servers as production, but point it at a copy of the production database instead of the real thing. It's ideal for this copy to be taken nightly with something like MySQL LVM snapshotting so you're working with similar data. When the staging code is assured to be "bug free", it is then pushed to your live production environment.

It's a good idea to have your code respect these environments. The initialization for your application should be able to infer the environment and pull the appropriate config, so that you are pointing at the right DB and any other specifics for that particular environment. One way to do this is to use an environment variable defined inside your virtual host file. Since you'll have separate virtual hosts for staging and production, your application can look at the variable and know which configuration to grab.
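As a minimal sketch of that idea in Perl (the MYAPP_ENV variable name and the per-environment settings are made up for illustration; a real app would load these from config files):

```perl
use strict;
use warnings;

# Hypothetical per-environment settings; in a real application these
# would live in configuration files, one DSN and credentials per env.
my %config = (
    development => { dsn => 'dbi:mysql:dbname=myapp_dev;host=localhost' },
    staging     => { dsn => 'dbi:mysql:dbname=myapp;host=db-staging' },
    production  => { dsn => 'dbi:mysql:dbname=myapp;host=db-prod' },
);

# The virtual host sets the variable, e.g.:  SetEnv MYAPP_ENV staging
my $env = $ENV{MYAPP_ENV} || 'development';
die "unknown environment '$env'" unless $config{$env};
print "connecting via $config{$env}{dsn}\n";
```

The same lookup works whether the variable comes from Apache's SetEnv, a FastCGI wrapper, or a shell export during development.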

The Build Process

To get your code from development to staging you have to undergo some form of build process. Some build processes are more complicated than others. A simple one may be as trivial as doing an svn export of your repository to the location you wish to push the code from. More complex builds may want to compile and bundle external libraries that the application needs. For example, in a Perl deployment I've compiled the required modules during the build step of each release, so that each build ships the same versions of the modules the code was written against. This was really important since CPAN is ever-changing; otherwise, in a rollback situation there could be incompatibilities. For simplicity I'll assume you are deploying code that has everything set up and working on the server (as PHP usually does).

The build should also be responsible for putting the code into a versioned directory. There are a few benefits here. One is that you can map a build of code back to the repository revision it came from. Another is that it makes rollback a cinch, because multiple versions of the code can exist on the server at the same time. Apache knows which version is active because the virtual host for the active version of code is pointed to by a symbolic link that gets modified as part of the deploy procedure.

Generating a Version

I've explained that your build script should generate a version, and it makes sense for this to come from the repository revision being deployed. Here's a short bash function that generates a version from an SVN repository (it also works with SVK, an SVN wrapper):

function get_version {
  if [ -e .svn ]; then
    eval $1=$(svn info | grep ^Revision | perl -ne "/Revision: (\d+)/; printf('1.%06i', \$1);")
  else
    eval $1=$(svk info | grep "Mirrored From" | perl -ne "/Rev. (\d+)/; printf('1.%06i', \$1);")
  fi
}

and now the build script can do something like

cd /path/to/svn/checkout
svn up
get_version version
echo "this is the version: $version"
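For clarity, the 1.%06i formatting the bash function relies on can be sketched in plain Perl; it simply zero-pads the revision so version strings sort correctly:

```perl
use strict;
use warnings;

# Format an SVN revision number the way the bash function above does:
# a constant major version plus a zero-padded six-digit revision.
sub revision_to_version {
    my ($rev) = @_;
    return sprintf('1.%06d', $rev);
}

print revision_to_version(42), "\n";    # prints: 1.000042
print revision_to_version(1234), "\n";  # prints: 1.001234
```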

Exporting the Code

Now that you've generated a version, you should export that version of the code into the directory you are going to push from. For this example we'll say versions are stored in /opt/test.com, so the structure might look something like:

/opt/test.com
     {version1}
     {version2}
     {versionN}
     ...

Then the code would live under version1, version2, … versionN for each version of code deployed.

One way to do this is something like:

site_dir=/opt/test.com
mkdir ${site_dir}/${version}
svn export -q svn://${svn_host}/trunk ${site_dir}/${version}

Informing Apache of the Document Root

Apache needs to know where the new version of code lives. Since we've decided to put it into an ever-changing directory name (the build version), there are at least two ways to let Apache know the document root. One is to dynamically generate a virtual host with the explicit versioned path from above (e.g. /opt/test.com/0.00010). The document root there always points at the version it's bundled with, and on deployment a symlink in Apache's conf.d directory is pointed at the active virtual host config file. The other is a more static virtual host file whose document root points at a symlinked directory that is repointed to the new version on deploy (e.g. /opt/test.com/current-www, where current-www is a symlink to the appropriate version).

The Deploy Process

In the end, the point of deployment is to get your code into production. The version of code should first be rolled into staging and then, when verified as good, into production. The beauty of this is that all it takes to move code from staging to production is an update to a symbolic link: the current production symlink is wiped out and pointed at the new version of code, the same one we deployed to staging.

The deployment should at a minimum

  • rsync code to each web server
  • point the symbolic links to the new version of code
  • restart apache

The great thing is that the symbolic links are not repointed until all of the code already exists on each server (the rsync to all servers happens first). This gives a deploy that doesn't leave servers out of sync with each other. The switch is nearly instantaneous and verified working, without trying to rsync or svn up over top of running production code.
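The symlink repoint itself can be made effectively atomic: create the new link under a temporary name and rename() it over the old one, so there is never a moment when the document root is missing. A small sketch with illustrative paths:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Repoint a "current" symlink atomically: build the new link under a
# temporary name, then rename() it over the old one in a single step.
sub repoint {
    my ($target, $link) = @_;
    my $tmp = "$link.tmp.$$";
    symlink($target, $tmp) or die "symlink: $!";
    rename($tmp, $link)    or die "rename: $!";
}

# Demonstration in a scratch directory (paths are illustrative).
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/1.000042" or die $!;
mkdir "$dir/1.000043" or die $!;

repoint("$dir/1.000042", "$dir/current-www");
repoint("$dir/1.000043", "$dir/current-www");
print readlink("$dir/current-www"), "\n";  # the path ending in 1.000043
```

On POSIX systems rename() replaces the old link in one step, so Apache always sees either the old version or the new one, never neither.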

Conclusion

I've gone over a robust way to do code deployments that can be fully automated with a little bit of scripting. It's an easy way to get code out to production and to ensure that the exact version of code currently in staging is the version that goes out. It gives us a very easy way to roll back if a major problem is found in production, and to quickly see which version is currently deployed. And it allows us to change versions of code simultaneously without the end user ever knowing.



Jan 13 2009

Using SQLT to Create Database Diffs

Category: Schema Versioning | jgoulah @ 11:19 PM

Overview

Database diffs can be useful for a variety of use cases, from outputting the differences between your development and production schemas to full-blown database versioning systems. A database diff is similar in nature to a file diff, but instead of outputting the differences between two files, it outputs the set of alterations needed to bring one database's schema in sync with another's. There are a variety of tools that will accomplish this for you, but I am going to use one called SQL::Translator. SQLT can do quite a lot, such as converting among different schema dialects (think MySQL to Oracle) or outputting an ERD of your database, but I'll stick to using it as a simple diff tool for this article.

Installation

The tool can be found on CPAN. I suggest using local::lib for Perl module installation to avoid needing to install things as root. I've written up detailed instructions in the past, but it's pretty straightforward if you follow the bootstrap instructions. In any case, if you are lucky you will be able to just do a normal cpan install like so:

cpan SQL::Translator

For our examples you'll also need the DBD::mysql driver. This can be a real pain to install from CPAN, so you may want to use your system's package manager instead. Under Debian the package is called libdbd-mysql-perl, and on Red Hat it's perl-DBD-MySQL.

Diffing Two DDL Files

Once you have SQLT installed, you'll want to grab the DDL files from the databases you are interested in diffing. If you are using MySQL this is easy enough to do with mysqldump.

So let's say you want to compare your development and production databases. We'd grab those schema dumps, assuming your dev db is on host 'dev' and your prod db on host 'prod':

mysqldump --no-data -h dev -u your_dbuser -p your_dbname > dev_db.sql
mysqldump --no-data -h prod -u your_dbuser -p your_dbname > prod_db.sql

The SQLT install comes with a handy script called sqlt-diff which you can use to diff two DDL files. In the case of MySQL, all you have to run is:

sqlt-diff dev_db.sql=MySQL prod_db.sql=MySQL > my_diff.sql

You should be able to pass other dialects where we've used 'MySQL' here, and you can get a list via:

sqlt -l

Diffing DDL Against a Database

The above may be all you need, but perhaps you want to diff a DDL against the actual database. I recently used this approach to write a versioning system in which the database changes could be output into a file containing the alterations, along with a new DDL for each "version", so that changes are not lost or forgotten.

SQLT doesn't currently ship a script to handle this, so we'll use the Perl module itself. A simple Perl script is all we need:

#!/usr/bin/perl

use strict;
use warnings;
use SQL::Translator;
use SQL::Translator::Diff;
use DBI;

# set the path of the file you want to diff against the DB
my $ddl_file = "/path/to/file_to_diff.sql";

# make sure to fill in db_name, host_name, and db_port
my $dsn = "DBI:mysql:database=db_name;host=host_name;port=db_port";

# you'll have to set the db_user and db_pass here
my $dbh = DBI->connect($dsn, 'db_user', 'db_pass', {'RaiseError' => 1});

# create a SQLT object that reads the live schema via our dbh
my $db_tr = SQL::Translator->new({
    add_drop_table => 1,
    parser         => 'DBI',
    parser_args    => { dbh => $dbh },
});
$db_tr->translate or die $db_tr->error;

# create a SQLT object that corresponds to our ddl file
my $file_tr = SQL::Translator->new({
    parser => 'MySQL',
});
$file_tr->translate($ddl_file) or die $file_tr->error;

# perform the actual diff (some of the options you use may vary)
my $diff = SQL::Translator::Diff::schema_diff(
    $file_tr->schema, 'MySQL',
    $db_tr->schema,   'MySQL',
    {
        ignore_constraint_names => 1,
        ignore_index_names      => 1,
        caseopt                 => 1,
    },
);

# now just print out the diff
my $filename = "diff_output.sql";
open(my $file, '>', $filename)
    or die "Can't open $filename for writing ($!)";
print $file $diff;
close $file;
So we've used SQL::Translator to read in our database, read in our DDL, and produce a diff. Pretty neat!

Conclusion

We've really only scratched the surface of SQL::Translator's capabilities, but it's pretty handy to be able to easily diff various schemas. It's important that your development and production schemas are in sync, and stay in sync, and this gives you an easy way to start automating that process.
