Apr 23 2013

Crossing the Production Barrier: Development at Scale

Category: Conferences, Databases | jgoulah @ 5:59 PM

Slides from my talk at Percona 2013



Apr 15 2012

Percona Live 2012 – The Etsy Shard Architecture

Category: Conferences, Databases, Systems | jgoulah @ 2:58 PM

I attended the Percona conference in Santa Clara last week. It was a great three days of people dropping expert MySQL knowledge. I learned a lot and met some great people.

I gave a talk about the Etsy shard architecture and got a lot of good feedback and questions. A lot of people run active/passive, but we run everything in active/active master-master. This helps us keep a warm buffer pool on both sides, and if one side goes out we only lose about half of the queries until it's pulled and they start hashing to the other side. Some details on how this works are in the slides below.
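As a purely illustrative sketch of the general idea (not the actual Etsy hashing scheme; the slides have the real details), routing a shard id to one of two active sides looks roughly like this:

# toy illustration only: route a shard id to one of two active-active sides
shard_id=123456
side=$(( shard_id % 2 ))   # 0 -> side A, 1 -> side B
echo "shard $shard_id hashes to side $side"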



Jan 09 2012

Distributed MySQL Sleuthing on the Wire

Category: Databases, Real-time Web, SSH, Systems | jgoulah @ 8:52 AM

Intro

Oftentimes you need to know what MySQL is doing right now, and if you are handling heavy traffic you probably have multiple instances of it running across many nodes. I'm going to start by showing how to take a tcpdump capture on one node and a few ways to analyze it, and then go into how to take a distributed capture across many nodes for aggregate analysis.

Taking the Capture

The first thing you need to do is take a capture of the interesting packets. You can either do this on the MySQL server or on the hosts talking to it. According to this Percona post, this command is the best way to capture MySQL traffic on the eth0 interface and write it into mycapture.cap for later analysis:

% tcpdump -i eth0 -w mycapture.cap -s 0 "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2"
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
47542 packets captured
47703 packets received by filter
60 packets dropped by kernel
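If you only want a bounded capture, tcpdump can also stop on its own after a fixed number of packets with -c; for example, using the same filter:

% tcpdump -i eth0 -w mycapture.cap -s 0 -c 50000 "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2"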

Analyzing the Capture

The next step is to take a look at your captured data. One way to do this is with tshark, which is the command line part of wireshark. You can do yum install wireshark or similar to install it. Usually you want to do this on a different host than the one taking traffic since it can be memory and CPU intensive.

You can then use it to reconstruct the mysql packets like so:

% tshark -d tcp.port==3306,mysql -T fields -R mysql.query -e frame.time -e ip.src -e ip.dst -e mysql.query -r mycapture.cap

This will give you the time, source IP, destination IP, and query, but this is still really raw output. It's a nice start, but we can do better. Percona has released the Percona Toolkit, which includes some really nice command line tools (including what used to be in Maatkit).
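Even before reaching for the toolkit, you can get a rough sense of the data with plain shell glue over that tshark output; for example, counting captured queries per source IP (the fields are tab separated, and ip.src is the second field in the command above):

% tshark -d tcp.port==3306,mysql -T fields -R mysql.query -e frame.time -e ip.src -e ip.dst -e mysql.query -r mycapture.cap | \
  awk -F'\t' '{print $2}' | sort | uniq -c | sort -rn | head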

The one we’re interested in here is pt-query-digest

It has tons of options and you should read the documentation, but here are a few I've used recently.

Let's say you want to get the top tables queried from your tcpdump:

% tcpdump -r mycapture.cap -n -x -q -tttt | pt-query-digest --type tcpdump --group-by tables --order-by Query_time:cnt \
 --report-format profile --limit 5
reading from file mycapture.cap, link-type EN10MB (Ethernet)

# Profile
# Rank Query ID Response time Calls R/Call Apdx V/M   Item
# ==== ======== ============= ===== ====== ==== ===== ====================
#    1 0x        0.3140  6.1%   674 0.0005 1.00  0.00 shard.images
#    2 0x        0.8840 17.1%   499 0.0018 1.00  0.03 shard.activity
#    3 0x        0.1575  3.1%   266 0.0006 1.00  0.00 shard.listing_images
#    4 0x        0.1680  3.3%   265 0.0006 1.00  0.00 shard.connection_edges_reverse
#    5 0x        0.0598  1.2%   254 0.0002 1.00  0.00 shard.listing_translations
# MISC 0xMISC    3.5771 69.3%  3534 0.0010   NS   0.0 <86 ITEMS>

Note the tcpdump options I used this time, which the tool requires to work properly when passing --type tcpdump. I also grouped by tables (as opposed to full queries) and ordered by the count (the Calls column). It will stop at your --limit and group the rest into MISC, so be aware of that.

You can remove the --order-by to sort by response time, which is the default sort order, or provide other attributes to sort on. We can also change the --report-format, for example to header:

% tcpdump -r mycapture.cap -n -x -q -tttt | pt-query-digest --type tcpdump --group-by tables --report-format header 
reading from file mycapture.cap, link-type EN10MB (Ethernet)

# Overall: 5.49k total, 91 unique, 321.13 QPS, 0.30x concurrency _________
# Time range: 2012-01-08 15:52:05.814608 to 15:52:22.916873
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time             5s     3us   114ms   939us     2ms     3ms   348us
# Rows affecte         316       0      13    0.06    0.99    0.29       0
# Query size         3.64M      18   5.65k  694.98   1.09k  386.68  592.07
# Warning coun           0       0       0       0       0       0       0
# Boolean:
# No index use   0% yes,  99% no

If you set the --report-format to query_report you will get gobs of verbose information that you can dive into, and you can use the --filter option to do things like finding queries that don't use an index:

% tcpdump -r mycapture.cap -n -x -q -tttt | \
  pt-query-digest --type tcpdump --filter '($event->{No_index_used} eq "Yes" || $event->{No_good_index_used} eq "Yes")'
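The --filter option takes arbitrary Perl over the event attributes, so you can slice the capture other ways too; for example, a variant (not from the original post) that keeps only statements taking longer than one second:

% tcpdump -r mycapture.cap -n -x -q -tttt | \
  pt-query-digest --type tcpdump --filter '$event->{Query_time} > 1'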

Distributed Capture

Now that we’ve taken a look at capturing and analyzing packets from one host, its time to dive into looking at our results across the cluster. The main trick is that tcpdump provides no option to stop capturing – you have to explicitly kill it. Otherwise we’ll just use dsh to send our commands out. We’ll assume you have a user that can hop around in a password-less fashion using ssh keys – setting that up is well outside the scope of this article but there’s plenty of info out there on how to do that.

There’s a few ways you can let a process run on a “timeout” but I’m assuming we don’t have any script written or tools like bash timeout or the one distributed in coreutils available.

So we’re going off the premise that you will background the process and kill it after a sleep by grabbing its pid:

( /path/to/command with options ) & sleep 5 ; kill $!

Simple enough, except we'll want to collect the output from each host, so we need to ssh the output back over to the machine issuing the commands, using a pipe to grab the stdout. This means that $! will return the pid of our ssh command instead of our tcpdump command. We end up having to do a little trick to kill the right process, since the capture won't be readable if we kill the ssh command that is writing the output. We need to kill tcpdump instead, and to do that we can look at the parent pid of the ssh process, ask pkill (similar to pgrep) for all of the processes that have this parent, and finally kill the oldest one, which ends up being our tcpdump process.

The end result looks like this if I were to run it across two machines:

% dsh -c -m web1000,web1001 \
   'sudo /usr/sbin/tcpdump -i eth0 -w - -s 0 -x -n -q -tttt "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2" | \
   ssh dshhost "cat - > ~/captures/$(hostname -a).cap" & sleep 10 ; \
   sudo pkill -o -P $(ps -ef | awk "\$2 ~ /\<$!\>/ { print \$3; }")'

So this issues a dsh to two of our hosts (you can make a dsh group with 100 or 1000 hosts though) and runs the command concurrently on each (-c). We issue our tcpdump on each target machine and send the output to stdout for ssh to then cat back to a directory on the source machine that issued the dsh. This way we have all of our captures in one directory, with each file named after the target host the tcpdump was run on. The sleep is how long the dump runs before we kill off the tcpdump.

The last piece of the puzzle is to get these all into one file, and we can use the mergecap tool for this, which is also part of wireshark:

% /usr/sbin/mergecap -F libpcap -w output.cap *.cap

And then we can analyze it like we did above.
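For example, to run the same table profile from earlier against the merged capture:

% tcpdump -r output.cap -n -x -q -tttt | pt-query-digest --type tcpdump --group-by tables --order-by Query_time:cnt \
 --report-format profile --limit 5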

Further Reading

References

http://www.mysqlperformanceblog.com/2011/04/18/how-to-use-tcpdump-on-very-busy-hosts

http://stackoverflow.com/questions/687948/timeout-a-command-in-bash-without-unnecessary-delay

http://www.xaprb.com/blog/2009/08/18/how-to-find-un-indexed-queries-in-mysql-without-using-the-log/

Breaking the distributed command down further

Just to clarify this command a bit more, particularly how the kill part works, since that was the trickiest part for me to figure out.

When we run this

$ dsh -c -m web1000,web1001 \
   'sudo /usr/sbin/tcpdump -i eth0 -w - -s 0 -x -n -q -tttt "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2" | \
   ssh dshhost "cat - > ~/captures/$(hostname -a).cap" & sleep 10 ; \
   sudo pkill -o -P $(ps -ef | awk "\$2 ~ /\<$!\>/ { print \$3; }")'

on the server the process list looks something like

user     12505 12504  0 03:12 ?        00:00:00 bash -c sudo /usr/sbin/tcpdump -i eth0 -w - -s 0 -x -n -q -tttt "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2" | ssh myhost.myserver.com "cat - > /home/etsy/captures/$(hostname -a).cap" & sleep 5 ; sudo pkill -o -P $(ps -ef | awk "\$2 ~ /\<$!\>/ { print \$3; }")
pcap     12506 12505  1 03:12 ?        00:00:00 /usr/sbin/tcpdump -i eth0 -w - -s 0 -x -n -q -tttt port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2
user     12507 12505  0 03:12 ?        00:00:00 ssh myhost.myserver.com cat - > ~/captures/web1001.cap

So $! is going to return the pid of the ssh process, 12507. We use awk to find the process matching that and then print out its parent pid, which is passed to the -P arg of pkill. If you use pgrep with the same -P argument but without the -o, you'd get a list of the children of 12505, which are 12506 and 12507. The oldest child is the tcpdump command, so adding -o kills that guy off.
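You can see this for yourself with the pids from the process listing above (illustrative output):

% pgrep -P 12505
12506
12507
% pgrep -o -P 12505
12506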

So if we were only running the command on one host, we could use something much simpler:

ssh dbhost01 '(sudo /usr/sbin/tcpdump -i eth0 -w - -s 0 port 3306) & sleep 10; sudo kill $!' | cat - > output.cap



Dec 13 2009

Using Amazon Relational Database Service

Category: Cloud Computing | jgoulah @ 2:09 PM

Intro

Amazon recently released a new service that makes it easier to set up, operate, and scale a relational database in the cloud, called Amazon Relational Database Service. The service, based on MySQL for now, has its pluses and minuses, and you should decide whether it fits your needs. The advantages are that it has an automated backup system that lets you restore to any point in time, up to about five minutes ago, and it also allows you to easily take a snapshot at any time. It is also very easy to "scale up" your box. This is more in the realm of vertical scaling, but if you find you are hitting limits you can upgrade to a more powerful server with little to no effort. It also gives you monitoring via Amazon CloudWatch and automatically patches your database during your self-defined maintenance windows. The downsides are that you don't have direct access to the box itself, so you can't ssh in. You also cannot, at this point, use replication for a master-slave style setup; Amazon promises to have more high availability options forthcoming. Since you can't ssh in, you adjust MySQL parameters via their DB parameter group API. I'll go over an example of this below.
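As a rough illustration of the "scale up" point, bumping the instance class is a single command. Note this is an assumption on my part: it reuses the instance created later in this post, and the exact flag name should be checked against rds-modify-db-instance --help before relying on it.

rds-modify-db-instance mydb-instance --db-instance-class db.m1.large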

Creating a Database Instance

The first thing to do is install the RDS command line API, which you can grab from here. I'm not going to go over the details of setting it up; it's basically as simple as putting it into your path, and Amazon has plenty of documentation on this.

Once you have the command line tools set up you can create a database like so:

rds-create-db-instance \
        mydb-instance \
        --allocated-storage 20 \
        --db-instance-class db.m1.small \
        --engine MySQL5.1  \
        --master-username masteruser \
        --master-user-password mypass \
        --db-name mydb --headers \
        --preferred-maintenance-window 'Mon:06:15-Mon:10:15' \
        --preferred-backup-window  '10:15-12:15' \
        --backup-retention-period 1 \
        --availability-zone us-east-1c 

This should be fairly self-explanatory. I'm creating an instance called mydb-instance. The master (basically root) user is called masteruser with password mypass. It also creates an initial database called mydb; you can add more databases and permissions later. This also sets up the maintenance and backup windows, which are required and defined in UTC. The backup retention period is how long it holds on to my backups, which I've defined as 1 day. If you set this to 0 it will disable the automated backups entirely, which is not advised.

The next thing to do is set up your security groups so that your EC2 instances (or your hosted servers) have access to your database. There is good documentation on this, so I will just go over a basic use case.

rds-authorize-db-security-group-ingress \
        default \
        --ec2-security-group-name webnode \
        --ec2-security-group-owner-id XXXXXXXXXXXX 

In the case above I'm authorizing access on the security group called default, allowing my EC2 security group webnode in. The --ec2-security-group-owner-id parameter is your AWS account id.

You can find your database's DNS name via the rds-describe-db-instances command:

rds-describe-db-instances 
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  default.mysql5.1  in-sync

So we can see our hostname is mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com

Now you can log in to your instance the way you normally access MySQL on the command line, then set up your users and import your database as usual.

mysql -u masteruser -h mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com -pmypass
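For example (the dump file and application user here are hypothetical), importing an existing dump and creating an application user might look like:

mysql -u masteruser -h mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com -pmypass mydb < mydump.sql
mysql -u masteruser -h mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com -pmypass \
    -e "GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO 'appuser'@'%' IDENTIFIED BY 'apppass'"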

Using Parameter Groups to View the MySQL Slow Queries Log

As I mentioned earlier, you don't have ssh access to the instance, so you need to use DB parameter groups to tweak your configuration rather than editing the my.cnf file. You can't see the MySQL slow query log on the box, but there is still a way to access it, and I'll go over that process.

Amazon won’t let you edit the default group, so the first thing to do is create a parameter group to define your custom parameters.

rds-create-db-parameter-group my-custom --description='My Custom DB Param Group' --engine=MySQL5.1

Then set the parameter to turn the query log on

rds-modify-db-parameter-group my-custom  --parameters="name=slow_query_log, value=ON, method=immediate"

We’re still using the default configuration so you have to tell the instance to use your custom parameter group

rds-modify-db-instance mydb-instance --db-parameter-group-name=my-custom

The first time you apply a new custom group you have to reboot the instance, as pending-reboot here indicates

$ rds-describe-db-instances 
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  my-custom  pending-reboot

So we can reboot it immediately like so

$ rds-reboot-db-instance mydb-instance

When it comes back up it will show that it's in-sync:

$ rds-describe-db-instances 
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  my-custom  in-sync

We can log in to the instance and see that our parameter was set correctly:

mysql> show global variables like 'log_slow_queries';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| log_slow_queries | ON    | 
+------------------+-------+
1 row in set (0.00 sec)

Since you don’t have access to the filesystem, its logged to a table on the mysql database

mysql> use mysql;

mysql> describe slow_log;
+----------------+--------------+------+-----+-------------------+-----------------------------+
| Field          | Type         | Null | Key | Default           | Extra                       |
+----------------+--------------+------+-----+-------------------+-----------------------------+
| start_time     | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | 
| user_host      | mediumtext   | NO   |     | NULL              |                             | 
| query_time     | time         | NO   |     | NULL              |                             | 
| lock_time      | time         | NO   |     | NULL              |                             | 
| rows_sent      | int(11)      | NO   |     | NULL              |                             | 
| rows_examined  | int(11)      | NO   |     | NULL              |                             | 
| db             | varchar(512) | NO   |     | NULL              |                             | 
| last_insert_id | int(11)      | NO   |     | NULL              |                             | 
| insert_id      | int(11)      | NO   |     | NULL              |                             | 
| server_id      | int(11)      | NO   |     | NULL              |                             | 
| sql_text       | mediumtext   | NO   |     | NULL              |                             | 
+----------------+--------------+------+-----+-------------------+-----------------------------+
11 rows in set (0.11 sec)
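So instead of tailing a log file, you query that table. For example, to pull the ten slowest statements captured so far:

mysql -u masteruser -h mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com -pmypass \
    -e "SELECT start_time, query_time, rows_examined, sql_text FROM mysql.slow_log ORDER BY query_time DESC LIMIT 10"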

We may also want to set things like the slow query threshold, since the default of 10 seconds is pretty high:

$ rds-modify-db-parameter-group my-custom  --parameters="name=long_query_time, value=3, method=immediate"

The rds-describe-events command keeps a log of what you’ve been doing

$ rds-describe-events 
db-instance         2009-12-12T17:44:19.546Z  mydb-instance  Updated to use a DBParameterGroup my-custom
db-instance         2009-12-12T17:45:51.636Z  mydb-instance  Database instance shutdown
db-instance         2009-12-12T17:46:09.380Z  mydb-instance  Database instance restarted
db-parameter-group  2009-12-12T17:56:02.568Z  my-custom        Updated parameter long_query_time to 3 with apply method immediate

And again you can check in MySQL that your parameter was applied properly. Note how this time we didn't have to reboot anything, as our parameter group is already active on this instance:

mysql> show global variables like 'long_query_time';
+-----------------+----------+
| Variable_name   | Value    |
+-----------------+----------+
| long_query_time | 3.000000 | 
+-----------------+----------+
1 row in set (0.00 sec)

Summary

In this article we went over some basics of Amazon RDS and why you may or may not want to use it. If you are just starting out it's a really easy way to get a working MySQL setup going. However, if you are porting from an architecture with multiple slaves or other HA options, this may not be for you just yet. We also went over some basic use cases for tweaking parameters, since there is no command line access to the box itself. There is command line access to MySQL though, so you can use your favorite tools there.

References

http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2935&categoryID=295



May 17 2009

Using DBIx::Class to Version Your Schema

Category: Databases, Deployment, Schema Versioning | jgoulah @ 5:53 PM

Intro to DBIx::Class

In my opinion DBIx::Class is one of the best ORM solutions out there. Not only can it model your database, including mapping out any foreign key relationships, it can also be used as a canonical point of reference for your schema. This means we can use it not only as an application layer interface to the database, but also to define a versioned database structure in code, where if you add a field to your result class you can generate a DDL and a set of alter statements for the new version just by running a simple script. That set of alterations can later be applied to your stage and production environments.

It's possible to produce your schema classes with DBIx::Class::Schema::Loader if you have a pre-existing database and are generating your DBIC classes for the first time, but this article is going to show, from the ground up, how to build your database from the code: defining the fields and tables as well as the relations, which become foreign key constraints when the database definition files are generated.
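(For the record, a quick sketch of going the other direction, based on the Schema::Loader synopsis and using the same connection details that appear later in this post, would look something like:)

perl -MDBIx::Class::Schema::Loader=make_schema_at,dump_to_dir:./lib -e \
  'make_schema_at("MySocialApp::Schema", { debug => 1 }, ["dbi:mysql:dbname=mysocialapp;host=localhost", "mysocialapp", "mysocialapp4u"])'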

Creating your Schema

This example will be a pretty simple schema, but enough to demonstrate generating a couple of tables with a foreign key constraint. Since a lot of people are doing social networking stuff these days, we can assume a common use case would be a user who has a bunch of photos. So we'll need to model a user table and a photo table, but first things first, let's create a few basic things, assuming we'll call our application MySocialApp:

mkdir -p MySocialApp/lib/MySocialApp/Schema/Result
cd MySocialApp

We need to create a schema file at lib/MySocialApp/Schema.pm that inherits from the base DBIx::Class::Schema

package MySocialApp::Schema;
use base qw/DBIx::Class::Schema/;

use strict;
use warnings;
our $VERSION = '0.00001';

__PACKAGE__->load_namespaces();

__PACKAGE__->load_components(qw/Schema::Versioned/);

__PACKAGE__->upgrade_directory('sql/');

1;

So here all we are doing is extending the base Schema and loading the Versioned component. We're setting sql as the directory that DDL and diff files will be generated into. We're also invoking load_namespaces, which tells it to look in the MySocialApp::Schema::Result namespace for our result classes by default.

Creating the Result Classes

Here I will define what the database is going to look like, starting with the user result class:

package MySocialApp::Schema::Result::User;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

 __PACKAGE__->load_components(qw/Core/);
 __PACKAGE__->table('user');

 __PACKAGE__->add_columns(

    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);

__PACKAGE__->set_primary_key('user_id');

__PACKAGE__->has_many(
    'photos',
    'MySocialApp::Schema::Result::Photo',
    { "foreign.fk_user_id" => "self.user_id" },
);

1;

This is relatively straightforward. I'm creating a user table that has an auto-incrementing primary key and a username field. I'm also defining a relationship that says a user can have many photos. Let's create the photo class:

package MySocialApp::Schema::Result::Photo;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

 __PACKAGE__->load_components(qw/Core/);
 __PACKAGE__->table('photo');

 __PACKAGE__->add_columns(

    photo_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    url => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    fk_user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        extra => { unsigned => 1 },
    },
);

__PACKAGE__->set_primary_key('photo_id');

 __PACKAGE__->belongs_to(
    'user',
    'MySocialApp::Schema::Result::User',
    { 'foreign.user_id' => 'self.fk_user_id' },
);


1;

Same basic thing here, a photo class with a photo_id primary key, a url field, and an fk_user_id field that keys into the user table. Each photo belongs to a user, and this relationship will define our foreign key constraint when the schema is generated.

Versioning the Database

Create the Versioning Scripts

We have the main DBIx::Class pieces in place to generate the database, but we’ll need a couple of scripts to support our versioned database. One script will generate the schema based on the version before it, introspecting which alterations have been made and producing a SQL diff file to alter the database. The other script will look at the database to see if it needs upgrading, and run the appropriate diff files to bring it up to the current version.

First, the schema and diff generation script, which we'll call script/gen_schema.pl:

use strict;

use FindBin;
use lib "$FindBin::Bin/../lib";

use Pod::Usage;
use Getopt::Long;
use MySocialApp::Schema;

my ( $preversion, $help );
GetOptions(
        'p|preversion:s'  => \$preversion,
        ) or die pod2usage;

my $schema = MySocialApp::Schema->connect(
        'dbi:mysql:dbname=mysocialapp;host=localhost', 'mysocialapp', 'mysocialapp4u'
        );
my $version = $schema->schema_version();

if ($version && $preversion) {
    print "creating diff between version $version and $preversion\n";
} elsif ($version && !$preversion) {
    print "creating full dump for version $version\n";
} elsif (!$version) {
    print "creating unversioned full dump\n";
}

my $sql_dir = './sql';
$schema->create_ddl_dir( 'MySQL', $version, $sql_dir, $preversion );

This script will be run anytime we change something in the Result files. You give it the previous schema version, and it will create a diff between that and the new version. Before running this you’ll update the $VERSION variable in lib/MySocialApp/Schema.pm so that it knows a change has been made.

The next script is the upgrade script; we'll call it script/upgrade_db.pl:

use strict;

use FindBin;
use lib "$FindBin::Bin/../lib";

use MySocialApp::Schema;

my $schema = MySocialApp::Schema->connect(
        'dbi:mysql:dbname=mysocialapp;host=localhost', 'mysocialapp', 'mysocialapp4u'
        );

if (!$schema->get_db_version()) {
    # schema is unversioned
    $schema->deploy();
} else {
    $schema->upgrade();
}

This script checks to see if any diffs need to be applied, and applies them if the version held by the database and the version in your Schema.pm file differ, bringing the database up to the correct schema version.

Note: in these scripts I've hardcoded the DB info, which really should go into a configuration file.

Create a Database to Deploy Into

We need to create the database that our tables will be created in. The connect calls above use this user and password to talk to the database, so let's set those up too. I'm using MySQL for the example, so this would be done at the MySQL command prompt:

mysql> create database mysocialapp;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on mysocialapp.* to mysocialapp@'localhost' identified by 'mysocialapp4u';
Query OK, 0 rows affected (0.01 sec)

Deploy the Initial Schema

Now it's time to deploy our initial schema into MySQL. But for the first go-round we also have to create the initial DDL file; this way, when we make changes in the future, it can be compared against the Schema result classes to see what has changed. We can do this by supplying a nonexistent previous version to our gen_schema.pl script:

$ perl script/gen_schema.pl -p 0.00000
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.
creating diff between version 0.00001 and 0.00000
No previous schema file found (sql/MySocialApp-Schema-0.00000-MySQL.sql) at /home/jgoulah/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1685.

And we can see the DDL file now exists

$ ls sql/
MySocialApp-Schema-0.00001-MySQL.sql

Then we need to deploy to MySQL for the first time, so we run the upgrade script:

$ perl script/upgrade_db.pl
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.

Now check out what it's created in MySQL:

mysql> use mysocialapp;
Database changed
mysql> show tables;
+----------------------------+
| Tables_in_mysocialapp      |
+----------------------------+
| dbix_class_schema_versions |
| photo                      |
| user                       |
+----------------------------+
3 rows in set (0.00 sec)

There is our photo table, our user table, and also a dbix_class_schema_versions table. This last table just keeps track of which version the database is at. By selecting from it you can see that we are in sync with the Schema class, and also when this version was installed:

mysql> select * from dbix_class_schema_versions;
+---------+---------------------+
| version | installed           |
+---------+---------------------+
| 0.00001 | 2009-05-17 21:59:57 |
+---------+---------------------+
1 row in set (0.00 sec)

The really great thing is that we have created the tables -and- constraints based on our schema result classes. Check out our photo table

mysql> show create table photo\G
*************************** 1. row ***************************
       Table: photo
Create Table: CREATE TABLE `photo` (
  `photo_id` int(11) unsigned NOT NULL auto_increment,
  `url` varchar(255) NOT NULL,
  `fk_user_id` int(11) unsigned NOT NULL,
  PRIMARY KEY  (`photo_id`),
  KEY `photo_idx_fk_user_id` (`fk_user_id`),
  CONSTRAINT `photo_fk_fk_user_id` FOREIGN KEY (`fk_user_id`) REFERENCES `user` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Making Database Changes

Let's say we want to add a password field to our user table. I'd open up the lib/MySocialApp/Schema/Result/User.pm file and add a section for the password field to the add_columns definition, so it now looks like:

 __PACKAGE__->add_columns(

    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    password => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);

Then we bump $VERSION in the lib/MySocialApp/Schema.pm file to the next version, so it looks like:

our $VERSION = '0.00002';

To create the DDL and Diff for this version we run the gen_schema script with the previous version as the argument

$ perl script/gen_schema.pl -p 0.00001
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.
creating diff between version 0.00002 and 0.00001

If you look in the sql directory there are two new files. The DDL is named MySocialApp-Schema-0.00002-MySQL.sql, and the diff, MySocialApp-Schema-0.00001-0.00002-MySQL.sql, contains the alter statement:

$ cat sql/MySocialApp-Schema-0.00001-0.00002-MySQL.sql
-- Convert schema 'sql/MySocialApp-Schema-0.00001-MySQL.sql' to 'MySocialApp::Schema v0.00002':;

BEGIN;

ALTER TABLE user ADD COLUMN password VARCHAR(255) NOT NULL;

COMMIT;

Now we can apply that with our upgrade script

$ perl script/upgrade_db.pl
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.

DB version (0.00001) is lower than the schema version (0.00002). Attempting upgrade.

and see that the upgrade has been made in MySQL

mysql> describe user;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| user_id  | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| username | varchar(255)     | NO   |     |         |                |
| password | varchar(255)     | NO   |     |         |                |
+----------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

Conclusion

So now we've seen a really easy way to maintain our database schema from our ORM code. We have a versioned database that is under revision control, and we can keep our stage and production environments in sync with our development code. With this type of setup it's also easy to maintain branches with different database changes and merge them into the mainline, and it also ensures that the database you are developing on is always in sync with the code.
