May 01 2010

Quick Tip: Gist From the Command Line

Category: Organizationjgoulah @ 3:56 PM

Many a time while working it is convenient to quickly show someone else code or a chunk of output from some command. The easiest way to do this is through a pastebin service. It’s also pretty much mandated on IRC to use a service like this and there are literally thousands out there to choose from.

The gist service on github is actually fairly convenient especially if you have a github account since it keeps an archive of all of your previous gists. It also does a good job formatting and lets you paste privately. We’ll use the App::Nopaste tool to paste a gist straight from the command line.

First off install the tool from cpan

$ cpan App::Nopaste

You can use this tool anonymously but if you want to keep an archive of your pastes you can simply setup your git credentials in your .gitconfig file. The token here is your API token which can be found under your account settings page.

[github]
        user = jgoulah
        token = 00000000000000000000000000000000

You also need either Git installed or Config::INI::Reader to allow the module to read your .gitconfig file.

Now to make this a bit easier to remember we can create an alias in our .bashrc file. In this case I’m specifying –private so that only people that I give this secure URL out to can see, and I’m also specifying to use the Gist service. The nopaste app supports a variety of other services that you can use but as of this writing Gist is the only one that supports the –private flag.

alias gist='nopaste --private --service Gist'

Now you can use the command to paste something such as a script from the command line and the gist URL is returned

$ gist somescript.sh

http://gist.github.com/df552bacbbbb8d70c089

There you have it!

Tags: , , , , ,


Jan 18 2010

Using Mongo and Map Reduce on Apache Access Logs

Category: Databases,Systemsjgoulah @ 9:32 PM

Introduction

With more and more traffic pouring into websites, it has become necessary to come up with creative ways to parse and analyze large data sets. One of the popular ways to do that lately is using MapReduce which is a framework used across distributed systems to help make sense of large data sets. There are lots of implementations of the map/reduce framework but an easy way to get started is by using MongoDB. MongoDB is a scalable and high performant document oriented database. It has replication, sharding, and mapreduce all built in which makes it easy to scale horizontally.

For this article we’ll look a common use case of map/reduce which is to help analyze your apache logs. Since there is no set schema in a document oriented database, its a good fit for log files since its fairly easy to import arbitrary data. We’ll look at getting the data into a format that mongo can import, and writing a map/reduce algorithm to make sense of some of the data.

Getting Setup

Installing Mongo

Mongo is easy to install with detailed documentation here. In a nutshell you can do

$ mkdir -p /data/db
$ curl -O http://downloads.mongodb.org/osx/mongodb-osx-i386-latest.tgz
$ tar xzf mongodb-osx-i386-latest.tgz

At this point its not a bad idea to put this directory somewhere like /opt and adding its bin directory to your path. That way instead of

 ./mongodb-xxxxxxx/bin/mongod &

You can just do

mongodb &

In any case, start up the daemon one of those two ways depending how you set it up.

Importing the Log Files

Apache access files can vary in the information reported. The log format is easy to change with the LogFormat directive which is documented here. In any case these logs that I’m working with are not out of the box apache format. They look something like this

Jan 18 17:20:26 web4 logger: networks-www-v2 84.58.8.36 [18/Jan/2010:17:20:26 -0500] "GET /javascript/2010-01-07-15-46-41/97fec578b695157cbccf12bfd647dcfa.js HTTP/1.1" 200 33445 "http://www.channelfrederator.com/hangover/episode/HNG_20100101/cuddlesticks-cartoon-hangover-4" "Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7" www.channelfrederator.com 35119

We want to take this raw log and convert it to a JSON structure for importing into mongo, which I wrote a simple perl script to iterate through the log and parse it into sensible fields using a regular expression

#!/usr/bin/perl

use strict;
use warnings;

my $logfile = "/logs/httpd/remote_www_access_log";
open(LOGFH, "$logfile");
foreach my $logentry (<LOGFH>) {
    chomp($logentry);   # remove the newline
    $logentry =~ m/(\w+\s\w+\s\w+:\w+:\w+)\s #date
                   (\w+)\slogger:\s # server host
                   ([^\s]+)\s # vhost logger
                   (?:unknown,\s)?(-|(?:\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},?\s?)*)\s #ip
                   \[(.*)\]\s # date again
                   \"(.*?)\"\s # request
                   (\d+)\s #status
                   ([\d-]+)\s # bytes sent
                   \"(.*?)\"\s # referrer 
                   \"(.*?)\"\s # user agent 
                   ([\w.-]+)\s? # domain name
                   (\d+)? # time to server (ms) 
                  /x;

    print <<JSON;
     {"date": "$5", "webserver": "$2", "logger": "$3", "ip": "$4", "request": "$6", "status": "$7", "bytes_sent": "$8", "referrer": "$9", "user_agent": "$10", "domain_name": "$11", "time_to_serve": "$12"} 
JSON

}

Again my regular expression probably won’t quite work on your logs, though you may be able to take bits and pieces of what I’ve documented above to use for your script. On my logs that script outputs a bunch of lines that look like this

{"date": "18/Jan/2010:17:20:26 -0500", "webserver": "web4", "logger": "networks-www-v2", "ip": "84.58.8.36", "request": "GET /javascript/2010-01-07-15-46-41/97fec578b695157cbccf12bfd647dcfa.js HTTP/1.1", "status": "200", "bytes_sent": "33445", "referrer": "http://www.channelfrederator.com/hangover/episode/HNG_20100101/cuddlesticks-cartoon-hangover-4", "user_agent": "Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7", "domain_name": "www.channelfrederator.com", "time_to_serve": "35119"}

And we can then import that directly to MongoDB. The creates a collection called weblogs in the logs database, from the file that has the output of the above JSON generator script

$ mongoimport --type json -d logs -c weblogs --file weblogs.json

We can also take a look at them and verify they loaded by running the find command, which dumps out 10 rows by default

$ mongo
> use logs;
switched to db logs
> db.weblogs.find()

Setting Up the Map and Reduce Functions

So for this example what I am looking for is how many hits are going to each domain. My servers handle a bunch of different domain names and that is one thing I’m outputting into my logs that is easy to examine

The basic command line interface to MondoDB is a kind of javascript interpreter, and the mongo backend takes javascript implementations of the map and reduce functions. We can type these directly into the mongo console. The map function must emit a key/value pair and in this example it will output a ’1′ each time a domain is found

> map = "function() { emit(this.domain_name, {count: 1}); }"

And so basically what comes out of this is a key for each domain with a set of counts, something like

 {"www.something.com", [{count: 1}, {count: 1}, {count: 1}, {count: 1}]}

This is send to the reduce function, which sums up all those counts for each domain

> reduce = "function(key, values) { var sum = 0; values.forEach(function(f) { sum += f.count; }); return {count: sum}; };"

Now that you’ve setup map and reduce functions, you can call mapreduce on the collection

> results = db.weblogs.mapReduce(map, reduce)
...
>results
{
        "result" : "tmp.mr.mapreduce_1263861252_3",
        "timeMillis" : 9034,
        "counts" : {
                "input" : 1159355,
                "emit" : 1159355,
                "output" : 92
        },
        "ok" : 1,
}

This gives us a bit of information about the map/reduce operation itself. We see that we inputted a set of data and emmitted once per item in that set. And we reduced down to 92 domain with a count for each. A result collection is given, and we can print it out

> db.tmp.mr.mapreduce_1263861252_3.find()
> it

You can type the ‘it’ operator to page through multiple pages of data. Or you can print it all at once like so

> db.tmp.mr.mapreduce_1263861252_3.find().forEach( function(x) { print(tojson(x));});
{ "_id" : "barelydigital.com", "value" : { "count" : 342888 } }
{ "_id" : "barelypolitical.com", "value" : { "count" : 875217 } }
{ "_id" : "www.fastlanedaily.com", "value" : { "count" : 998360 } }
{ "_id" : "www.threadbanger.com", "value" : { "count" : 331937 } }

Nice, we have our data aggregated and the answer to our initial problem.

Automate With a Perl Script

As usual CPAN comes to the rescue with a MongoDB driver that we can use to interface with our database in a scripted fashion. The mongo guys have also done a great job of supporting it in a bunch of other languages which makes it easy to interface with if you are using more than just perl

#!/bin/perl

use MongoDB;
use Data::Dumper;
use strict;
use warnings;

my $conn = MongoDB::Connection->new("host" => "localhost", "port" => 27017);
my $db = $conn->get_database("logs");

my $map = "function() { emit(this.domain_name, {count: 1}); }";

my $reduce = "function(key, values) { var sum = 0; values.forEach(function(f) { sum += f.count; }); return {count: sum}; }";

my $idx = Tie::IxHash->new(mapreduce => 'weblogs', 'map' => $map, reduce => $reduce);
my $result = $db->run_command($idx);

print Dumper($result);

my $res_coll = $result->{'result'};

print "result collection is $res_coll\n";

my $collection = $db->get_collection($res_coll);
my $cursor = $collection->query({ }, { limit => 10 });

while (my $object = $cursor->next) {
    print Dumper($object); 
}

The script is straightforward. Its just using the documented perl interface to make a run_command call to mongo, which passes the map and reduce javascript functions in. It then prints the results similar to how we did on the command line earlier.

Summary

We’ve gone over a very simple example of how MapReduce can work for you. There are lots of other ways you can put it to good use such as distributed sorting and searching, document clustering, and machine learning. We also took a look at MongoDB which has great uses for schema-less data. It makes it easy to scale since it has built in replication and sharding capabilities. Now you can put map/reduce to work on your logs and find a ton of information you couldn’t easily get before.

References

http://github.com/mongodb/mongo
http://www.mongodb.org/display/DOCS/MapReduce
http://apirate.posterous.com/visualizing-log-files-with-mongodb-mapreduce
http://search.cpan.org/~kristina/MongoDB-0.27/
http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/

Tags: , , , , , , , , ,


Sep 07 2009

Deploying A Catalyst App on Private Hosting

Category: Deploymentjgoulah @ 1:26 PM

Intro

In the last post I wrote about deploying Catalyst on shared hosting.  While shared hosting may seem attractive pricewise, you’ll quickly grow out of it, and it makes sense to move to hosting with some more control if you are serious about your website.   There are lots of choices when it comes to where you may want to host,  and if you are looking for virtual options, Slicehost, Linode, prgmr, or even Amazon EC2 are great choices.  Picking and setting these up are well beyond the scope of this article,  so I’m assuming you have a server with root privileges ready to go.

Setting Up Your Modules

As I probably sound like a broken record if you’ve read any of my other perl-related entries,  its a good idea to setup your modules using local::lib.  Check out the last article on how to set it up,  or the module docs do a fine job if you follow the bootstrap instructions.  I always create a new user to set this up as,  such as ‘perluser’  or similar,  this way your modules aren’t affected by upgrades as your own user.  We’ll show how to point to these modules shortly.

For now, you’ll want to also make sure  you have FCGI and FCGI::ProcManager installed as that user that you just setup along with the rest of your apps modules.

$ cpan FCGI
$ cpan FCGI::ProcManager

You may want to checkout your app under this user, so that you can just do

$ perl Makefile.PL
$ make installdeps

assuming you’ve kept your Makefile.PL updated, this should be all of the perl modules you need to run your application.

Setting Up Apache

You can setup apache using your package manager of choice.  For ease of this article I’ll assume you’re on Ubuntu, or Debian based system, and you can do something like

$ sudo apt-get install apache2.2-common
$ sudo apt-get install apache2-mpm-worker

And you also need the mod_fastcgi module.  You could download and compile it.  Or you could grab it from apt.   You’ll probably have to add the multiverse repository,  so update your /etc/apt/sources.listso that each line will end with

main restricted universe multiverse

Now you can

$ sudo apt-get update
$ sudo apt-get install libapache2-mod-fastcgi

Setup the VirtualHost and App Directory Structure

You need to put a VirtualHost so that Apache knows how to handle the requests to your domain

FastCgiExternalServer /opt/mysite.com/fake/prod -socket /opt/mysite.com/run/prod.socket

<VirtualHost *:80>
ServerName www.mysite.com

DocumentRoot /opt/mysite.com/app
Alias /static /opt/mysite.com/app/root/static/
Alias / /opt/mysite.com/fake/prod/
</VirtualHost>

This “/” alias ties your document root to the listening socket defined in the FastCgiExternalServer line. The “/static” alias make sure your static files are served by apache, instead of fastcgi.

This config also assumes some directory structure is setup,  which is really entirely up to you.  But here we’ll assume you have a directory located at /opt/mysite.com with a few directories under that called fake, run, and app.

$ sudo mkdir -p /opt/mysite.com/{fake,run,app}

The only directory you have to put anything in is app, which should contain your code.

Note, this is a very simplified layout.  In the real world I’d put the fake, run, and app dirs under a versioned directory,  which my active virtualhost would then point to.  I’ve talked briefly about this kind of deployment technique before at a high level and there is a great writeup here on the technical details on how to use multiple fastcgi servers to host your staging and production apps with seamless deployment.  Part of the beauty of using FastCGI is that you can run two copies of the app against the same socket so its easy to bring up instances pointing to different versions of your code, and deploy with zero downtime.

Launching FastCGI

The last piece of the puzzle is to have a launch script,  which makes sure that your app is listening on the socket.  So to keep it simple you would have a script called launch_mysite.sh that looks like this

#!/bin/sh

export PERL5OPT='-Mlib=/home/perl_module_user/perl5/lib/perl5'

/opt/mysite.com/app/script/mysite_fastcgi.pl \
-l /opt/mysite.com/run/prod.socket -e \
-p  /opt/mysite.com/run/prod.pid -n 15

The first line is telling the script to use the modules from the user we setup the local::lib to hold the modules, so make sure you change this to the correct location.  Then it starts up fastcgi to listen on your socket, and create a process pid,  and to spawn in this case 15 processes to handle requests. Go ahead and hit your domain, and it should show you your website.

Conclusion

We’ve gone over the basics on how to setup FastCGI using FastCgiExternalServer. You now have a lot of flexibility in how many processes are handling requests, the ability to run different copies of your app and flipping the switch, and pointing to which modules are run with your app. There are a lot of improvements you can now make to setup a very sane deployment process so that each version of code deployed can be its own standalone build and ensuring your production app has 100% uptime, but at this point its up to your imagination.

Tags: , , , , ,


Aug 29 2009

Deploying a Catalyst App on Shared Hosting

Category: Deploymentjgoulah @ 1:48 PM

Intro

People have long complained that one of the tricky things about perl is the deployment phase, much because of the intricacies of mod_perl and its unfriendliness towards shared environments. In fact, I would highly recommend FastCGI over mod_perl since it is quite easy to understand and configure. This post is going to focus on smaller shared hosting environments, and how easy it can be to quickly deploy a Catalyst web app.

Getting Started

Assumptions

There are some basic assumptions here. First we need a Linux webserver that has Apache installed and is loading up
fcgid. I believe you can also use the favored mod_fastcgi which I just pointed to above, but I have yet to test this on a shared host. These are binary compatible modules so in theory both work. But again I’ve only used mod_fastcgi for large non-shared hosted deployments. You’ll also need mod_rewrite which is now fairly common.

Installing Your Modules

The best way to install the modules your application depends on is using local::lib. I’ve talked about this before so there isn’t a lot of need to go over the process in detail again, but in a nutshell you can do

$ wget http://search.cpan.org/CPAN/authors/id/A/AP/APEIRON/local-lib-1.004006.tar.gz
$ tar xzvf local-lib-1.004006.tar.gz
$ cd local-lib-1.004006
$ perl Makefile.PL --bootstrap
$ make test && make install
$ echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >>~/.bashrc
$ source ~/.bashrc

Now you have an environment that you can install your modules into. By default this is localized to ~/perl5. The next step is to install your modules that the application requires. It is good practice to put these into your Makefile.PL so that you can easily install them in one shot. A very basic one would follow this template

use inc::Module::Install;

name 'MyApp';
all_from 'lib/MyApp.pm';

requires 'Catalyst::Runtime' => '5.80011';
requires 'Config::General';
# require other modules here

install_script glob('script/*.pl');
auto_install;
WriteAll;

Now its easy to do

$ export PERL_MM_USE_DEFAULT=1
$ perl Makefile.PL
$ make installdeps

The PERL_MM_USE_DEFAULT will configure things such that you don’t have to press enter at every question about a dependency. The make installdeps will install any missing modules, which in this case is going to be everything. You can upgrade the version numbers in the Makefile.PL “requires” lines if you want installdeps to grab the newer distributions as they are released to CPAN.

Configuring Your App

First thing we have to do is a minor edit to our fastcgi script, which is to tell it to use our local::lib. Since its not part of the environment we setup earlier in .bashrc we have to tell the fastcgi perl script where to find things. Below the “use warnings;” line add this

use lib "/home/myuser/perl5/lib/perl5";
use local::lib;

Make sure to change the path to the correct location of your perl5 modules directory.

The last thing is to make sure your app is located in the public directory root for your host. In my case I created a symbolic link from the public_html folder to my app.

$ cd && mv public_html public_html.old  # get rid of current root folder
$ ln -s ~/myapp ~/public_html

Then create a .htaccess file in that folder, which should reside beside all of your code

Options ExecCGI FollowSymLinks
Order allow,deny
Allow from all
AddHandler fcgid-script .pl

RewriteEngine On
RewriteCond %{REQUEST_URI} !^/?script/myapp_fastcgi.pl
RewriteRule ^(.*)$ script/myapp_fastcgi.pl/$1 [PT,L]
RewriteRule ^static/(.*)$ static/$1 [PT,L]

We’re just telling apache to turn CGI on, and make sure to execute our fastcgi perl script. Be sure to change the rewrite lines to point to your script (hint: change myapp to your app name).

Now, you should be able to hit your domain. Simple!

Conclusion

So we’ve seen that deploying perl can actually be fairly easy. There are of course some assumptions here, for example, to get the rewrite rules working you’ll need mod_rewrite.so, but this is fairly standard these days. Now you can deploy a perl app with the same ease as languages such as PHP, where it is pretty much plug and play. This should enable people to more easily compete with all the badly written blog, forum, and other generically useful software in the open source world.

Tags: , , ,


May 17 2009

Using DBIx::Class to Version Your Schema

Category: Databases,Deployment,Schema Versioningjgoulah @ 5:53 PM

Intro to DBIx::Class

In my opinion DBIx::Class is one of the best ORM solutions out there. Not only can it model your database, including mapping out any foreign key relationships, it can also be used as a canonical point of reference for your schema.   This means we can use it not only as an application layer interface to the database, but can also define a versioned database structure using the code,  where if you add a field into your result class,  you can generate a version of alter statements and a DDL just by running a simple script.   Of course this set of alterations can later be applied to your stage and production environments.

It’s possible to produce your schema classes with DBIx::Class::Schema::Loader if you have a pre-existing database and are initially generating your DBIC classes, but this article is going to show going from the ground up how to build your database from the code defining the fields and tables as well as the relations, which match to foreign key constraints when the database definition files are generated.

Creating your Schema

This example will be a pretty simple schema but enough to demonstrate generating a couple of tables with a foreign key constraint. Since a lot of people are doing social networking stuff these days we can assume a common use case would be a user who has a bunch of photos. So we’ll need to model a user table and a photo table, but first things first, lets create a few basic things, assuming we’ll call our application MySocialApp

mkdir -p MySocialApp/lib/MySocialApp/Schema/Result
cd MySocialApp

We need to create a schema file at lib/MySocialApp/Schema.pm that inherits from the base DBIx::Class::Schema

package MySocialApp::Schema;
use base qw/DBIx::Class::Schema/;

use strict;
use warnings;
our $VERSION = '0.00001';

__PACKAGE__->load_namespaces();

__PACKAGE__->load_components(qw/Schema::Versioned/);

__PACKAGE__->upgrade_directory('sql/');

1;

So here all we are doing is extending the base Schema and loading the Versioned component. We’re setting the directory where DDL and Diff files will be generated into the sql directory. We’re also invoking load_namespaces which tells it to look at the MySocialApp::Schema::Result namespace for our result classes by default.

Creating the Result Classes

Here I will define what the database is going to look like

package MySocialApp::Schema::Result::User;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

 __PACKAGE__->load_components(qw/Core/);
 __PACKAGE__->table('user');

 __PACKAGE__->add_columns(

    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);

__PACKAGE__->set_primary_key('user_id');

__PACKAGE__->has_many(
    'photos',
    'MySocialApp::Schema::Result::Photo',
    { "foreign.fk_user_id" => "self.user_id" },
);

1;

This is relatively straightforward. I’m creating a user table that has an auto incrementing primary key and a username field. I’m also defining a relationship that says a user can have many photos. Lets create the photo class.

package MySocialApp::Schema::Result::Photo;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

 __PACKAGE__->load_components(qw/Core/);
 __PACKAGE__->table('photo');

 __PACKAGE__->add_columns(

    photo_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    url => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    fk_user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        extra => { unsigned => 1 },
    },
);

__PACKAGE__->set_primary_key('photo_id');

 __PACKAGE__->belongs_to(
    'user',
    'MySocialApp::Schema::Result::User',
    { 'foreign.user_id' => 'self.fk_user_id' },
);


1;

Same basic thing here, a photo class with a photo_id primary key, a url field, and an fk_user_id field that keys into the user table. Each photo belongs to a user, and this relationship will define our foreign key constraint when the schema is generated.

Versioning the Database

Create the Versioning Scripts

We have the main DBIx::Class pieces in place to generate the database, but we’ll need a couple of scripts to support our versioned database. One script will generate the schema based on the version before it, introspecting which alterations have been made and producing a SQL diff file to alter the database. The other script will look at the database to see if it needs upgrading, and run the appropriate diff files to bring it up to the current version.

First the schema and diff generation script which we’ll call script/gen_schema.pl

use strict;

use FindBin;
use lib "$FindBin::Bin/../lib";

use Pod::Usage;
use Getopt::Long;
use MySocialApp::Schema;

my ( $preversion, $help );
GetOptions(
        'p|preversion:s'  => \$preversion,
        ) or die pod2usage;

my $schema = MySocialApp::Schema->connect(
        'dbi:mysql:dbname=mysocialapp;host=localhost', 'mysocialapp', 'mysocialapp4u'
        );
my $version = $schema->schema_version();

if ($version && $preversion) {
    print "creating diff between version $version and $preversion\n";
} elsif ($version && !$preversion) {
    print "creating full dump for version $version\n";
} elsif (!$version) {
    print "creating unversioned full dump\n";
}

my $sql_dir = './sql';
$schema->create_ddl_dir( 'MySQL', $version, $sql_dir, $preversion );

This script will be run anytime we change something in the Result files. You give it the previous schema version, and it will create a diff between that and the new version. Before running this you’ll update the $VERSION variable in lib/MySocialApp/Schema.pm so that it knows a change has been made.

The next script is the upgrade script, we’ll call it script/upgrade_db.pl

use strict;

use FindBin;
use lib "$FindBin::Bin/../lib";

use MySocialApp::Schema;

my $schema = MySocialApp::Schema->connect(
        'dbi:mysql:dbname=mysocialapp;host=localhost', 'mysocialapp', 'mysocialapp4u'
        );

if (!$schema->get_db_version()) {
    # schema is unversioned
    $schema->deploy();
} else {
    $schema->upgrade();
}

This script checks to see if any diffs need to be applied, and applies them if the version held by the database and the version in your Schema.pm file differ, bringing the database up to the correct schema version.

Note, in these scripts I’ve hardcoded the DB info which really should go into a configuration file.

Create a Database to Deploy Into

We need to create the database that our tables will be created in. In the connect calls above we’re using this user and password to connect to our database. I’m using MySQL for the example so this would be done on the MySQL command prompt

mysql> create database mysocialapp;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on mysocialapp.* to mysocialapp@'localhost' identified by 'mysocialapp4u';
Query OK, 0 rows affected (0.01 sec)

Deploy the Initial Schema

Now its time to deploy our initial schema into MySQL. But for the first go round we also have to create the initial DDL file, this way when we make changes in the future it can be compared against the Schema result classes to see what changes have been made. We can do this by supplying a nonexistent previous version to our gen_schema.pl script

$ perl script/gen_schema.pl -p 0.00000
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.
creating diff between version 0.00001 and 0.00000
No previous schema file found (sql/MySocialApp-Schema-0.00000-MySQL.sql) at /home/jgoulah/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1685.

And we can see the DDL file now exists

$ ls sql/
MySocialApp-Schema-0.00001-MySQL.sql

Then we need to deploy to MySQL for the first time so we run the upgrade script

$ perl script/upgrade_db.pl
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.

Now check out what its created in MySQL

mysql> use mysocialapp;
Database changed
mysql> show tables;
+----------------------------+
| Tables_in_mysocialapp      |
+----------------------------+
| dbix_class_schema_versions |
| photo                      |
| user                       |
+----------------------------+
3 rows in set (0.00 sec)

There is our photo table, our user table, and also a dbix_class_schema_versions table. This last table just keeps track of what version the database is. You can see we are in sync with the Schema class by selecting from that table and also when this version was installed.

mysql> select * from dbix_class_schema_versions;
+---------+---------------------+
| version | installed           |
+---------+---------------------+
| 0.00001 | 2009-05-17 21:59:57 |
+---------+---------------------+
1 row in set (0.00 sec)

The really great thing is that we have created the tables -and- constraints based on our schema result classes. Check out our photo table

mysql> show create table photo\G
*************************** 1. row ***************************
       Table: photo
Create Table: CREATE TABLE `photo` (
  `photo_id` int(11) unsigned NOT NULL auto_increment,
  `url` varchar(255) NOT NULL,
  `fk_user_id` int(11) unsigned NOT NULL,
  PRIMARY KEY  (`photo_id`),
  KEY `photo_idx_fk_user_id` (`fk_user_id`),
  CONSTRAINT `photo_fk_fk_user_id` FOREIGN KEY (`fk_user_id`) REFERENCES `user` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Making Database Changes

Lets say we want to add a password field to our user table. I’d open up the lib/MySocialApp/Schema/Result/User.pm file and add a section for my password field to the add_columns definition so now it looks like:

 __PACKAGE__->add_columns(

    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    password => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);

Then we update the lib/MySocialApp/Schema.pm file and update to the next version so it looks like

our $VERSION = '0.00002';

To create the DDL and Diff for this version we run the gen_schema script with the previous version as the argument

$ perl script/gen_schema.pl -p 0.00001
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.
creating diff between version 0.00002 and 0.00001

If you look in the sql directory there are two new files. The DDL is named MySocialApp-Schema-0.00002-MySQL.sql and the diff is called MySocialApp-Schema-0.00001-0.00002-MySQL.sql and has the alter statement

$ cat sql/MySocialApp-Schema-0.00001-0.00002-MySQL.sql
-- Convert schema 'sql/MySocialApp-Schema-0.00001-MySQL.sql' to 'MySocialApp::Schema v0.00002':;

BEGIN;

ALTER TABLE user ADD COLUMN password VARCHAR(255) NOT NULL;

COMMIT;

Now we can apply that with our upgrade script

$ perl script/upgrade_db.pl
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.

DB version (0.00001) is lower than the schema version (0.00002). Attempting upgrade.

and see that the upgrade has been made in MySQL

mysql> describe user;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| user_id  | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| username | varchar(255)     | NO   |     |         |                |
| password | varchar(255)     | NO   |     |         |                |
+----------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

Conclusion

So now we’ve seen a really easy way that we can maintain our database schema from our ORM code. We have a versioned database that is under revision control, and can keep our stage and production environments in sync with our development code. With this type of setup its also easy to maintain branches with different database changes and merge them into the mainline, and it also ensures that the database you are developing on is always in sync with the code.

Tags: , , ,


Apr 19 2009

Make Browsing Code Easier with Ack and Ctags

Category: Organizationjgoulah @ 8:44 PM

Intro

When working with open source software, its essential to know how to navigate large code bases, perhaps unfamiliar and quite large. There are a few tools I use to do this that should be part of any developers arsenal, and they are: ack and ctags.

Ack can be thought of as a faster and more powerful grep. It searches recursively by default, ignores binary files and most version control files (think .svn), lets you specify file types on search, use perl regular expressions, and has easier to read output than grep.

Ctags is a tool that many are familiar with, and there are tons of articles about it already.  But I bring it up so I can show some quick starter scripts that you’d use to generate the tags for PHP or Perl scripts.  I’ll show a quick C++ and Java example too, since I use those from time to time.

Ctags

Installation

There’s really not much to installing ctags.  You could download and compile the source, but just get it from your package management system.

If you’re using Debian or Ubuntu you can do:

sudo apt-get isntall exuberant-ctags

Similarly in CentOS and Redhat based distros:

sudo yum install ctags

Usage

Ctags basically indexes your code and creates a tag file that can then be used in your editor to literally jump around your code.  If you see a method call as you’re browsing code, you can jump to the definition of that method with one keystroke, and back to where you were.  Same thing for variables.  In a keystroke you can jump to see where its defined.   As you jump through code a stack is created,  and as you jump back you are just popping off that stack, also known as LIFO, by the way.

Generating the Tag Files

I have a few scripts to generate the ctags files depending on different codebases. I tend to use VI so I’m going to cover how to do it with that editor, but you can also use emacs.

For these examples I’m going to send the output to a file in my ~/.vim/tags/ directory, which I’ll later add to .vimrc. You could expand these scripts to fit your needs. For these examples they are pretty basic and hardcoded.

PHP

$ cat bin/ctags_php
#!/bin/bash
cd ~/mysandbox/myphpproject
ctags -f ~/.vim/tags/myphpproject \
--langmap="php:+.inc" -h ".php.inc" -R \
--exclude='*.js' \
--exclude='*.sql' \
--totals=yes \
--tag-relative=yes \
--PHP-kinds=+cf-v \
--regex-PHP='/abstract\s+class\s+([^ ]+)/\1/c/' \
--regex-PHP='/interface\s+([^ ]+)/\1/c/' \
--regex-PHP='/(public\s+|static\s+|abstract\s+|protected\s+|private\s+)function\s+\&?\s*([^ (]+)/\2/f/'

and when you run it, you’ll see something like:

$ ctags_php
498 files, 66678 lines (2624 kB) scanned in 0.6 seconds (4604 kB/s)
1643 tags added to tag file
1643 tags sorted in 0.00 seconds

Perl

In Perl you can do things a bit smarter since you should have a Makefile.PL script to keep track of your dependencies. If so you can add:

postamble(
q!
tags:
ctags -f ~/.vim/tags/myperlcode --recurse --totals \
--exclude=blib \
--exclude='*~' \
--languages=Perl --langmap=Perl:+.t \
!);

Then you should do:

perl Makefile.PL
make tags

C++

You can do a very similar thing for C++ code

$ cd /path/to/code
$ ctags -f ~/.vim/tags/myc++code --tag-relative=yes --recurse --language-force=c++ *

Java

Or say we want to tag the entire java library itself

$ ctags -f ~/.vim/tags/java -R --language-force=java /opt/java/src

Letting VI know about your files

After creating one or more tagfiles you should edit your ~/.vimrc file and add the location to your tag files and separate the entries by commas or spaces

set tags=~/.vim/tags/myphpproject,~/.vim/tags/myperlcode

Navigating Around the Code

In VI there are two easy commands to jump around.

To move to the definition of a method/variable, place the cursor over it and press

Ctrl + ]

And to jump back

Ctrl + t

If you try to jump to something and it isn’t found, its probably something in a library you’re using, so you’ll have to grab the source and tag those too.

Ack

Installation

There are a few ways to install ack listed on the ack homepage.   If you are familiar with CPAN you can install App::Ack, or if you want to use package management you can grab ack-grep on Ubuntu or ack on Redhat based distros.

Usage

The best thing I can really tell you is to read the ack help

$ ack --help

Ack takes a regular expression as the first argument and a directory to search as the second. Typically you want to search all (-a) files or in a case insensitive fashion (-i)

$ ack -ai 'searchstring' .

Or you can search specific file types

$ ack --perl  searchterm

And one really cool thing is though ack gives nice colorized output in a structured fashion, if you pipe it to another process it outputs like grep by default so that you can continue to pipe it

For example lets say I want to find the modules MooseX::Types is using

ack -a '^use.*;' ~/perl5/lib/perl5/MooseX/Types/

It gives something looking like (output truncated)

/home/jgoulah/perl5/lib/perl5/MooseX/Types/Util.pm
9:use warnings;
10:use strict;
12:use base 'Exporter';

/home/jgoulah/perl5/lib/perl5/MooseX/Types/Moose.pm
9:use warnings;
10:use strict;
12:use MooseX::Types;
13:use Moose::Util::TypeConstraints ();
15:use namespace::clean -except => [qw( meta )];

If you pipe the output it looks more like grep output

ack -a '^use.*;' ~/perl5/lib/perl5/MooseX/Types/ | cat

Again the output is truncated but looks like

/home/jgoulah/perl5/lib/perl5/MooseX/Types/Base.pm:11:use MooseX::Types::Util             qw( filter_tags );
/home/jgoulah/perl5/lib/perl5/MooseX/Types/Base.pm:12:use Sub::Exporter                   qw( build_exporter );
/home/jgoulah/perl5/lib/perl5/MooseX/Types/Base.pm:13:use Moose::Util::TypeConstraints;
/home/jgoulah/perl5/lib/perl5/MooseX/Types/Base.pm:15:use namespace::clean -except => [qw( meta )];

Lets say for example reasons I wanted to find all the modules used in the code and make sure they are installed (using the ‘make install’ command in the module would be the more correct and easier way), you could do

$ ack -ah '^use\s[^\d].*;' ~/perl5/lib/perl5/MooseX/Types/ | \
  ack -v 'warnings|strict|base' | \
  perl -ne "m|use ((\w+:?:?)+)(.*)(;)|; print qq{\$1\n};" | \
  sort -u | xargs cpan

which cleans the output and gives the module list to cpan for installation and it lets me know I have everything installed and up to date

Carp::Clan is up to date (6.00).
Class::MOP is up to date (0.81).
Devel::PartialDump is up to date (0.07).
Moose is up to date (0.74).
Moose::Meta::TypeConstraint::Union is up to date (0.74).
Moose::Util::TypeConstraints is up to date (0.74).
MooseX::Meta::TypeConstraint::Structured is up to date (undef).
MooseX::Types is up to date (0.10).
MooseX::Types::Util is up to date (undef).
namespace::clean is up to date (0.11).
Scalar::Util is up to date (1.19).
Sub::Exporter is up to date (0.982).

Conclusion

These are pretty commonplace but great tools to know if you don’t already. Try to integrate them into your work flow and I think you’ll notice that it will speed you up quite a bit, especially when you are browsing through unfamiliar territory.

Tags: , , , , ,


Jan 13 2009

Using SQLT to Create Database Diffs

Category: Schema Versioningjgoulah @ 11:19 PM

Overview

Database diffs can be useful for a variety of use cases, from outputting the differences in your development versus production schema to full blown database versioning systems.   A database diff is similar in nature to a file diff, but instead of outputting the differences between two files,  it will output the set of alterations to be made to bring one databases schema in sync with another.   There are a variety of tools that will accomplish this for you, but I am going to use a tool called SQL::Translator.   SQLT can do quite a lot for you, such as converting among different schema dialects (think MySQL to Oracle) or outputting an ERD of your database, but I’ll stick with using it as a simple diff tool for this article.

Installation

The tool can be found here on CPAN.   I suggest using local::lib for perl module installation to avoid needing to install things as root.  I’ve written up detailed instructions in the past, but its pretty straightforward if you follow the bootstrap instructions.  In any case, if you are lucky you will be able to just do a normal cpan install like so:

cpan SQL::Translator

For our examples you’ll also need the DBD::MySQL driver. This can be a real pain to install from CPAN so you may want to try to use your systems package management for this. Under Debian this is called libdbd-mysql-perl and in Redhat its called perl-DBD-MySQL.

Diffing Two DDL Files

Once you have SQLT installed, you’ll want to grab the DDL files from the databases that you are interested in producing diffs for.  If you are using mysql this is easy enough to do with mysqldump.

So lets say you want to compare your development and production databases, we’d grab those schema dumps, assuming your dev db is on host ‘dev’ and prod db on host ‘prod’:

mysqldump --no-data -h dev -u your_dbuser -p your_dbname > dev_db.sql
mysqldump --no-data -h prod -u your_dbuser -p your_dbname > prod_db.sql

The SQLT install comes with a handy script called sqlt-diff which you can use to diff two different DDL files. So in the case of MySQL all you have to run is:

sqlt-diff dev_db.sql=MySQL prod_db.sql=MySQL > my_diff.sql

You should be able to pass other arguments where we’ve used ‘MySQL’ here, and you can get a list via:

sqlt -l

Diffing DDL Against a Database

The above may be all you need, but perhaps you want to diff a DDL against the actual database for whatever reason. I recently used this approach to write a versioning system in which the database changes could be output into a file containing the alterations as well as a new DDL for each “version”, this way changes are not lost or forgotten.

SQLT doesn’t currently have any scripts to handle this (yet) so we’ll have to use the perl module itself to do what we want. A simple perl script is all we need.

#!/usr/bin/perl

use strict;
use warnings;
use SQL::Translator;
use SQL::Translator::Diff;
use DBI;

# set the path of the file you want to diff against the DB
$ddl_file = "/path/to/file_to_diff.sql";

# make sure to fill in db_name, host_name, and db_port
my $dsn = "DBI:mysql:database=db_name;host=host_name;port=db_port";

# you'll have to set the db_user and db_pass here
DBI->connect($dsn, 'db_user', 'db_pass', {'RaiseError' => 1});

# create a SQLT object that connect to the database via our dbh
my $db_tr = SQL::Translator->new({
      add_drop_table => 1,
      parser => 'DBI',
      parser_args => {  dbh => $dbh }
    });
$db_tr->parser('SQL::Translator::Parser::DBI::MySQL');


$db_tr->parse($db_tr, $dbh);

# create a SQLT object that corresponds to our ddl file
my $file_tr = SQL::Translator->new({
    parser => 'MySQL'
  });
my $out = $file_tr->translate( $ddl_file ) or die $file_tr->error;

# perform the actual diff (some of the options you use may vary)
my $diff = SQL::Translator::Diff::schema_diff(
    $file_tr->schema,  'MySQL',
    $db_tr->schema,  'MySQL',
    {
    ignore_constraint_names => 1,
    ignore_index_names => 1,
    caseopt => 1
    }
);

# now just print out the diff
my $file;
my $filename = "diff_output.sql"
if(!open($file, ">$filename")) {
  die("Can't open $filename for writing ($!)");
  next;
}
print $file $diff;

So we’ve used the SQL Translator tool to read in our database, read in our DDL, and produce a diff. Pretty neat!

Conclusion

We’ve really only scratched the surface with some of the capabilities of SQL::Translator, but its a pretty handy thing to be able to easily diff various schemas. Its important that your development and production schema are in sync, and stay in sync, and this gives an easy way to start on automating that process.

Tags: , , , , , , ,