Feb 06 2010

Improve Git Emails

Category: Uncategorized, Version Controljgoulah @ 3:02 PM

This is just a quick tip on how to hack gitosis to allow you to have better subject lines in your emails. This assumes you are using gitosis, not gitolite which many people hosting their own git repositories are moving to because it supports per branch and granulated user permissions. If you are looking on a full article on how to install gitosis check out this post.

So grab gitosis

 git clone git://eagain.net/gitosis

Apply this patch to set an environment variable for the user that is doing a push

diff --git a/gitosis/serve.py b/gitosis/serve.py
index d83b1d8..a6a49a2 100644
--- a/gitosis/serve.py
+++ b/gitosis/serve.py
@@ -200,6 +200,7 @@ class Main(app.App):
             sys.exit(1)

         main_log.debug('Serving %s', newcmd)
+        os.putenv('USER', user)
         os.execvp('git', ['git', 'shell', '-c', newcmd])
         main_log.error('Cannot execute git-shell.')
         sys.exit(1)

Then install gitosis with that change

$ sudo ./setup.py install

Now, edit the /usr/local/bin/git-post-receive-email script to change the subject line to make more sense

--- /usr/local/bin/git-post-receive-email.old	2010-02-06 19:29:35.000000000 +0000
+++ /usr/local/bin/git-post-receive-email	2010-02-06 19:14:03.000000000 +0000
@@ -186,7 +186,7 @@
 	# Generate header
 	cat <<-EOF
 	To: $recipients
-	Subject: ${emailprefix} $refname_type, $short_refname, ${change_type}d. $describe
+	Subject: ${emailprefix} push from ${USER}- $refname_type, $short_refname, ${change_type}d.
 	X-Git-Refname: $refname
 	X-Git-Reftype: $refname_type
 	X-Git-Oldrev: $oldrev

If you haven’t setup email already make sure that you have the hook installed by making a symbolic link in /home/git/repositories/myrepo.git/hooks

ln -s /usr/local/bin/git-post-receive-email post-receive

And now your subject changes from this

[GIT] branch, master, updated. d61c9b93446a13093ef7f510a588c69a606ff762

to this

[GIT] push from jgoulah- branch, master, updated.

Now we know who is pushing code by glancing at the subject line. The username comes from the filenames in gitosis-admin/keydir/. Obviously you could style the subject line however you see fit.

Tags: ,


Jan 18 2010

Using Mongo and Map Reduce on Apache Access Logs

Category: Databases, Systemsjgoulah @ 9:32 PM

Introduction

With more and more traffic pouring into websites, it has become necessary to come up with creative ways to parse and analyze large data sets. One of the popular ways to do that lately is using MapReduce which is a framework used across distributed systems to help make sense of large data sets. There are lots of implementations of the map/reduce framework but an easy way to get started is by using MongoDB. MongoDB is a scalable and high performant document oriented database. It has replication, sharding, and mapreduce all built in which makes it easy to scale horizontally.

For this article we’ll look a common use case of map/reduce which is to help analyze your apache logs. Since there is no set schema in a document oriented database, its a good fit for log files since its fairly easy to import arbitrary data. We’ll look at getting the data into a format that mongo can import, and writing a map/reduce algorithm to make sense of some of the data.

Getting Setup

Installing Mongo

Mongo is easy to install with detailed documentation here. In a nutshell you can do

$ mkdir -p /data/db
$ curl -O http://downloads.mongodb.org/osx/mongodb-osx-i386-latest.tgz
$ tar xzf mongodb-osx-i386-latest.tgz

At this point its not a bad idea to put this directory somewhere like /opt and adding its bin directory to your path. That way instead of

 ./mongodb-xxxxxxx/bin/mongod &

You can just do

mongodb &

In any case, start up the daemon one of those two ways depending how you set it up.

Importing the Log Files

Apache access files can vary in the information reported. The log format is easy to change with the LogFormat directive which is documented here. In any case these logs that I’m working with are not out of the box apache format. They look something like this

Jan 18 17:20:26 web4 logger: networks-www-v2 84.58.8.36 [18/Jan/2010:17:20:26 -0500] “GET /javascript/2010-01-07-15-46-41/97fec578b695157cbccf12bfd647dcfa.js HTTP/1.1″ 200 33445 “http://www.channelfrederator.com/hangover/episode/HNG_20100101/cuddlesticks-cartoon-hangover-4″ “Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7″ www.channelfrederator.com 35119

We want to take this raw log and convert it to a JSON structure for importing into mongo, which I wrote a simple perl script to iterate through the log and parse it into sensible fields using a regular expression

#!/usr/bin/perl

use strict;
use warnings;

my $logfile = "/logs/httpd/remote_www_access_log";
open(LOGFH, "$logfile");
foreach my $logentry (<LOGFH>) {
    chomp($logentry);   # remove the newline
    $logentry =~ m/(\w+\s\w+\s\w+:\w+:\w+)\s #date
                   (\w+)\slogger:\s # server host
                   ([^\s]+)\s # vhost logger
                   (?:unknown,\s)?(-|(?:\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},?\s?)*)\s #ip
                   \[(.*)\]\s # date again
                   \"(.*?)\"\s # request
                   (\d+)\s #status
                   ([\d-]+)\s # bytes sent
                   \"(.*?)\"\s # referrer
                   \"(.*?)\"\s # user agent
                   ([\w.-]+)\s? # domain name
                   (\d+)? # time to server (ms)
                  /x;

    print <<JSON;
     {"date": "$5", "webserver": "$2", "logger": "$3", "ip": "$4", "request": "$6", "status": "$7", "bytes_sent": "$8", "referrer": "$9", "user_agent": "$10", "domain_name": "$11", "time_to_serve": "$12"}
JSON

}

Again my regular expression probably won’t quite work on your logs, though you may be able to take bits and pieces of what I’ve documented above to use for your script. On my logs that script outputs a bunch of lines that look like this

{"date": "18/Jan/2010:17:20:26 -0500", "webserver": "web4", "logger": "networks-www-v2", "ip": "84.58.8.36", "request": "GET /javascript/2010-01-07-15-46-41/97fec578b695157cbccf12bfd647dcfa.js HTTP/1.1", "status": "200", "bytes_sent": "33445", "referrer": "http://www.channelfrederator.com/hangover/episode/HNG_20100101/cuddlesticks-cartoon-hangover-4", "user_agent": "Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7", "domain_name": "www.channelfrederator.com", "time_to_serve": "35119"}

And we can then import that directly to MongoDB. The creates a collection called weblogs in the logs database, from the file that has the output of the above JSON generator script

$ mongoimport --type json -d logs -c weblogs --file weblogs.json

We can also take a look at them and verify they loaded by running the find command, which dumps out 10 rows by default

$ mongo
> use logs;
switched to db logs
> db.weblogs.find()

Setting Up the Map and Reduce Functions

So for this example what I am looking for is how many hits are going to each domain. My servers handle a bunch of different domain names and that is one thing I’m outputting into my logs that is easy to examine

The basic command line interface to MondoDB is a kind of javascript interpreter, and the mongo backend takes javascript implementations of the map and reduce functions. We can type these directly into the mongo console. The map function must emit a key/value pair and in this example it will output a ‘1′ each time a domain is found

> map = "function() { emit(this.domain_name, {count: 1}); }"

And so basically what comes out of this is a key for each domain with a set of counts, something like

 {"www.something.com", [{count: 1}, {count: 1}, {count: 1}, {count: 1}]}

This is send to the reduce function, which sums up all those counts for each domain

> reduce = "function(key, values) { var sum = 0; values.forEach(function(f) { sum += f.count; }); return {count: sum}; };"

Now that you’ve setup map and reduce functions, you can call mapreduce on the collection

> results = db.weblogs.mapReduce(map, reduce)
...
>results
{
        "result" : "tmp.mr.mapreduce_1263861252_3",
        "timeMillis" : 9034,
        "counts" : {
                "input" : 1159355,
                "emit" : 1159355,
                "output" : 92
        },
        "ok" : 1,
}

This gives us a bit of information about the map/reduce operation itself. We see that we inputted a set of data and emmitted once per item in that set. And we reduced down to 92 domain with a count for each. A result collection is given, and we can print it out

> db.tmp.mr.mapreduce_1263861252_3.find()
> it

You can type the ‘it’ operator to page through multiple pages of data. Or you can print it all at once like so

> db.tmp.mr.mapreduce_1263861252_3.find().forEach( function(x) { print(tojson(x));});
{ "_id" : "barelydigital.com", "value" : { "count" : 342888 } }
{ "_id" : "barelypolitical.com", "value" : { "count" : 875217 } }
{ "_id" : "www.fastlanedaily.com", "value" : { "count" : 998360 } }
{ "_id" : "www.threadbanger.com", "value" : { "count" : 331937 } }

Nice, we have our data aggregated and the answer to our initial problem.

Automate With a Perl Script

As usual CPAN comes to the rescue with a MongoDB driver that we can use to interface with our database in a scripted fashion. The mongo guys have also done a great job of supporting it in a bunch of other languages which makes it easy to interface with if you are using more than just perl

#!/bin/perl

use MongoDB;
use Data::Dumper;
use strict;
use warnings;

my $conn = MongoDB::Connection->new("host" => "localhost", "port" => 27017);
my $db = $conn->get_database("logs");

my $map = "function() { emit(this.domain_name, {count: 1}); }";

my $reduce = "function(key, values) { var sum = 0; values.forEach(function(f) { sum += f.count; }); return {count: sum}; }";

my $idx = Tie::IxHash->new(mapreduce => 'weblogs', 'map' => $map, reduce => $reduce);
my $result = $db->run_command($idx);

print Dumper($result);

my $res_coll = $result->{'result'};

print "result collection is $res_coll\n";

my $collection = $db->get_collection($res_coll);
my $cursor = $collection->query({ }, { limit => 10 });

while (my $object = $cursor->next) {
    print Dumper($object);
}

The script is straightforward. Its just using the documented perl interface to make a run_command call to mongo, which passes the map and reduce javascript functions in. It then prints the results similar to how we did on the command line earlier.

Summary

We’ve gone over a very simple example of how MapReduce can work for you. There are lots of other ways you can put it to good use such as distributed sorting and searching, document clustering, and machine learning. We also took a look at MongoDB which has great uses for schema-less data. It makes it easy to scale since it has built in replication and sharding capabilities. Now you can put map/reduce to work on your logs and find a ton of information you couldn’t easily get before.

References

http://github.com/mongodb/mongo
http://www.mongodb.org/display/DOCS/MapReduce
http://apirate.posterous.com/visualizing-log-files-with-mongodb-mapreduce
http://search.cpan.org/~kristina/MongoDB-0.27/
http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/

Tags: , , , , , , , , ,


Dec 13 2009

Using Amazon Relational Database Service

Category: Cloud Computingjgoulah @ 2:09 PM

Intro

Amazon recently released a new service that makes it easier set up, operate, and scale a relational database in the cloud called Amazon Relational Database Service. The service, based on MySQL for now, has its pluses and minuses and you should decide whether it fits your needs. The advantages are that it has an automated backup system that lets you restore to any point within the last 5 minutes and also allows you to easily take a snapshot at any time. It is also very easy to “scale up” your box. This is more in the realm of vertical scaling but if you find you are hitting limits you can upgrade to a more powerful server with little to no effort. It also gives you monitoring via Amazon Cloudwatch and automatically patches your database during your self defined maintenance windows. The downfalls are that you don’t have access directly to the box itself, so you can’t ssh in. You also at this point cannot use replication for a master-slave style setup. Amazon promises to have more high availability options forthcoming. Since you can’t ssh in, you adjust mysql parameters via their db parameter group API. I’ll go over an example of this.

Creating a Database Instance

The first thing to do is install the RDS command line API, which you can grab from here. I’m not going over the details of setting this up. Its basically as simple as putting it into your path and Amazon has plenty of documentation on this.

Once you have the command line tools setup you can create a database like so

rds-create-db-instance \
        mydb-instance \
        --allocated-storage 20 \
        --db-instance-class db.m1.small \
        --engine MySQL5.1  \
        --master-username masteruser \
        --master-user-password mypass \
        --db-name mydb --headers \
        --preferred-maintenance-window 'Mon:06:15-Mon:10:15' \
        --preferred-backup-window  '10:15-12:15' \
        --backup-retention-period 1 \
        --availability-zone us-east-1c

This should be fairly self explanatory. I’m creating an instance called mydb-instance. The master (basically root) user is called masteruser with password mypass. It also creates an initial database called mydb. You can add more databases and permissions later. This also sets up the maintenance and backup windows which are required, and defined in UTC. The backup retention period is how long it holds on to my backups, which I’ve defined as 1 day. If you set this to 0 it will disable the automated backups entirely which is not advised.

The next thing to do is setup your security groups so that your EC2 (or your hosted servers) have access to your database. There is good documentation on this so I will go over a basic use case.

rds-authorize-db-security-group-ingress \
        default \
        --ec2-security-group-name webnode \
        --ec2-security-group-owner-id XXXXXXXXXXXX

In the case above I’m creating a security group called default, that allows my ec2 security group webnode access. The group-owner-id parameter is your AWS account id.

You can find what your database DNS name is via the rds-describe-db-instances command.

rds-describe-db-instances
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  default.mysql5.1  in-sync

So we can see our hostname is mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com

Now you can login to your instance in the usual way that you access mysql on the command line, setup your users and import your database in the usual way.

mysql -u masteruser -h mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com -pmypass

Using Parameter Groups to View the MySQL Slow Queries Log

As I mentioned earlier you don’t have access to ssh into the instance, so you need to use db parameter groups to tweak your configuration rather than editing the my.cnf file. You can’t see the mysql slow query log on the box but there is still a way to access it and I’ll go over that process.

Amazon won’t let you edit the default group, so the first thing to do is create a parameter group to define your custom parameters.

rds-create-db-parameter-group my-custom --description='My Custom DB Param Group' --engine=MySQL5.1

Then set the parameter to turn the query log on

rds-modify-db-parameter-group my-custom  --parameters="name=slow_query_log, value=ON, method=immediate"

We’re still using the default configuration so you have to tell the instance to use your custom parameter group

rds-modify-db-instance mydb-instance --db-parameter-group-name=my-custom

The first time you apply a new custom group you have to reboot the instance, as pending-reboot here indicates

$ rds-describe-db-instances
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  my-custom  pending-reboot

So we can reboot it immediately like so

$ rds-reboot-db-instance mydb-instance

When it comes back up it will show that its in-sync

$ rds-describe-db-instances
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  my-custom  in-sync

We can login to the instance and see that our parameter was set correctly

mysql> show global variables like 'log_slow_queries';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| log_slow_queries | ON    |
+------------------+-------+
1 row in set (0.00 sec)

Since you don’t have access to the filesystem, its logged to a table on the mysql database

mysql> use mysql;

mysql> describe slow_log;
+----------------+--------------+------+-----+-------------------+-----------------------------+
| Field          | Type         | Null | Key | Default           | Extra                       |
+----------------+--------------+------+-----+-------------------+-----------------------------+
| start_time     | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| user_host      | mediumtext   | NO   |     | NULL              |                             |
| query_time     | time         | NO   |     | NULL              |                             |
| lock_time      | time         | NO   |     | NULL              |                             |
| rows_sent      | int(11)      | NO   |     | NULL              |                             |
| rows_examined  | int(11)      | NO   |     | NULL              |                             |
| db             | varchar(512) | NO   |     | NULL              |                             |
| last_insert_id | int(11)      | NO   |     | NULL              |                             |
| insert_id      | int(11)      | NO   |     | NULL              |                             |
| server_id      | int(11)      | NO   |     | NULL              |                             |
| sql_text       | mediumtext   | NO   |     | NULL              |                             |
+----------------+--------------+------+-----+-------------------+-----------------------------+
11 rows in set (0.11 sec)

We may also want to set things like the slow query time, since the default of 10 is pretty high

$ rds-modify-db-parameter-group my-custom  --parameters="name=long_query_time, value=3, method=immediate"

The rds-describe-events command keeps a log of what you’ve been doing

$ rds-describe-events
db-instance         2009-12-12T17:44:19.546Z  mydb-instance  Updated to use a DBParameterGroup my-custom
db-instance         2009-12-12T17:45:51.636Z  mydb-instance  Database instance shutdown
db-instance         2009-12-12T17:46:09.380Z  mydb-instance  Database instance restarted
db-parameter-group  2009-12-12T17:56:02.568Z  my-custom        Updated parameter long_query_time to 3 with apply method immediate

And again you can check mysql that your parameter was edited properly. Note how this time we didn’t have to reboot anything as our parameter group is already active on this instance

mysql> show global variables like 'long_query_time';
+-----------------+----------+
| Variable_name   | Value    |
+-----------------+----------+
| long_query_time | 3.000000 |
+-----------------+----------+
1 row in set (0.00 sec)

Summary

In this article we went over some basics of Amazon RDS and why you may or may not want to use it. If you are just starting out its a really easy way to get a working mysql setup going. However if you are porting from an architecture with multiple slaves or other HA options this may not be for you just yet. We also went over some basic use cases on how to tweak parameters since there is no command line access to the box itself. There is command line access to mysql though, so you can use your favorite tools there.

References

http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2935&categoryID=295

Tags: , , , ,


Nov 01 2009

Migrating SVN to Git

Category: Version Controljgoulah @ 5:55 PM

Overview

SVN migrations to Git can vary in complexity, depending on how old the repository is and how many branches were created and merged, as well as if you use regular SVN or a facade such as SVK to do your branching and merging. If you have a fairly new repository and the standard setup of a trunk, branches, and tags directory, your job could be pretty easy. However if you’ve done a ton of branching and merging, or your repository follows a non-standard directory setup, or that setup changed over time, you could have a bit more work on your hands.

SVN to Git Conversion

Building Git

The very first thing to do is make sure you are running the newest git. And for this I recommend from source. So grab it and do the usual compile stuff

$ wget http://kernel.org/pub/software/scm/git/git-1.6.5.2.tar.bz2
$ tar xjf git-1.6.5.2.tar.bz2
$ cd git-1.6.5.2
$ ./configure
$ make

Now I’m not sure if this is because I am always using local::lib but I tend to get this error after a while of building. If you don’t get it you can obviously skip this step

Only one of PREFIX or INSTALL_BASE can be given

This is fixable by going into the git/perl directory and manually building the perl modules

$ cd perl/
$ perl Makefile.PL

Then go back up a directory and finish the build

$ cd ..
$ make
$ sudo make install

Using git-svn

If you have made any merges using SVK then you should grab the git-svn from Sam Vilain’s git branch here. I had to download it because the git clone wasn’t working so grab it like so

$ wget http://download.github.com/samv-git-ccef8e0.tar.gz

Open it up, and copy the git-svn.perl script into your bin directory

$ tar xzvf samv-git-ccef8e0.tar.gz
$ cd samv-git-ccef8e0
$ cp git-svn.perl ~/bin/git-svn

Now you can git svn clone your repository with the git-svn command. The –prefix=svn/ is necessary because otherwise the tools can’t tell apart SVN revisions from imported ones. If you are using the standard trunk, branches, tags layout you’ll just put –stdlayout like we have below. However if you had something different you may have to pass the –trunk, –branches, and –tags in to identify what is what. For example if your repository structure was trunk/companydir and you branched that instead of trunk, you would probably want ‘–trunk=trunk/companydir –branches=branches‘. You can optionally provide an authors file, and the last option to our script is the repository URL.

$ git-svn clone --prefix=svn/ --stdlayout --authors-file=authors.txt http://example.com/svn

This can take a few minutes or as long as a few hours depending how big your repository is. When its done you should end up with a git checkout of your repository. If you look at your remote branches you’ll see that its ported our svn stuff with the svn prefix we used

$ git branch -r
  svn/dbic_versioning
  svn/feed_source
  svn/trunk

Anyway at this point it doesn’t hurt to create a tarball of your project so you can get back to it later if you screw up.

Fixing the Git Repository

There are a few more scripts you need to grab to finish this off. We want to clone the git-svn-abandon repository for these.

$ git clone git://github.com/nothingmuch/git-svn-abandon.git

Now go into that directory and copy the new scripts into your local bin directory, or wherever you feel is good for your purposes

$ cd git-svn-abandon
$ cp git-svn-abandon-* ~/bin/

You can now run git-svn-abandon-fix-refs, which will run through all the imported refs, recreating properly dated annotated tags, and makes branches out of everything else. It’ll also rename trunk to master.

$ git-svn-abandon-fix-refs

We can see that it took the things that our git branch -r command earlier listed and imported them into actual git branches, as well as putting us on the master branch which is analogous to the trunk from svn

$ git branch
  dbic_versioning
  feed_source
* master

Grafting Your Branches

Depending how correct you want your repository, you could skip this step. However it could be a good idea to try to get your merge history correct. This requires a little bit of understanding of the git internals, so at a minimum you should read and understand this introduction. There is also a nice presentation of how git works here that you may consider watching.

So now you understand (you went over those links right?) that the way git figures out merges is such that it basically creates a new node on the graph that points to both the head of master and of the branch. And then it basically takes the file contents from the 3 points, the new merge base node (the ancestor), and each parent, deltafies them and applies the result to the ancestor. Obviously if they don’t apply cleanly a conflict is created and must be resolved. The previous revisions contents is no longer relevant at all, except as a part of history, and our new snapshot of the history is born. So this is basically what we have to fix.

This is a snapshot from gitk of my repository. We can see near the bottom that it branched properly, but even though the repository was merged in svn, it doesn’t appear to be merged in our migration to git.

The way to fix this is to edit the .git/info/grafts file. With my great photoshop skills I’m going to point out what goes in there

We need to take what the first arrow is pointing to, and tell git that its parents are its current parent (the third arrow) as well as the point the branch was merged in (the second arrow). So all we have to do is add the hash from each of those commits, put the top node first, then the second and third with a space in between each to the .git/info/grafts file.

After we add those to the .git/info/grafts file we can just reload gitk or gitx to see those changes that visually show the branch was merged

When you are done you can run git-svn-abandon-cleanup which cleans up SVK style merge commit messages and removes git-svn-id strings. Another important thing that happens is the grafts entries are incorporated into the filtered commits, so the extra merge metadata becomes clonable

$ git svn-abandon-cleanup

Publish Your Repository

If you followed the last article you’ve got a gitosis setup that we can use to store our code centrally. So depending what host this is on you can import it like so

$ git remote add origin git@localhost:repository.git
$ git push --all
$ git push --tags

And we’re done.

Conclusion

We’ve gone over how to migrate an SVN repository to Git and how to deal with some of the complications of how Git interprets our repository based on what was in SVN and how to go about fixing that. Hopefully we also learned a little bit about Git too in the process.

References

Tags: , , ,


Oct 31 2009

Setting up Gitosis

Category: Version Controljgoulah @ 4:33 PM

Overview

This article is part one of a two part series that covers setting up a hosting server using gitosis for your central repository, and in the next article, taking an existing SVN repository and running the appropriate scripts and commands necessary to migrate it into something git can work with.

So this article is how to setup and manage a git repository.There are some great services out there than can do this for you, but why pay money for something you can easily do for free? This article shows how to setup and manage a secure and private git repository that people can use as a central sharing point.

Setting Up Gitosis

Gitosis is a tool for hosting git repositories. Its common usage is for a central repository that other developers can push changes to for sharing.

First clone the gitosis repository and run the basic python install. You just need the python setuptools package

sudo apt-get install python-setuptools

And then you can easily install it

git clone git://eagain.net/gitosis.git
cd gitosis
sudo python setup.py install

Next you need to create a user that will own the repositories you want to manage. You can put its home directory wherever you want, but in this example we’ll put it in the standard /home location.

sudo adduser \
    --system \
    --shell /bin/sh \
    --gecos 'git version control' \
    --group \
    --disabled-password \
    --home /home/git \
    git

Then you must create an ssh public key (or use your existing one) for your first repository user. We’ll use an init command to copy it to server and load it. If you don’t have a public key you can create one with ssh-keygen like so

ssh-keygen -t dsa

Then gitosis-init is for the first time only, loads up your users key, and goes like this

sudo -H -u git gitosis-init < ~/.ssh/id_dsa.pub

Here it doesn’t hurt to make sure your post-update hook has execute permissions.

sudo chmod 755 /home/git/repositories/gitosis-admin.git/hooks/post-update

Now you can clone the gitosis-admin repository, which is used to manage our repository permissions.

git clone git@YOUR_SERVER_HOSTNAME:gitosis-admin.git
cd gitosis-admin

Now you can see you have a gitosis.conf file and a keydir directory

$ ls -l
total 8
-rw-r--r-- 1 jgoulah mygroup   83 2009-10-31 20:44 gitosis.conf
drwxr-xr-x 2 jgoulah mygroup 4096 2009-10-31 20:44 keydir

The gitosis.conf file holds group and permission information for your repositories, and the keydir folder holds your public keys.

If I look in there I see my public key was imported from our earlier gitosis-init command

$ ls -l keydir/
total 4
-rw-r--r-- 1 jgoulah mygroup 603 2009-10-31 20:44 jgoulah.pub

So open up gitosis.conf and you should already see you have an entry for the gitosis-admin repository that we just cloned. The gitosis-init command above setup the access for us. From now on we can just crack open gitosis.conf and edit the permissions, commit and push back to our central repository.

If I wanted to create a new project for a repository called pizza_maker it would look something like this.

[group myteam]
members = jgoulah
writable = pizza_maker

Don’t forget the members section is the name of your public key file without the .pub at the end. If your key was named XYZ.pub then your member line would have XYC here.

git commit -a -m "Create new repo permissions for pizza_maker project"
git push

As a reminder the second part of this series will show an svn to git import. For now lets assume we are starting from scratch. We’d create our project like this

cd && mkdir pizza_maker
cd pizza_maker
git init
git remote add origin git@YOUR_SERVER_HOSTNAME:pizza_maker.git
git add *
git commit -m "some stuff"
git push origin master:refs/heads/master

The only other thing to know is if you want to grant another user access to your repository. All you have to do is add their public key to the keydir folder, and then give the user permissions by modifying gitosis.conf

cd gitosis-admin
cp ~/otherdude.pub keydir/
 [group myteam]
- members = jgoulah
+ members = jgoulah otherdude
  writable = pizza_maker

If you need to, you can also grant public access over the git:// protocol like so

sudo -u git git-daemon --base-path=/home/git/repositories/ --export-all

Then someone can clone like

git clone git://YOUR_SERVER_HOSTNAME/pizza_maker.git

Conclusion

This article showed how to setup gitosis, how to initialize your gitosis-admin repository, which is a unique concept in itself to use a repository to manage repositories, and it works rather well. We also went over how to create our own new git repository, and how to manage the access permissions through gitosis.conf. Part two of this series will explain how to port from your current SVN setup to a Git setup. This article was a prerequisite if you want to host your own private repository when you’re converting from SVN to Git, and thats what we’ll look at next time.

Tags: , , , , , ,


Sep 07 2009

Deploying A Catalyst App on Private Hosting

Category: Deploymentjgoulah @ 1:26 PM

Intro

In the last post I wrote about deploying Catalyst on shared hosting.  While shared hosting may seem attractive pricewise, you’ll quickly grow out of it, and it makes sense to move to hosting with some more control if you are serious about your website.   There are lots of choices when it comes to where you may want to host,  and if you are looking for virtual options, Slicehost, Linode, prgmr, or even Amazon EC2 are great choices.  Picking and setting these up are well beyond the scope of this article,  so I’m assuming you have a server with root privileges ready to go.

Setting Up Your Modules

As I probably sound like a broken record if you’ve read any of my other perl-related entries,  its a good idea to setup your modules using local::lib.  Check out the last article on how to set it up,  or the module docs do a fine job if you follow the bootstrap instructions.  I always create a new user to set this up as,  such as ‘perluser’  or similar,  this way your modules aren’t affected by upgrades as your own user.  We’ll show how to point to these modules shortly.

For now, you’ll want to also make sure  you have FCGI and FCGI::ProcManager installed as that user that you just setup along with the rest of your apps modules.

$ cpan FCGI
$ cpan FCGI::ProcManager

You may want to checkout your app under this user, so that you can just do

$ perl Makefile.PL
$ make installdeps

assuming you’ve kept your Makefile.PL updated, this should be all of the perl modules you need to run your application.

Setting Up Apache

You can setup apache using your package manager of choice.  For ease of this article I’ll assume you’re on Ubuntu, or Debian based system, and you can do something like

$ sudo apt-get install apache2.2-common
$ sudo apt-get install apache2-mpm-worker

And you also need the mod_fastcgi module.  You could download and compile it.  Or you could grab it from apt.   You’ll probably have to add the multiverse repository,  so update your /etc/apt/sources.listso that each line will end with

main restricted universe multiverse

Now you can

$ sudo apt-get update
$ sudo apt-get install libapache2-mod-fastcgi

Setup the VirtualHost and App Directory Structure

You need to put a VirtualHost so that Apache knows how to handle the requests to your domain

FastCgiExternalServer /opt/mysite.com/fake/prod -socket /opt/mysite.com/run/prod.socket

<VirtualHost *:80>
ServerName www.mysite.com

DocumentRoot /opt/mysite.com/app
Alias /static /opt/mysite.com/app/root/static/
Alias / /opt/mysite.com/fake/prod/
</VirtualHost>

This “/” alias ties your document root to the listening socket defined in the FastCgiExternalServer line. The “/static” alias make sure your static files are served by apache, instead of fastcgi.

This config also assumes some directory structure is setup,  which is really entirely up to you.  But here we’ll assume you have a directory located at /opt/mysite.com with a few directories under that called fake, run, and app.

$ sudo mkdir -p /opt/mysite.com/{fake,run,app}

The only directory you have to put anything in is app, which should contain your code.

Note, this is a very simplified layout.  In the real world I’d put the fake, run, and app dirs under a versioned directory,  which my active virtualhost would then point to.  I’ve talked briefly about this kind of deployment technique before at a high level and there is a great writeup here on the technical details on how to use multiple fastcgi servers to host your staging and production apps with seamless deployment.  Part of the beauty of using FastCGI is that you can run two copies of the app against the same socket so its easy to bring up instances pointing to different versions of your code, and deploy with zero downtime.

Launching FastCGI

The last piece of the puzzle is to have a launch script,  which makes sure that your app is listening on the socket.  So to keep it simple you would have a script called launch_mysite.sh that looks like this

#!/bin/sh

export PERL5OPT='-Mlib=/home/perl_module_user/perl5/lib/perl5'

/opt/mysite.com/app/script/mysite_fastcgi.pl \
-l /opt/mysite.com/run/prod.socket -e \
-p  /opt/mysite.com/run/prod.pid -n 15

The first line is telling the script to use the modules from the user we setup the local::lib to hold the modules, so make sure you change this to the correct location.  Then it starts up fastcgi to listen on your socket, and create a process pid,  and to spawn in this case 15 processes to handle requests. Go ahead and hit your domain, and it should show you your website.

Conclusion

We’ve gone over the basics on how to setup FastCGI using FastCgiExternalServer. You now have a lot of flexibility in how many processes are handling requests, the ability to run different copies of your app and flipping the switch, and pointing to which modules are run with your app. There are a lot of improvements you can now make to setup a very sane deployment process so that each version of code deployed can be its own standalone build and ensuring your production app has 100% uptime, but at this point its up to your imagination.

Tags: , , , , ,


Aug 29 2009

Deploying a Catalyst App on Shared Hosting

Category: Deploymentjgoulah @ 1:48 PM

Intro

People have long complained that one of the tricky things about perl is the deployment phase, much because of the intricacies of mod_perl and its unfriendliness towards shared environments. In fact, I would highly recommend FastCGI over mod_perl since it is quite easy to understand and configure. This post is going to focus on smaller shared hosting environments, and how easy it can be to quickly deploy a Catalyst web app.

Getting Started

Assumptions

There are some basic assumptions here. First we need a Linux webserver that has Apache installed and is loading up
fcgid. I believe you can also use the favored mod_fastcgi which I just pointed to above, but I have yet to test this on a shared host. These are binary compatible modules so in theory both work. But again I’ve only used mod_fastcgi for large non-shared hosted deployments. You’ll also need mod_rewrite which is now fairly common.

Installing Your Modules

The best way to install the modules your application depends on is using local::lib. I’ve talked about this before so there isn’t a lot of need to go over the process in detail again, but in a nutshell you can do

$ wget http://search.cpan.org/CPAN/authors/id/A/AP/APEIRON/local-lib-1.004006.tar.gz
$ tar xzvf local-lib-1.004006.tar.gz
$ cd local-lib-1.004006
$ perl Makefile.PL --bootstrap
$ make test && make install
$ echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >>~/.bashrc
$ source ~/.bashrc

Now you have an environment that you can install your modules into. By default this is localized to ~/perl5. The next step is to install your modules that the application requires. It is good practice to put these into your Makefile.PL so that you can easily install them in one shot. A very basic one would follow this template

use inc::Module::Install;

name 'MyApp';
all_from 'lib/MyApp.pm';

requires 'Catalyst::Runtime' => '5.80011';
requires 'Config::General';
# require other modules here

install_script glob('script/*.pl');
auto_install;
WriteAll;

Now its easy to do

$ export PERL_MM_USE_DEFAULT=1
$ perl Makefile.PL
$ make installdeps

The PERL_MM_USE_DEFAULT will configure things such that you don’t have to press enter at every question about a dependency. The make installdeps will install any missing modules, which in this case is going to be everything. You can upgrade the version numbers in the Makefile.PL “requires” lines if you want installdeps to grab the newer distributions as they are released to CPAN.

Configuring Your App

First thing we have to do is a minor edit to our fastcgi script, which is to tell it to use our local::lib. Since its not part of the environment we setup earlier in .bashrc we have to tell the fastcgi perl script where to find things. Below the “use warnings;” line add this

use lib "/home/myuser/perl5/lib/perl5";
use local::lib;

Make sure to change the path to the correct location of your perl5 modules directory.

The last thing is to make sure your app is located in the public directory root for your host. In my case I created a symbolic link from the public_html folder to my app.

$ cd && mv public_html public_html.old  # get rid of current root folder
$ ln -s ~/myapp ~/public_html

Then create a .htaccess file in that folder, which should reside beside all of your code

Options ExecCGI FollowSymLinks
Order allow,deny
Allow from all
AddHandler fcgid-script .pl

RewriteEngine On
RewriteCond %{REQUEST_URI} !^/?script/myapp_fastcgi.pl
RewriteRule ^(.*)$ script/myapp_fastcgi.pl/$1 [PT,L]
RewriteRule ^static/(.*)$ static/$1 [PT,L]

We’re just telling apache to turn CGI on, and make sure to execute our fastcgi perl script. Be sure to change the rewrite lines to point to your script (hint: change myapp to your app name).

Now, you should be able to hit your domain. Simple!

Conclusion

So we’ve seen that deploying perl can actually be fairly easy. There are of course some assumptions here, for example, to get the rewrite rules working you’ll need mod_rewrite.so, but this is fairly standard these days. Now you can deploy a perl app with the same ease as languages such as PHP, where it is pretty much plug and play. This should enable people to more easily compete with all the badly written blog, forum, and other generically useful software in the open source world.

Tags: , , ,


Jun 11 2009

Investigating Data in Memcached

Category: Databases, Systemsjgoulah @ 9:58 PM

Intro

Almost every company I can think of uses Memcached at some layer of their stack, however I haven’t until recently found a great way to take a snapshot of the keys in memory and their associated metadata such as expire time, LRU time, the value size and whether its been expired or flushed. The tool to do this is called peep

Installing Peep

The main thing about installing peep is that you have to compile memcached with debugging symbols. Not really a big deal though. First thing you’ll want to do is install libevent

$ wget http://monkey.org/~provos/libevent-1.4.11-stable.tar.gz
$ tar xzvf libevent-1.4.11-stable.tar.gz
$ cd libevent-1.4.11-stable
$ ./configure
$ make
$ sudo make install

Now grab memcached

$ wget http://memcached.googlecode.com/files/memcached-1.2.8.tar.gz
$ tar xzvf memcached-1.2.8.tar.gz
$ cd memcached-1.2.8
$ CFLAGS='-g' ./configure --enable-threads
$ make
$ sudo make install

Note the configure line on this one sets the -g flag which enables debug symbols. Now move this directory somewhere standard like /usr/local/src because we’ll need to reference it when installing peep

$ sudo mv memcached-1.2.8 /usr/local/src

Ok, now for peep, which is written in ruby. If you don’t have rubygems you need that and the ruby headers. Just use your package management system for this. I happened to be on a CentOS box so I ran

$ sudo yum install rubygems.noarch
$ sudo yum install ruby-devel.i386

Finally you can install peep

$ sudo gem install peep -- --with-memcached-include=/usr/local/bin/memcached-1.2.8

Using Peep

Finally you can actually use this thing, you just give it the running memcached process pid

$ sudo peep --pretty `cat /var/run/memcached/memcached.pid`

Here’s a snippet of what peep can show us in its “pretty” mode

      time |   exptime |  nbytes | nsuffix | it_f | clsid | nkey |                           key | exprd | flushd
       485 |       785 |   17925 |      10 | link |    25 |   64 |  "post.ordered-posts1-03afdb" |  true |  false
       537 |       721 |   16991 |      10 | link |    24 |   63 |  "post.ordered-posts1-03bd6"  |  true |  false
       240 |      3684 |     434 |       8 | link |     9 |   22 |  "channel.channel.105.v1"     | false |  false
       241 |      3687 |   27286 |      10 | link |    27 |   55 |  "post.post_count-fec35129"   | false |  false
       538 |      4022 |    3223 |       9 | link |    17 |   55 |  "post.post_count-2ff57a7"    | false |  false
       538 |      4024 |   17169 |      10 | link |    25 |   55 |  "post.post_count-2ba928998d" | false |  false
       241 |      3686 |   10763 |      10 | link |    22 |   55 |  "post.post_count-3879a24011" | false |  false
        25 |       320 |    8874 |       9 | link |    22 |   22 |  "channel.posterframes.4"     |  true |  false

Putting the Data Into MySQL

The above is not the easiest thing to analyze, especially if you have a lot of data in cache. But we can easily load it in to MySQL so that we can run queries on it.

First create the db and permissions

mysql> create database peep;
mysql> grant all on peep.* to peep@'localhost' identified by 'peep4u';

Then a table to store the output of the peep snapshot

mysql> CREATE TABLE `entries` (
  `lru_time` int(11) DEFAULT NULL,
  `expires_at_time` int(11) DEFAULT NULL,
  `value_size` int(11) DEFAULT NULL,
  `suffix_size` int(11) DEFAULT NULL,
  `it_flag` varchar(255) DEFAULT NULL,
  `class_id` int(11) DEFAULT NULL,
  `key_size` int(11) DEFAULT NULL,
  `key_name` varchar(255) DEFAULT NULL,
  `is_expired` varchar(255) DEFAULT NULL,
  `is_flushed` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now run peep and output the data in “ugly” format into a file

$ sudo peep --ugly `cat /var/run/memcached/memcached.pid` > peep.out

And now you can load it into your entries table

mysql> load data local infile '/tmp/peep.out' into table entries fields terminated by ' | ' lines terminated by '\n';
Query OK, 177 rows affected (0.01 sec)
Records: 177  Deleted: 0  Skipped: 0  Warnings: 0

I’m only loading a small dev install of memcahed but if you are running this on production you’d be importing many more rows. Luckily load data infile is sufficiently optimized to input large datasets. Somewhat unfortunately however, peep makes memcached block while its taking its snapshot, which can take a bit of time in production.

In any case, not that interesting with dev data but you can get lots of interesting numbers out of this. In my case it looks like most of the stuff in my cache is expired already.

mysql> select is_expired, count(*) as num, sum(value_size) as value_size from entries group by is_expired;
+------------+-----+------------+
| is_expired | num | value_size |
+------------+-----+------------+
| false      |   1 |        415 |
| true       | 176 |    2392792 |
+------------+-----+------------+
2 rows in set (0.01 sec)

Or selecting by grouping slabs and displaying their sizes

mysql> select class_id as slab_class, max(value_size) as slab_size from entries group by slab_class;
+------------+-----------+
| slab_class | slab_size |
+------------+-----------+
|          1 |         8 |
|          2 |        15 |
|          9 |       445 |
|         12 |      1100 |
|         13 |      1402 |
|         14 |      1710 |
|         15 |      2174 |
|         16 |      2409 |
|         17 |      3223 |
|         20 |      6154 |
|         21 |      8395 |
|         22 |     10763 |
|         23 |     13395 |
|         24 |     16991 |
|         25 |     17925 |
|         26 |     23320 |
|         27 |     33361 |
|         28 |     38824 |
|         29 |     44904 |
+------------+-----------+
19 rows in set (0.00 sec)

Conclusion

Although there are a lot of tools such as Cacti that will display graphs of your Memcached usage, there are not many tools that will actually show you specifics of whats in memory. This tool can be used for a variety of reasons, but its especially useful to example the keys that you have in memory and the size of their values, as well as if they’re expired and what the individuals slabs are holding.

Tags: , , , ,


May 17 2009

Using DBIx::Class to Version Your Schema

Category: Databases, Deployment, Schema Versioningjgoulah @ 5:53 PM

Intro to DBIx::Class

In my opinion DBIx::Class is one of the best ORM solutions out there. Not only can it model your database, including mapping out any foreign key relationships, it can also be used as a canonical point of reference for your schema.   This means we can use it not only as an application layer interface to the database, but can also define a versioned database structure using the code,  where if you add a field into your result class,  you can generate a version of alter statements and a DDL just by running a simple script.   Of course this set of alterations can later be applied to your stage and production environments.

It’s possible to produce your schema classes with DBIx::Class::Schema::Loader if you have a pre-existing database and are initially generating your DBIC classes, but this article is going to show going from the ground up how to build your database from the code defining the fields and tables as well as the relations, which match to foreign key constraints when the database definition files are generated.

Creating your Schema

This example will be a pretty simple schema but enough to demonstrate generating a couple of tables with a foreign key constraint. Since a lot of people are doing social networking stuff these days we can assume a common use case would be a user who has a bunch of photos. So we’ll need to model a user table and a photo table, but first things first, lets create a few basic things, assuming we’ll call our application MySocialApp

mkdir -p MySocialApp/lib/MySocialApp/Schema/Result
cd MySocialApp

We need to create a schema file at lib/MySocialApp/Schema.pm that inherits from the base DBIx::Class::Schema

package MySocialApp::Schema;
use base qw/DBIx::Class::Schema/;

use strict;
use warnings;
our $VERSION = '0.00001';

__PACKAGE__->load_namespaces();

__PACKAGE__->load_components(qw/Schema::Versioned/);

__PACKAGE__->upgrade_directory('sql/');

1;

So here all we are doing is extending the base Schema and loading the Versioned component. We’re setting the directory where DDL and Diff files will be generated into the sql directory. We’re also invoking load_namespaces which tells it to look at the MySocialApp::Schema::Result namespace for our result classes by default.

Creating the Result Classes

Here I will define what the database is going to look like

package MySocialApp::Schema::Result::User;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

 __PACKAGE__->load_components(qw/Core/);
 __PACKAGE__->table('user');

 __PACKAGE__->add_columns(

    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);

__PACKAGE__->set_primary_key('user_id');

__PACKAGE__->has_many(
    'photos',
    'MySocialApp::Schema::Result::Photo',
    { "foreign.fk_user_id" => "self.user_id" },
);

1;

This is relatively straightforward. I’m creating a user table that has an auto incrementing primary key and a username field. I’m also defining a relationship that says a user can have many photos. Lets create the photo class.

package MySocialApp::Schema::Result::Photo;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

 __PACKAGE__->load_components(qw/Core/);
 __PACKAGE__->table('photo');

 __PACKAGE__->add_columns(

    photo_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    url => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    fk_user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        extra => { unsigned => 1 },
    },
);

__PACKAGE__->set_primary_key('photo_id');

 __PACKAGE__->belongs_to(
    'user',
    'MySocialApp::Schema::Result::User',
    { 'foreign.user_id' => 'self.fk_user_id' },
);

1;

Same basic thing here, a photo class with a photo_id primary key, a url field, and an fk_user_id field that keys into the user table. Each photo belongs to a user, and this relationship will define our foreign key constraint when the schema is generated.

Versioning the Database

Create the Versioning Scripts

We have the main DBIx::Class pieces in place to generate the database, but we’ll need a couple of scripts to support our versioned database. One script will generate the schema based on the version before it, introspecting which alterations have been made and producing a SQL diff file to alter the database. The other script will look at the database to see if it needs upgrading, and run the appropriate diff files to bring it up to the current version.

First the schema and diff generation script which we’ll call script/gen_schema.pl

use strict;

use FindBin;
use lib "$FindBin::Bin/../lib";

use Pod::Usage;
use Getopt::Long;
use MySocialApp::Schema;

my ( $preversion, $help );
GetOptions(
        'p|preversion:s'  => \$preversion,
        ) or die pod2usage;

my $schema = MySocialApp::Schema->connect(
        'dbi:mysql:dbname=mysocialapp;host=localhost', 'mysocialapp', 'mysocialapp4u'
        );
my $version = $schema->schema_version();

if ($version && $preversion) {
    print "creating diff between version $version and $preversion\n";
} elsif ($version && !$preversion) {
    print "creating full dump for version $version\n";
} elsif (!$version) {
    print "creating unversioned full dump\n";
}

my $sql_dir = './sql';
$schema->create_ddl_dir( 'MySQL', $version, $sql_dir, $preversion );

This script will be run anytime we change something in the Result files. You give it the previous schema version, and it will create a diff between that and the new version. Before running this you’ll update the $VERSION variable in lib/MySocialApp/Schema.pm so that it knows a change has been made.

The next script is the upgrade script, we’ll call it script/upgrade_db.pl

use strict;

use FindBin;
use lib "$FindBin::Bin/../lib";

use MySocialApp::Schema;

my $schema = MySocialApp::Schema->connect(
        'dbi:mysql:dbname=mysocialapp;host=localhost', 'mysocialapp', 'mysocialapp4u'
        );

if (!$schema->get_db_version()) {
    # schema is unversioned
    $schema->deploy();
} else {
    $schema->upgrade();
}

This script checks to see if any diffs need to be applied, and applies them if the version held by the database and the version in your Schema.pm file differ, bringing the database up to the correct schema version.

Note, in these scripts I’ve hardcoded the DB info which really should go into a configuration file.

Create a Database to Deploy Into

We need to create the database that our tables will be created in. In the connect calls above we’re using this user and password to connect to our database. I’m using MySQL for the example so this would be done on the MySQL command prompt

mysql> create database mysocialapp;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on mysocialapp.* to mysocialapp@'localhost' identified by 'mysocialapp4u';
Query OK, 0 rows affected (0.01 sec)

Deploy the Initial Schema

Now its time to deploy our initial schema into MySQL. But for the first go round we also have to create the initial DDL file, this way when we make changes in the future it can be compared against the Schema result classes to see what changes have been made. We can do this by supplying a nonexistent previous version to our gen_schema.pl script

$ perl script/gen_schema.pl -p 0.00000
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.
creating diff between version 0.00001 and 0.00000
No previous schema file found (sql/MySocialApp-Schema-0.00000-MySQL.sql) at /home/jgoulah/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1685.

And we can see the DDL file now exists

$ ls sql/
MySocialApp-Schema-0.00001-MySQL.sql

Then we need to deploy to MySQL for the first time so we run the upgrade script

$ perl script/upgrade_db.pl
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.

Now check out what its created in MySQL

mysql> use mysocialapp;
Database changed
mysql> show tables;
+----------------------------+
| Tables_in_mysocialapp      |
+----------------------------+
| dbix_class_schema_versions |
| photo                      |
| user                       |
+----------------------------+
3 rows in set (0.00 sec)

There is our photo table, our user table, and also a dbix_class_schema_versions table. This last table just keeps track of what version the database is. You can see we are in sync with the Schema class by selecting from that table and also when this version was installed.

mysql> select * from dbix_class_schema_versions;
+---------+---------------------+
| version | installed           |
+---------+---------------------+
| 0.00001 | 2009-05-17 21:59:57 |
+---------+---------------------+
1 row in set (0.00 sec)

The really great thing is that we have created the tables -and- constraints based on our schema result classes. Check out our photo table

mysql> show create table photo\G
*************************** 1. row ***************************
       Table: photo
Create Table: CREATE TABLE `photo` (
  `photo_id` int(11) unsigned NOT NULL auto_increment,
  `url` varchar(255) NOT NULL,
  `fk_user_id` int(11) unsigned NOT NULL,
  PRIMARY KEY  (`photo_id`),
  KEY `photo_idx_fk_user_id` (`fk_user_id`),
  CONSTRAINT `photo_fk_fk_user_id` FOREIGN KEY (`fk_user_id`) REFERENCES `user` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Making Database Changes

Lets say we want to add a password field to our user table. I’d open up the lib/MySocialApp/Schema/Result/User.pm file and add a section for my password field to the add_columns definition so now it looks like:

 __PACKAGE__->add_columns(

    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    password => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);

Then we update the lib/MySocialApp/Schema.pm file and update to the next version so it looks like

our $VERSION = '0.00002';

To create the DDL and Diff for this version we run the gen_schema script with the previous version as the argument

$ perl script/gen_schema.pl -p 0.00001
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.
creating diff between version 0.00002 and 0.00001

If you look in the sql directory there are two new files. The DDL is named MySocialApp-Schema-0.00002-MySQL.sql and the diff is called MySocialApp-Schema-0.00001-0.00002-MySQL.sql and has the alter statement

$ cat sql/MySocialApp-Schema-0.00001-0.00002-MySQL.sql
-- Convert schema 'sql/MySocialApp-Schema-0.00001-MySQL.sql' to 'MySocialApp::Schema v0.00002':;

BEGIN;

ALTER TABLE user ADD COLUMN password VARCHAR(255) NOT NULL;

COMMIT;

Now we can apply that with our upgrade script

$ perl script/upgrade_db.pl
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.

DB version (0.00001) is lower than the schema version (0.00002). Attempting upgrade.

and see that the upgrade has been made in MySQL

mysql> describe user;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| user_id  | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| username | varchar(255)     | NO   |     |         |                |
| password | varchar(255)     | NO   |     |         |                |
+----------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

Conclusion

So now we’ve seen a really easy way that we can maintain our database schema from our ORM code. We have a versioned database that is under revision control, and can keep our stage and production environments in sync with our development code. With this type of setup its also easy to maintain branches with different database changes and merge them into the mainline, and it also ensures that the database you are developing on is always in sync with the code.

Tags: , , ,


Apr 19 2009

Make Browsing Code Easier with Ack and Ctags

Category: Code Organizationjgoulah @ 8:44 PM

Intro

When working with open source software, its essential to know how to navigate large code bases, perhaps unfamiliar and quite large. There are a few tools I use to do this that should be part of any developers arsenal, and they are: ack and ctags.

Ack can be thought of as a faster and more powerful grep. It searches recursively by default, ignores binary files and most version control files (think .svn), lets you specify file types on search, use perl regular expressions, and has easier to read output than grep.

Ctags is a tool that many are familiar with, and there are tons of articles about it already.  But I bring it up so I can show some quick starter scripts that you’d use to generate the tags for PHP or Perl scripts.  I’ll show a quick C++ and Java example too, since I use those from time to time.

Ctags

Installation

There’s really not much to installing ctags.  You could download and compile the source, but just get it from your package management system.

If you’re using Debian or Ubuntu you can do:

sudo apt-get isntall exuberant-ctags

Similarly in CentOS and Redhat based distros:

sudo yum install ctags

Usage

Ctags basically indexes your code and creates a tag file that can then be used in your editor to literally jump around your code.  If you see a method call as you’re browsing code, you can jump to the definition of that method with one keystroke, and back to where you were.  Same thing for variables.  In a keystroke you can jump to see where its defined.   As you jump through code a stack is created,  and as you jump back you are just popping off that stack, also known as LIFO, by the way.

Generating the Tag Files

I have a few scripts to generate the ctags files depending on different codebases. I tend to use VI so I’m going to cover how to do it with that editor, but you can also use emacs.

For these examples I’m going to send the output to a file in my ~/.vim/tags/ directory, which I’ll later add to .vimrc. You could expand these scripts to fit your needs. For these examples they are pretty basic and hardcoded.

PHP

$ cat bin/ctags_php
#!/bin/bash
cd ~/mysandbox/myphpproject
ctags -f ~/.vim/tags/myphpproject \
--langmap="php:+.inc" -h ".php.inc" -R \
--exclude='*.js' \
--exclude='*.sql' \
--totals=yes \
--tag-relative=yes \
--PHP-kinds=+cf-v \
--regex-PHP='/abstract\s+class\s+([^ ]+)/\1/c/' \
--regex-PHP='/interface\s+([^ ]+)/\1/c/' \
--regex-PHP='/(public\s+|static\s+|abstract\s+|protected\s+|private\s+)function\s+\&?\s*([^ (]+)/\2/f/'

and when you run it, you’ll see something like:

$ ctags_php
498 files, 66678 lines (2624 kB) scanned in 0.6 seconds (4604 kB/s)
1643 tags added to tag file
1643 tags sorted in 0.00 seconds

Perl

In Perl you can do things a bit smarter since you should have a Makefile.PL script to keep track of your dependencies. If so you can add:

postamble(
q!
tags:
ctags -f ~/.vim/tags/myperlcode --recurse --totals \
--exclude=blib \
--exclude='*~' \
--languages=Perl --langmap=Perl:+.t \
!);

Then you should do:

perl Makefile.PL
make tags

C++

You can do a very similar thing for C++ code

$ cd /path/to/code
$ ctags -f ~/.vim/tags/myc++code --tag-relative=yes --recurse --language-force=c++ *

Java

Or say we want to tag the entire java library itself

$ ctags -f ~/.vim/tags/java -R --language-force=java /opt/java/src

Letting VI know about your files

After creating one or more tagfiles you should edit your ~/.vimrc file and add the location to your tag files and separate the entries by commas or spaces

set tags=~/.vim/tags/myphpproject,~/.vim/tags/myperlcode

Navigating Around the Code

In VI there are two easy commands to jump around.

To move to the definition of a method/variable, place the cursor over it and press

Ctrl + ]

And to jump back

Ctrl + t

If you try to jump to something and it isn’t found, its probably something in a library you’re using, so you’ll have to grab the source and tag those too.

Ack

Installation

There are a few ways to install ack listed on the ack homepage.   If you are familiar with CPAN you can install App::Ack, or if you want to use package management you can grab ack-grep on Ubuntu or ack on Redhat based distros.

Usage

The best thing I can really tell you is to read the ack help

$ ack --help

Ack takes a regular expression as the first argument and a directory to search as the second. Typically you want to search all (-a) files or in a case insensitive fashion (-i)

$ ack -ai 'searchstring' .

Or you can search specific file types

$ ack --perl  searchterm

And one really cool thing is though ack gives nice colorized output in a structured fashion, if you pipe it to another process it outputs like grep by default so that you can continue to pipe it

For example lets say I want to find the modules MooseX::Types is using

ack -a '^use.*;' ~/perl5/lib/perl5/MooseX/Types/

It gives something looking like (output truncated)

/home/jgoulah/perl5/lib/perl5/MooseX/Types/Util.pm
9:use warnings;
10:use strict;
12:use base 'Exporter';

/home/jgoulah/perl5/lib/perl5/MooseX/Types/Moose.pm
9:use warnings;
10:use strict;
12:use MooseX::Types;
13:use Moose::Util::TypeConstraints ();
15:use namespace::clean -except => [qw( meta )];

If you pipe the output it looks more like grep output

ack -a '^use.*;' ~/perl5/lib/perl5/MooseX/Types/ | cat

Again the output is truncated but looks like

/home/jgoulah/perl5/lib/perl5/MooseX/Types/Base.pm:11:use MooseX::Types::Util             qw( filter_tags );
/home/jgoulah/perl5/lib/perl5/MooseX/Types/Base.pm:12:use Sub::Exporter                   qw( build_exporter );
/home/jgoulah/perl5/lib/perl5/MooseX/Types/Base.pm:13:use Moose::Util::TypeConstraints;
/home/jgoulah/perl5/lib/perl5/MooseX/Types/Base.pm:15:use namespace::clean -except => [qw( meta )];

Lets say for example reasons I wanted to find all the modules used in the code and make sure they are installed (using the ‘make install’ command in the module would be the more correct and easier way), you could do

$ ack -ah '^use\s[^\d].*;' ~/perl5/lib/perl5/MooseX/Types/ | \
  ack -v 'warnings|strict|base' | \
  perl -ne "m|use ((\w+:?:?)+)(.*)(;)|; print qq{\$1\n};" | \
  sort -u | xargs cpan

which cleans the output and gives the module list to cpan for installation and it lets me know I have everything installed and up to date

Carp::Clan is up to date (6.00).
Class::MOP is up to date (0.81).
Devel::PartialDump is up to date (0.07).
Moose is up to date (0.74).
Moose::Meta::TypeConstraint::Union is up to date (0.74).
Moose::Util::TypeConstraints is up to date (0.74).
MooseX::Meta::TypeConstraint::Structured is up to date (undef).
MooseX::Types is up to date (0.10).
MooseX::Types::Util is up to date (undef).
namespace::clean is up to date (0.11).
Scalar::Util is up to date (1.19).
Sub::Exporter is up to date (0.982).

Conclusion

These are pretty commonplace but great tools to know if you don’t already. Try to integrate them into your work flow and I think you’ll notice that it will speed you up quite a bit, especially when you are browsing through unfamiliar territory.

Tags: , , , , ,


Next Page »