Jul 09 2016

Approximate Times to Update an RDS Instance

Category: Cloud Computing, Databases, Systems | jgoulah @ 12:59 PM

Here’s a quick overview of an upgrade, a couple of months ago, of an RDS instance from type db.t2.medium to type db.r3.large. In addition to changing the instance type, we grew the disk from 64GB to 100GB and applied a new parameter group. The disk increase took by far the longest: we clocked in at ever so slightly over an hour for it, while the instance upgrade took only a bit over 16 minutes. As you can see below, AWS also does a failover during the upgrade, but we decided to stop writes to this database rather than risk any data corruption.

production01 6:26:46 AM Applying modification to database instance class
production01 6:33:02 AM Multi-AZ instance failover started
production01 6:33:07 AM DB instance restarted
replica01 6:33:20 AM Streaming replication has stopped.
production01 6:33:35 AM Multi-AZ instance failover completed
replica01 6:34:51 AM DB instance shutdown
replica01 6:35:08 AM DB instance restarted
replica01 6:38:50 AM Replication for the Read Replica resumed
production01 6:42:59 AM Finished applying modification to DB instance class
production01 6:43:02 AM Applying modification to allocated storage
production01 7:39:05 AM Finished applying modification to allocated storage

One thing Amazon could really improve here is the feedback loop. Getting no updates whatsoever, other than hoping the word “modifying” eventually flips to “available”, makes for a poor experience and does little to maintain confidence that a very long-running operation will succeed. This is an exercise in patience: your only option is to keep hitting the reload button (no, the page doesn’t refresh automatically) and hope the status has moved from yellow to green rather than red.
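Rather than mashing reload, a small polling loop against the API at least gives you a timestamped trail. Here’s a minimal Python sketch; in a real script `get_status` would wrap boto3’s `describe_db_instances` call and read the `DBInstanceStatus` field, but below it is a stand-in callable so the example stays self-contained:

```python
import time

def wait_until_available(get_status, poll_seconds=30, timeout_seconds=7200):
    """Poll get_status() until it reports 'available', or give up at the deadline."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status()
        print(f"instance status: {status}")
        if status == "available":
            return True
        time.sleep(poll_seconds)
    return False

# Stand-in for a real describe call: 'modifying' twice, then 'available'.
statuses = iter(["modifying", "modifying", "available"])
print(wait_until_available(lambda: next(statuses), poll_seconds=0))  # True
```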

In any case, I’m mostly publishing this post because I found this type of run-time estimate, or any actual numbers, very hard to find. The times are certainly variable depending on a variety of factors, so treat this as just one data point. That said, I hope it helps someone else looking for estimated times when planning their RDS upgrade.


Mar 30 2016

Why You Should Point Staging to Your Production Database

Category: Databases, Deployment, Systems | jgoulah @ 9:56 AM

I have been thinking about this topic recently, as I’ve just started working with a new infrastructure, and one of the things I noticed is that the staging database is a separate copy of the production database. It also happens to be a very old copy that has drifted, certainly in data and possibly in schema. It was likely put in place to avoid the perceived problem of new code affecting production data in some critical way, and to provide a safe, solid environment for examining changes before they hit production. But what are we accomplishing if the environment we test those changes in isn’t fully identical to production? When you are dealing with DDL, and the data underneath may not actually mimic production, you’re only gaining a false sense of confidence.

This still sounds crazy, you say. “Of course we shouldn’t point staging at production! What if there is some bug that deletes important rows from our database? We’d want to catch that before it hits production!” To which I wonder: why isn’t your development environment an adequate testbed for catching this sort of behavior well before it ever reaches staging?

As I thought more about this, it started to sound like the concept of staging itself originated from the problem of difficult-to-reproduce environments. If your development environment does not exactly mirror production in configuration and setup, then it makes sense that you would need some intermediary place that is “closer to production” to test changes. Staging becomes a place to almost test your changes and almost get the confidence you need, but without the same data underneath. In the end, real confidence still doesn’t arrive until the code runs in production itself.

In many ways, the staging environment originates from the waterfall model and the days of manual QA testing. This is not to say QA testing has no place, just that staging is not the right environment for that kind of meticulous, slow, sometimes manual work within the given time constraints. Unless your development-to-production cycle is hours or days, the functionality you’re verifying on staging is only the small incremental change you are pushing at that moment. More likely, and hopefully, you’re doing end-to-end testing across the codebase, and that is what gives you the confidence to move code through the pipeline quickly (in addition to metrics at the other end).

What we really want to test in staging is that our changes show up, the site builds and renders properly, and that we can perform the actions we expect against the data we expect. If we are worried about rogue code deleting data, there are safeguards short of separating the environment entirely. The first is simply to not delete data. This may be a non-trivial change for a large legacy codebase, but it is worth considering: instead of issuing DELETE statements, flag the data as removed and respect that flag everywhere it’s read. Additionally, rolling code out in percentages, first to the people at your company and then to small portions of traffic, is a great way to gain confidence without sacrificing the real purpose of staging.
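The flag-instead-of-DELETE idea looks like this in practice; a minimal sketch using SQLite, with illustrative table and column names (none of this is from a real codebase):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO listings (title) VALUES ('hand-knit scarf'), ('vintage lamp')")

# "Delete" by flagging: the row stays around for audit and recovery.
conn.execute("UPDATE listings SET is_deleted = 1 WHERE title = 'vintage lamp'")

# Every read respects the flag.
live = conn.execute("SELECT title FROM listings WHERE is_deleted = 0").fetchall()
print(live)  # [('hand-knit scarf',)]
```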

This is not “testing in production”. This is ensuring your development environment is similar enough to production that staging only exists as an intermediary place to test your build. It means staging holds the next thing in line for what’s running on production, and that we have enough confidence to test our changes against production data. It makes a lot of sense for staging to interact with that data rather than with a copy that may differ drastically from what’s on production. So it is worth considering pointing your staging database at production.


Apr 23 2013

Crossing the Production Barrier: Development at Scale

Category: Conferences, Databases | jgoulah @ 5:59 PM

Slides from my talk at Percona 2013


Apr 15 2012

Percona Live 2012 – The Etsy Shard Architecture

Category: Conferences, Databases, Systems | jgoulah @ 2:58 PM

I attended Percona Conf in Santa Clara last week. It was a great three days of people dropping expert MySQL knowledge. I learned a lot and met some great people.

I gave a talk about the Etsy shard architecture and got a lot of good feedback and questions. A lot of people run active/passive, but we run everything active/active master-master. This keeps a warm buffer pool on both sides, and if one side goes out we only lose about half of the queries until it’s pulled and they start hashing to the other side. Details on how this works are in the slides below.
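The “hash to a side” behavior can be sketched in a few lines; this is my simplification of the idea, not Etsy’s actual code:

```python
import hashlib

SIDES = ["master_a", "master_b"]

def pick_side(key, healthy=None):
    """Hash a key to one of the active/active masters. If a side is
    pulled, its keys simply hash across whatever remains."""
    sides = healthy or SIDES
    digest = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return sides[digest % len(sides)]

print(pick_side("user:42"))                        # one of the two masters
print(pick_side("user:42", healthy=["master_a"]))  # master_a: the survivor takes it
```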


Jan 09 2012

Distributed MySQL Sleuthing on the Wire

Category: Databases, Real-time Web, SSH, Systems | jgoulah @ 8:52 AM


Oftentimes you need to know what MySQL is doing right now, and if you are handling heavy traffic you probably have multiple instances of it running across many nodes. I’m going to start by showing how to take a tcpdump capture on one node and a few ways to analyze it, and then go into how to take a distributed capture across many nodes for aggregate analysis.

Taking the Capture

The first thing you need to do is take a capture of the interesting packets. You can do this either on the MySQL server or on the hosts talking to it. According to this Percona post, this command is the best way to capture MySQL traffic on the eth0 interface, writing it into mycapture.cap for later analysis:

% tcpdump -i eth0 -w mycapture.cap -s 0 "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2"
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
47542 packets captured
47703 packets received by filter
60 packets dropped by kernel
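Those `tcp[1] & 7 == 2 and tcp[3] & 7 == 2` clauses look cryptic. As I read the filter, `tcp[1]` and `tcp[3]` are the low bytes of the TCP source and destination port fields, and port 3306’s low byte happens to satisfy `& 7 == 2`. The arithmetic is easy to check:

```python
MYSQL_PORT = 3306              # 0x0CEA on the wire

low_byte = MYSQL_PORT & 0xFF   # the byte tcp[1]/tcp[3] would hold for port 3306
print(hex(MYSQL_PORT))         # 0xcea
print(low_byte & 7)            # 2 -- the value the capture filter tests for
```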

Analyzing the Capture

The next step is to take a look at your captured data. One way to do this is with tshark, which is the command line part of wireshark. You can do yum install wireshark or similar to install it. Usually you want to do this on a different host than the one taking traffic since it can be memory and CPU intensive.

You can then use it to reconstruct the mysql packets like so:

% tshark -d tcp.port==3306,mysql -T fields -R mysql.query -e frame.time -e ip.src -e ip.dst -e mysql.query -r mycapture.cap

This will give you the time, source IP, destination IP, and query, but the output is still quite raw. It’s a nice start, but we can do better. Percona has released the Percona Toolkit, which includes some really nice command line tools (including what used to be in Maatkit).

The one we’re interested in here is pt-query-digest.

It has tons of options and you should read the documentation, but here are a few I’ve used recently.

Let’s say you want to get the top tables queried from your tcpdump:

% tcpdump -r mycapture.cap -n -x -q -tttt | pt-query-digest --type tcpdump --group-by tables --order-by Query_time:cnt \
 --report-format profile --limit 5
reading from file mycapture.cap, link-type EN10MB (Ethernet)

# Profile
# Rank Query ID Response time Calls R/Call Apdx V/M   Item
# ==== ======== ============= ===== ====== ==== ===== ====================
#    1 0x        0.3140  6.1%   674 0.0005 1.00  0.00 shard.images
#    2 0x        0.8840 17.1%   499 0.0018 1.00  0.03 shard.activity
#    3 0x        0.1575  3.1%   266 0.0006 1.00  0.00 shard.listing_images
#    4 0x        0.1680  3.3%   265 0.0006 1.00  0.00 shard.connection_edges_reverse
#    5 0x        0.0598  1.2%   254 0.0002 1.00  0.00 shard.listing_translations
# MISC 0xMISC    3.5771 69.3%  3534 0.0010   NS   0.0 <86 ITEMS>

Note the tcpdump options I used this time; the tool requires them to work properly when passing --type tcpdump. I also grouped by tables (as opposed to full queries) and ordered by count (the Calls column). It will stop at your --limit and group the rest into MISC, so be aware of that.

You can remove --order-by to sort by response time, the default sort order, or provide other attributes to sort on. We can also change the --report-format, for example to header:

% tcpdump -r mycapture.cap -n -x -q -tttt | pt-query-digest --type tcpdump --group-by tables --report-format header 
reading from file mycapture.cap, link-type EN10MB (Ethernet)

# Overall: 5.49k total, 91 unique, 321.13 QPS, 0.30x concurrency _________
# Time range: 2012-01-08 15:52:05.814608 to 15:52:22.916873
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time             5s     3us   114ms   939us     2ms     3ms   348us
# Rows affecte         316       0      13    0.06    0.99    0.29       0
# Query size         3.64M      18   5.65k  694.98   1.09k  386.68  592.07
# Warning coun           0       0       0       0       0       0       0
# Boolean:
# No index use   0% yes,  99% no

If you set --report-format to query_report you will get gobs of verbose information to dive into, and you can use the --filter option for things like finding queries that didn’t use an index:

% tcpdump -r mycapture.cap -n -x -q -tttt | \
  pt-query-digest --type tcpdump --filter '($event->{No_index_used} eq "Yes" || $event->{No_good_index_used} eq "Yes")'

Distributed Capture

Now that we’ve taken a look at capturing and analyzing packets from one host, it’s time to dive into collecting results across the cluster. The main trick is that tcpdump provides no option to stop capturing after a set amount of time; you have to explicitly kill it. Beyond that, we’ll just use dsh to send our commands out. We’ll assume you have a user that can hop around password-lessly using ssh keys; setting that up is well outside the scope of this article, but there’s plenty of info out there on how to do it.

There are a few ways to let a process run with a timeout, but I’m assuming we don’t have a script written for it, or tools like the timeout command from coreutils, available.

So we’re going on the premise that you background the process and kill it after a sleep by grabbing its pid:

( /path/to/command with options ) & sleep 5 ; kill $!

Simple enough, except we’ll want to capture the output on each host, so we ssh the output back to the source machine using a pipe to grab stdout. This means $! returns the pid of our ssh command instead of our tcpdump command. We end up doing a little trick to kill the right process, since the capture won’t be readable if we kill the ssh command that is writing the output. We need to kill tcpdump, and to do that we look at the parent pid of the ssh process, ask pkill (similar to pgrep) for all of the processes with that parent, and kill the oldest one, which ends up being our tcpdump process.

The end result looks like this if I run it across two machines:

% dsh -c -m web1000,web1001 \
   'sudo /usr/sbin/tcpdump -i eth0 -w - -s 0 -x -n -q -tttt "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2" | \
   ssh dshhost "cat - > ~/captures/$(hostname -a).cap" & sleep 10 ; \
   sudo pkill -o -P $(ps -ef | awk "\$2 ~ /\<$!\>/ { print \$3; }")'

So this issues a dsh to two of our hosts (you can make a dsh group with 100 or 1000 hosts, though) and runs the command concurrently on each (-c). We issue our tcpdump on each target machine and send the output to stdout, for ssh to then cat back to a directory on the source machine that issued the dsh. This way all of our captures end up in one directory, with each file named after the host the tcpdump ran on. The sleep is how long the dump runs before we kill off the tcpdump.

The last piece of the puzzle is to get these all into one file and we can use the mergecap tool for this, which is also part of wireshark:

% /usr/sbin/mergecap -F libpcap -w output.cap *.cap

And then we can analyze it like we did above.


Breaking the distributed command down further

Just to clarify this command a bit more, particularly how the kill works, since that was the trickiest part for me to figure out.

When we run this

$ dsh -c -m web1000,web1001 \
   'sudo /usr/sbin/tcpdump -i eth0 -w - -s 0 -x -n -q -tttt "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2" | \
   ssh dshhost "cat - > ~/captures/$(hostname -a).cap" & sleep 10 ; \
   sudo pkill -o -P $(ps -ef | awk "\$2 ~ /\<$!\>/ { print \$3; }")'

on the server the process list looks something like

user     12505 12504  0 03:12 ?        00:00:00 bash -c sudo /usr/sbin/tcpdump -i eth0 -w - -s 0 -x -n -q -tttt "port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2" | ssh myhost.myserver.com "cat - > /home/etsy/captures/$(hostname -a).cap" & sleep 5 ; sudo pkill -o -P $(ps -ef | awk "\$2 ~ /\<$!\>/ { print \$3; }")
pcap     12506 12505  1 03:12 ?        00:00:00 /usr/sbin/tcpdump -i eth0 -w - -s 0 -x -n -q -tttt port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2
user     12507 12505  0 03:12 ?        00:00:00 ssh myhost.myserver.com cat - > ~/captures/web1001.cap

So $! returns the pid of the ssh process, 12507. We use awk to find the process matching that pid and print its parent pid, which is then passed to the -P arg of pkill. If you used pgrep to look at this without -o, you’d get a list of the children of 12505, which are 12506 and 12507. The oldest child is the tcpdump command, so adding -o kills that one off.

So if we were only running the command on one host, we could use something much simpler:

ssh dbhost01 '(sudo /usr/sbin/tcpdump -i eth0 -w - -s 0 port 3306) & sleep 10; sudo kill $!' | cat - > output.cap
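The same background-then-kill-after-a-timeout pattern translates to other languages too; here’s a rough Python analogue (mine, not from the original post) using only the stdlib:

```python
import subprocess
import time

# Start a long-running command in the background...
proc = subprocess.Popen(["sleep", "30"])
time.sleep(0.5)     # ...let it run for the capture window...
proc.terminate()    # ...then kill it, like `& sleep N ; kill $!`
proc.wait()
print(proc.returncode)  # negative on Unix when the child died from a signal
```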


Jan 18 2010

Using Mongo and Map Reduce on Apache Access Logs

Category: Databases, Systems | jgoulah @ 9:32 PM


With more and more traffic pouring into websites, it has become necessary to come up with creative ways to parse and analyze large data sets. One popular way to do that lately is MapReduce, a framework used across distributed systems to help make sense of large data sets. There are lots of implementations of the map/reduce framework, but an easy way to get started is with MongoDB, a scalable, high-performance, document-oriented database. It has replication, sharding, and mapreduce all built in, which makes it easy to scale horizontally.

For this article we’ll look at a common use case of map/reduce: analyzing your Apache logs. Since there is no set schema in a document-oriented database, it’s a good fit for log files, as it’s fairly easy to import arbitrary data. We’ll look at getting the data into a format that mongo can import, and at writing a map/reduce algorithm to make sense of some of the data.

Getting Setup

Installing Mongo

Mongo is easy to install, with detailed documentation here. In a nutshell you can do:

$ mkdir -p /data/db
$ curl -O http://downloads.mongodb.org/osx/mongodb-osx-i386-latest.tgz
$ tar xzf mongodb-osx-i386-latest.tgz

At this point it’s not a bad idea to put this directory somewhere like /opt and add its bin directory to your path. That way, instead of

 ./mongodb-xxxxxxx/bin/mongod &

You can just do

mongod &

In any case, start up the daemon one of those two ways depending how you set it up.

Importing the Log Files

Apache access logs can vary in the information they report. The log format is easy to change with the LogFormat directive, which is documented here. In any case, the logs I’m working with are not out-of-the-box Apache format. They look something like this:

Jan 18 17:20:26 web4 logger: networks-www-v2 [18/Jan/2010:17:20:26 -0500] "GET /javascript/2010-01-07-15-46-41/97fec578b695157cbccf12bfd647dcfa.js HTTP/1.1" 200 33445 "http://www.channelfrederator.com/hangover/episode/HNG_20100101/cuddlesticks-cartoon-hangover-4" "Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv: Gecko/20091221 Firefox/3.5.7" www.channelfrederator.com 35119

We want to take this raw log and convert it to a JSON structure for importing into mongo, so I wrote a simple perl script that iterates through the log and parses it into sensible fields using a regular expression:


use strict;
use warnings;

my $logfile = "/logs/httpd/remote_www_access_log";
open(LOGFH, '<', $logfile) or die "can't open $logfile: $!";
foreach my $logentry (<LOGFH>) {
    chomp($logentry);   # remove the newline
    next unless $logentry =~ m/(\w+\s\w+\s\w+:\w+:\w+)\s # date
                   (\w+)\slogger:\s # server host
                   ([^\s]+)\s # vhost logger
                   (?:unknown,\s)?(-|(?:\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},?\s?)*)\s # ip
                   \[(.*)\]\s # date again
                   \"(.*?)\"\s # request
                   (\d+)\s # status
                   ([\d-]+)\s # bytes sent
                   \"(.*?)\"\s # referrer
                   \"(.*?)\"\s # user agent
                   ([\w.-]+)\s? # domain name
                   (\d+)? # time to serve (ms)
                  /x;   # the \/x modifier lets us annotate the pattern with comments

    print <<JSON;
{"date": "$5", "webserver": "$2", "logger": "$3", "ip": "$4", "request": "$6", "status": "$7", "bytes_sent": "$8", "referrer": "$9", "user_agent": "$10", "domain_name": "$11", "time_to_serve": "$12"}
JSON
}
close(LOGFH);


Again, my regular expression probably won’t quite work on your logs, though you may be able to reuse bits and pieces of what I’ve documented above in your own script. On my logs that script outputs a bunch of lines that look like this:

{"date": "18/Jan/2010:17:20:26 -0500", "webserver": "web4", "logger": "networks-www-v2", "ip": "", "request": "GET /javascript/2010-01-07-15-46-41/97fec578b695157cbccf12bfd647dcfa.js HTTP/1.1", "status": "200", "bytes_sent": "33445", "referrer": "http://www.channelfrederator.com/hangover/episode/HNG_20100101/cuddlesticks-cartoon-hangover-4", "user_agent": "Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv: Gecko/20091221 Firefox/3.5.7", "domain_name": "www.channelfrederator.com", "time_to_serve": "35119"}

And we can then import that directly into MongoDB. This creates a collection called weblogs in the logs database, from the file holding the output of the JSON generator script above:

$ mongoimport --type json -d logs -c weblogs --file weblogs.json

We can also take a look at the documents and verify they loaded by running the find command, which dumps out 10 rows by default:

$ mongo
> use logs;
switched to db logs
> db.weblogs.find()

Setting Up the Map and Reduce Functions

So for this example, what I am looking for is how many hits go to each domain. My servers handle a bunch of different domain names, and that is one thing I output into my logs that is easy to examine.

The basic command line interface to MongoDB is a kind of javascript interpreter, and the mongo backend takes javascript implementations of the map and reduce functions, which we can type directly into the mongo console. The map function must emit a key/value pair, and in this example it outputs a '1' each time a domain is found:

> map = "function() { emit(this.domain_name, {count: 1}); }"

What comes out of this is a key for each domain with a set of counts, something like:

 {"www.something.com", [{count: 1}, {count: 1}, {count: 1}, {count: 1}]}

This is sent to the reduce function, which sums up all those counts for each domain:

> reduce = "function(key, values) { var sum = 0; values.forEach(function(f) { sum += f.count; }); return {count: sum}; };"
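To see what mongo is doing under the hood, the same count-by-domain map/reduce can be simulated in a few lines of Python (my sketch, not MongoDB internals):

```python
from collections import defaultdict

def map_reduce(docs):
    # map phase: emit (domain_name, {"count": 1}) once per document
    emitted = defaultdict(list)
    for doc in docs:
        emitted[doc["domain_name"]].append({"count": 1})
    # reduce phase: per key, sum the emitted counts
    return {key: {"count": sum(v["count"] for v in values)}
            for key, values in emitted.items()}

docs = [{"domain_name": "www.something.com"}] * 3 + [{"domain_name": "barelydigital.com"}]
print(map_reduce(docs))  # {'www.something.com': {'count': 3}, 'barelydigital.com': {'count': 1}}
```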

Now that you’ve set up the map and reduce functions, you can call mapReduce on the collection:

> results = db.weblogs.mapReduce(map, reduce)
{
        "result" : "tmp.mr.mapreduce_1263861252_3",
        "timeMillis" : 9034,
        "counts" : {
                "input" : 1159355,
                "emit" : 1159355,
                "output" : 92
        },
        "ok" : 1
}

This gives us a bit of information about the map/reduce operation itself. We see that we fed in a set of data and emitted once per item in that set, and we reduced down to 92 domains, each with a count. A result collection is given, and we can print it out:

> db.tmp.mr.mapreduce_1263861252_3.find()
> it

You can type ‘it’ to page through multiple pages of data, or you can print it all at once like so:

> db.tmp.mr.mapreduce_1263861252_3.find().forEach( function(x) { print(tojson(x));});
{ "_id" : "barelydigital.com", "value" : { "count" : 342888 } }
{ "_id" : "barelypolitical.com", "value" : { "count" : 875217 } }
{ "_id" : "www.fastlanedaily.com", "value" : { "count" : 998360 } }
{ "_id" : "www.threadbanger.com", "value" : { "count" : 331937 } }

Nice, we have our data aggregated and the answer to our initial problem.

Automate With a Perl Script

As usual, CPAN comes to the rescue with a MongoDB driver that we can use to interface with our database in a scripted fashion. The mongo guys have also done a great job of supporting a bunch of other languages, which makes it easy to interface with if you are using more than just Perl.


use MongoDB;
use Tie::IxHash;
use Data::Dumper;
use strict;
use warnings;

my $conn = MongoDB::Connection->new("host" => "localhost", "port" => 27017);
my $db = $conn->get_database("logs");

my $map = "function() { emit(this.domain_name, {count: 1}); }";

my $reduce = "function(key, values) { var sum = 0; values.forEach(function(f) { sum += f.count; }); return {count: sum}; }";

my $idx = Tie::IxHash->new(mapreduce => 'weblogs', 'map' => $map, reduce => $reduce);
my $result = $db->run_command($idx);

print Dumper($result);

my $res_coll = $result->{'result'};

print "result collection is $res_coll\n";

my $collection = $db->get_collection($res_coll);
my $cursor = $collection->query({ }, { limit => 10 });

while (my $object = $cursor->next) {
    print Dumper($object);
}

The script is straightforward. It just uses the documented perl interface to make a run_command call to mongo, passing in the map and reduce javascript functions, and then prints the results, similar to what we did on the command line earlier.


We’ve gone over a very simple example of how MapReduce can work for you. There are lots of other ways to put it to good use, such as distributed sorting and searching, document clustering, and machine learning. We also took a look at MongoDB, which is great for schema-less data and easy to scale thanks to its built-in replication and sharding. Now you can put map/reduce to work on your logs and find a ton of information you couldn’t easily get before.




Jun 11 2009

Investigating Data in Memcached

Category: Databases, Systems | jgoulah @ 9:58 PM


Almost every company I can think of uses Memcached at some layer of its stack, but until recently I hadn’t found a great way to take a snapshot of the keys in memory and their associated metadata, such as expire time, LRU time, and value size, and whether they’ve been expired or flushed. The tool that does this is called peep.

Installing Peep

The main thing about installing peep is that you have to compile memcached with debugging symbols. Not really a big deal, though. The first thing you’ll want to do is install libevent:

$ wget http://monkey.org/~provos/libevent-1.4.11-stable.tar.gz
$ tar xzvf libevent-1.4.11-stable.tar.gz
$ cd libevent-1.4.11-stable
$ ./configure
$ make
$ sudo make install

Now grab memcached

$ wget http://memcached.googlecode.com/files/memcached-1.2.8.tar.gz
$ tar xzvf memcached-1.2.8.tar.gz
$ cd memcached-1.2.8
$ CFLAGS='-g' ./configure --enable-threads
$ make 
$ sudo make install

Note that the configure line here sets the -g flag, which enables debug symbols. Now move this directory somewhere standard like /usr/local/src, because we’ll need to reference it when installing peep:

$ sudo mv memcached-1.2.8 /usr/local/src

OK, now for peep, which is written in ruby. If you don’t have rubygems, you’ll need that plus the ruby headers; just use your package management system. I happened to be on a CentOS box, so I ran:

$ sudo yum install rubygems.noarch
$ sudo yum install ruby-devel.i386

Finally you can install peep

$ sudo gem install peep -- --with-memcached-include=/usr/local/src/memcached-1.2.8

Using Peep

Now you can actually use the thing: just give it the pid of the running memcached process:

$ sudo peep --pretty `cat /var/run/memcached/memcached.pid`

Here’s a snippet of what peep can show us in its “pretty” mode

      time |   exptime |  nbytes | nsuffix | it_f | clsid | nkey |                           key | exprd | flushd
       485 |       785 |   17925 |      10 | link |    25 |   64 |  "post.ordered-posts1-03afdb" |  true |  false
       537 |       721 |   16991 |      10 | link |    24 |   63 |  "post.ordered-posts1-03bd6"  |  true |  false
       240 |      3684 |     434 |       8 | link |     9 |   22 |  "channel.channel.105.v1"     | false |  false
       241 |      3687 |   27286 |      10 | link |    27 |   55 |  "post.post_count-fec35129"   | false |  false
       538 |      4022 |    3223 |       9 | link |    17 |   55 |  "post.post_count-2ff57a7"    | false |  false
       538 |      4024 |   17169 |      10 | link |    25 |   55 |  "post.post_count-2ba928998d" | false |  false
       241 |      3686 |   10763 |      10 | link |    22 |   55 |  "post.post_count-3879a24011" | false |  false
        25 |       320 |    8874 |       9 | link |    22 |   22 |  "channel.posterframes.4"     |  true |  false

Putting the Data Into MySQL

The above is not the easiest thing to analyze, especially if you have a lot of data in cache, but we can easily load it into MySQL so that we can run queries on it.

First create the db and permissions

mysql> create database peep;
mysql> grant all on peep.* to peep@'localhost' identified by 'peep4u';

Then a table to store the output of the peep snapshot

mysql> CREATE TABLE `entries` (
  `lru_time` int(11) DEFAULT NULL,
  `expires_at_time` int(11) DEFAULT NULL,
  `value_size` int(11) DEFAULT NULL,
  `suffix_size` int(11) DEFAULT NULL,
  `it_flag` varchar(255) DEFAULT NULL,
  `class_id` int(11) DEFAULT NULL,
  `key_size` int(11) DEFAULT NULL,
  `key_name` varchar(255) DEFAULT NULL,
  `is_expired` varchar(255) DEFAULT NULL,
  `is_flushed` varchar(255) DEFAULT NULL
);

Now run peep and output the data in “ugly” format into a file

$ sudo peep --ugly `cat /var/run/memcached/memcached.pid` > /tmp/peep.out

And now you can load it into your entries table

mysql> load data local infile '/tmp/peep.out' into table entries fields terminated by ' | ' lines terminated by '\n';
Query OK, 177 rows affected (0.01 sec)
Records: 177  Deleted: 0  Skipped: 0  Warnings: 0

I’m only loading a small dev install of memcached, but if you run this on production you’ll be importing many more rows. Luckily, LOAD DATA INFILE is sufficiently optimized for large datasets. Somewhat unfortunately, however, peep makes memcached block while it takes its snapshot, which can take a bit of time in production.

In any case, this isn’t that interesting with dev data, but you can get lots of useful numbers out of it. In my case it looks like most of the stuff in my cache is already expired:

mysql> select is_expired, count(*) as num, sum(value_size) as value_size from entries group by is_expired;
| is_expired | num | value_size |
| false      |   1 |        415 |
| true       | 176 |    2392792 |
2 rows in set (0.01 sec)

Or grouping by slab class and showing the largest value size stored in each:

mysql> select class_id as slab_class, max(value_size) as slab_size from entries group by slab_class;
| slab_class | slab_size |
|          1 |         8 |
|          2 |        15 |
|          9 |       445 |
|         12 |      1100 |
|         13 |      1402 |
|         14 |      1710 |
|         15 |      2174 |
|         16 |      2409 |
|         17 |      3223 |
|         20 |      6154 |
|         21 |      8395 |
|         22 |     10763 |
|         23 |     13395 |
|         24 |     16991 |
|         25 |     17925 |
|         26 |     23320 |
|         27 |     33361 |
|         28 |     38824 |
|         29 |     44904 |
19 rows in set (0.00 sec)


Although there are a lot of tools, such as Cacti, that will graph your Memcached usage, not many will actually show you the specifics of what’s in memory. This tool can be used for a variety of reasons, but it’s especially useful for examining the keys you have in memory and the size of their values, as well as whether they’re expired and what the individual slabs are holding.


May 17 2009

Using DBIx::Class to Version Your Schema

Category: Databases, Deployment, Schema Versioning | jgoulah @ 5:53 PM

Intro to DBIx::Class

In my opinion DBIx::Class is one of the best ORM solutions out there. Not only can it model your database, including mapping out any foreign key relationships, it can also be used as a canonical point of reference for your schema. This means we can use it not only as an application-layer interface to the database, but also to define a versioned database structure in code: if you add a field to your result class, you can generate a DDL and a set of ALTER statements just by running a simple script. That set of alterations can later be applied to your stage and production environments.

It’s possible to produce your schema classes with DBIx::Class::Schema::Loader if you have a pre-existing database and are generating your DBIC classes for the first time, but this article shows how to build your database from the ground up, with the code defining the fields and tables as well as the relations, which map to foreign key constraints when the database definition files are generated.

Creating your Schema

This example will be a pretty simple schema, but enough to demonstrate generating a couple of tables with a foreign key constraint. Since a lot of people are doing social networking stuff these days, we can assume a common use case: a user who has a bunch of photos. So we’ll need to model a user table and a photo table. But first things first, let’s create a few basic things, assuming we’ll call our application MySocialApp:

mkdir -p MySocialApp/lib/MySocialApp/Schema/Result
cd MySocialApp

We need to create a schema file at lib/MySocialApp/Schema.pm that inherits from the base DBIx::Class::Schema

package MySocialApp::Schema;
use base qw/DBIx::Class::Schema/;

use strict;
use warnings;
our $VERSION = '0.00001';

__PACKAGE__->load_components(qw/Schema::Versioned/);
__PACKAGE__->upgrade_directory('sql/');
__PACKAGE__->load_namespaces();

1;
So here all we are doing is extending the base Schema and loading the Versioned component. We’re setting the directory where DDL and Diff files will be generated into the sql directory. We’re also invoking load_namespaces which tells it to look at the MySocialApp::Schema::Result namespace for our result classes by default.

Creating the Result Classes

Here I will define what the database is going to look like

package MySocialApp::Schema::Result::User;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

__PACKAGE__->load_components(qw/Core/);
__PACKAGE__->table('user');
__PACKAGE__->add_columns(
    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);
__PACKAGE__->set_primary_key('user_id');
__PACKAGE__->has_many(
    'photos',
    'MySocialApp::Schema::Result::Photo',
    { "foreign.fk_user_id" => "self.user_id" },
);

1;

This is relatively straightforward. I’m creating a user table that has an auto incrementing primary key and a username field. I’m also defining a relationship that says a user can have many photos. Lets create the photo class.

package MySocialApp::Schema::Result::Photo;

use Moose;
use namespace::clean -except => 'meta';

extends 'DBIx::Class';

__PACKAGE__->load_components(qw/Core/);
__PACKAGE__->table('photo');
__PACKAGE__->add_columns(
    photo_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    url => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    fk_user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        extra => { unsigned => 1 },
    },
);
__PACKAGE__->set_primary_key('photo_id');
__PACKAGE__->belongs_to(
    'user',
    'MySocialApp::Schema::Result::User',
    { 'foreign.user_id' => 'self.fk_user_id' },
);

1;

Same basic thing here, a photo class with a photo_id primary key, a url field, and an fk_user_id field that keys into the user table. Each photo belongs to a user, and this relationship will define our foreign key constraint when the schema is generated.

Versioning the Database

Create the Versioning Scripts

We have the main DBIx::Class pieces in place to generate the database, but we’ll need a couple of scripts to support our versioned database. One script will generate the schema based on the version before it, introspecting which alterations have been made and producing a SQL diff file to alter the database. The other script will look at the database to see if it needs upgrading, and run the appropriate diff files to bring it up to the current version.

First, the schema and diff generation script, which we’ll call script/gen_schema.pl:

use strict;
use warnings;

use FindBin;
use lib "$FindBin::Bin/../lib";

use Pod::Usage;
use Getopt::Long;
use MySocialApp::Schema;

my ( $preversion, $help );
GetOptions(
        'p|preversion:s'  => \$preversion,
        'h|help'          => \$help,
) or die pod2usage;

my $schema = MySocialApp::Schema->connect(
        'dbi:mysql:dbname=mysocialapp;host=localhost', 'mysocialapp', 'mysocialapp4u'
);
my $version = $schema->schema_version();

if ($version && $preversion) {
    print "creating diff between version $version and $preversion\n";
} elsif ($version && !$preversion) {
    print "creating full dump for version $version\n";
} elsif (!$version) {
    print "creating unversioned full dump\n";
}

my $sql_dir = './sql';
$schema->create_ddl_dir( 'MySQL', $version, $sql_dir, $preversion );

This script will be run any time we change something in the Result files. You give it the previous schema version, and it will create a diff between that version and the new one. Before running it you’ll bump the $VERSION variable in lib/MySocialApp/Schema.pm so that it knows a change has been made.

The next script is the upgrade script, which we’ll call script/upgrade_db.pl:

use strict;
use warnings;

use FindBin;
use lib "$FindBin::Bin/../lib";

use MySocialApp::Schema;

my $schema = MySocialApp::Schema->connect(
        'dbi:mysql:dbname=mysocialapp;host=localhost', 'mysocialapp', 'mysocialapp4u'
);

if (!$schema->get_db_version()) {
    # schema is unversioned, deploy it from scratch
    $schema->deploy();
} else {
    # apply any pending diffs to bring the DB up to the schema version
    $schema->upgrade();
}
This script checks to see if any diffs need to be applied, and applies them if the version held by the database and the version in your Schema.pm file differ, bringing the database up to the correct schema version.

Note: in these scripts I’ve hardcoded the DB connection info, which really should go into a configuration file.
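For example, the connection settings could live in a small INI-style file (the filename and keys here are illustrative; the scripts would need a little extra code, e.g. Config::IniFiles, to read it):

```ini
; config/db.conf (hypothetical example)
dsn      = dbi:mysql:dbname=mysocialapp;host=localhost
user     = mysocialapp
password = mysocialapp4u
```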

Create a Database to Deploy Into

We need to create the database that our tables will be created in, along with the user and password used in the connect calls above. I’m using MySQL for the example, so this is done at the MySQL prompt:

mysql> create database mysocialapp;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on mysocialapp.* to mysocialapp@'localhost' identified by 'mysocialapp4u';
Query OK, 0 rows affected (0.01 sec)

Deploy the Initial Schema

Now it’s time to deploy our initial schema into MySQL. For the first go-round we also have to create the initial DDL file, so that future changes can be compared against the Schema result classes to see what has been modified. We do this by supplying a nonexistent previous version to our gen_schema.pl script:

$ perl script/gen_schema.pl -p 0.00000
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.
creating diff between version 0.00001 and 0.00000
No previous schema file found (sql/MySocialApp-Schema-0.00000-MySQL.sql) at /home/jgoulah/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1685.

And we can see the DDL file now exists

$ ls sql/
MySocialApp-Schema-0.00001-MySQL.sql

Then we need to deploy to MySQL for the first time so we run the upgrade script

$ perl script/upgrade_db.pl
Your DB is currently unversioned. Please call upgrade on your schema to sync the DB.

Now check out what it’s created in MySQL:

mysql> use mysocialapp;
Database changed
mysql> show tables;
+----------------------------+
| Tables_in_mysocialapp      |
+----------------------------+
| dbix_class_schema_versions |
| photo                      |
| user                       |
+----------------------------+
3 rows in set (0.00 sec)

There is our photo table, our user table, and also a dbix_class_schema_versions table. That last table simply keeps track of which version the database is at. By selecting from it you can see that we are in sync with the Schema class, and when this version was installed:

mysql> select * from dbix_class_schema_versions;
+---------+---------------------+
| version | installed           |
+---------+---------------------+
| 0.00001 | 2009-05-17 21:59:57 |
+---------+---------------------+
1 row in set (0.00 sec)

The really great thing is that we have created the tables -and- constraints based on our schema result classes. Check out our photo table

mysql> show create table photo\G
*************************** 1. row ***************************
       Table: photo
Create Table: CREATE TABLE `photo` (
  `photo_id` int(11) unsigned NOT NULL auto_increment,
  `url` varchar(255) NOT NULL,
  `fk_user_id` int(11) unsigned NOT NULL,
  PRIMARY KEY  (`photo_id`),
  KEY `photo_idx_fk_user_id` (`fk_user_id`),
  CONSTRAINT `photo_fk_fk_user_id` FOREIGN KEY (`fk_user_id`) REFERENCES `user` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
1 row in set (0.00 sec)

Making Database Changes

Let’s say we want to add a password field to our user table. I’d open up the lib/MySocialApp/Schema/Result/User.pm file and add a section for the password field to the add_columns definition, so that it now looks like:


__PACKAGE__->add_columns(
    user_id => {
        data_type   => "INT",
        size        => 11,
        is_nullable => 0,
        is_auto_increment => 1,
        extra => { unsigned => 1 },
    },
    username => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
    password => {
        data_type => "VARCHAR",
        is_nullable => 0,
        size => 255,
    },
);
Then we update lib/MySocialApp/Schema.pm, bumping it to the next version so that it reads:

our $VERSION = '0.00002';

To create the DDL and diff for this version, we run the gen_schema script with the previous version as the argument:

$ perl script/gen_schema.pl -p 0.00001
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.
creating diff between version 0.00002 and 0.00001

If you look in the sql directory there are two new files: the DDL, named MySocialApp-Schema-0.00002-MySQL.sql, and the diff, named MySocialApp-Schema-0.00001-0.00002-MySQL.sql, which contains the alter statement:

$ cat sql/MySocialApp-Schema-0.00001-0.00002-MySQL.sql
-- Convert schema 'sql/MySocialApp-Schema-0.00001-MySQL.sql' to 'MySocialApp::Schema v0.00002':;

BEGIN;

ALTER TABLE user ADD COLUMN password VARCHAR(255) NOT NULL;

COMMIT;
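As an aside, the names create_ddl_dir gives these files follow a simple pattern (schema class, version or version pair, and database type), which this little shell sketch reproduces:

```shell
# Reconstruct the file names used in this example
schema=MySocialApp-Schema; prev=0.00001; next=0.00002; db=MySQL
echo "sql/${schema}-${next}-${db}.sql"          # full DDL for the new version
echo "sql/${schema}-${prev}-${next}-${db}.sql"  # diff that upgrade_db.pl applies
```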
Now we can apply that with our upgrade script

$ perl script/upgrade_db.pl
Versions out of sync. This is 0.00002, your database contains version 0.00001, please call upgrade on your Schema.

DB version (0.00001) is lower than the schema version (0.00002). Attempting upgrade.

and see that the upgrade has been made in MySQL

mysql> describe user;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| user_id  | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| username | varchar(255)     | NO   |     |         |                |
| password | varchar(255)     | NO   |     |         |                |
+----------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)


So now we’ve seen a really easy way to maintain our database schema from our ORM code. We have a versioned database that is under revision control, and we can keep our stage and production environments in sync with our development code. With this type of setup it’s also easy to maintain branches with different database changes and merge them into the mainline, and it ensures that the database you are developing on is always in sync with the code.


Jan 25 2009

Quick DB Setups with MySQL Sandbox

Category: Databases | jgoulah @ 2:57 PM


There are various reasons to set up quick “sandbox” instances of MySQL. You can use them to test different types of replication (such as master-master or various slave topologies), to test your code against different versions of MySQL, or to set up instances of MySQL on a per-developer basis, where each person has their own database running on a different port so they can tear down and rebuild the DB easily or make schema changes without affecting other team members. A perfect tool for doing all of these things easily is MySQL Sandbox.

Download the Prerequisites

To use MySQL Sandbox effectively you need two things: the MySQL Sandbox tool itself, and a MySQL tarball that the sandbox script can use to set up instances.

You can download the latest MySQL sandbox version like so (2.0.12 as of this writing):

wget http://launchpad.net/mysql-sandbox/mysql-sandbox-2.0/2.0/+download/mysql_sandbox_2.0.12.tar.gz

You can download the MySQL tarball from any MySQL mirror. It’s important to get the non-RPM, Intel C/C++ compiled, glibc-2.3 version (look for a tar.gz file with “icc” in the filename). For example, if you want version 5.1:

wget ftp://mirror.anl.gov/pub/mysql/Downloads/MySQL-5.1/mysql-5.1.30-linux-i686-icc-glibc23.tar.gz

Installing an Instance with MySQL Sandbox

First you need to extract the MySQL Sandbox tool and change directory to it:

$ tar xzvf mysql_sandbox_2.0.12.tar.gz
$ cd mysql_sandbox_2.0.12

The easiest and quickest way to create an instance is:

$ ./make_sandbox /path/to/mysql-X.X.XX-osinfo.tar.gz

where mysql-X.X.XX-osinfo.tar.gz is the MySQL tarball we just downloaded. And you’re done.

However, this will put the sandbox in a directory under your home directory ($HOME/sandboxes/msb_X_X_XX), which may or may not suit your purposes, and it sets it up with default users, passwords, port, and directory name. Let’s fine-tune things a bit.

Setting up a Custom Tuned Instance

I want to put my instance into a partition I created called /mnt/mysql_sandboxes. I’ve created a subdirectory in there called tarballs, which holds the MySQL tarball we downloaded above; MySQL Sandbox will extract it during setup. Since I’m installing version 5.1.30 I want to call the directory that houses the MySQL data files 5.1.30_single, but you can call it anything you like. I’ll create a default user named jgoulah with a password of goulah. By default the port is set to the version number without the dots (5130 in this case), so we’ll give it a custom port and have it listen on 10000 instead.
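To make the default-port rule concrete, here is a tiny sketch of how the port falls out of the version string:

```shell
# Default sandbox port = the version number with the dots stripped
version=5.1.30
default_port=$(echo "$version" | tr -d .)
echo "$default_port"   # 5130 (we override this with --sandbox_port=10000)
```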

mysql_sandbox_2.0.12 $  ./make_sandbox \
/mnt/mysql_sandboxes/tarballs/mysql-5.1.30-linux-i686-icc-glibc23.tar.gz \
--upper_directory=/mnt/mysql_sandboxes/ --sandbox_directory=5.1.30_single \
--db_user=jgoulah --db_password=goulah --sandbox_port=10000

Here’s the output:

unpacking /mnt/mysql_sandboxes/tarballs/mysql-5.1.30-linux-i686-icc-glibc23.tar.gz
Executing ./low_level_make_sandbox \
        --basedir=/mnt/mysql_sandboxes/tarballs/5.1.30 \
        --sandbox_directory=msb_5_1_30 \
        --install_version=5.1 \
        --sandbox_port=5130 \
        --no_ver_after_name \
        --upper_directory=/mnt/mysql_sandboxes/ \
        --sandbox_directory=5.1.30_single \
        --db_user=jgoulah \
        --db_password=goulah \
        --basedir=/mnt/mysql_sandboxes/tarballs/5.1.30 \
        --sandbox_port=10000 \
    The MySQL Sandbox,  version 2.0.12 16-Oct-2008
    (C) 2006,2007,2008 Giuseppe Maxia, Sun Microsystems, Database Group
installing with the following parameters:
upper_directory                = /mnt/mysql_sandboxes/
sandbox_directory              = 5.1.30_single
sandbox_port                   = 10000
datadir_from                   = script
install_version                = 5.1
basedir                        = /mnt/mysql_sandboxes/tarballs/5.1.30
my_file                        =
operating_system_user          = jgoulah
db_user                        = jgoulah
db_password                    = goulah
my_clause                      = log-error=msandbox.err
prompt_prefix                  = mysql
prompt_body                    =  [\h] {\u} (\d) > '
force                          = 0
no_ver_after_name              = 1
verbose                        = 0
load_grants                    = 1
no_load_grants                 = 0
do you agree? ([Y],n) y
loading grants
. sandbox server started
installation options saved to current_options.conf.
To repeat this installation with the same options,
use ./low_level_make_sandbox --conf_file=current_options.conf
Your sandbox server was installed in /mnt/mysql_sandboxes//5.1.30_single

It’s now installed and started up; we can see that the process is running with the correct options:

$  ps -ef | grep mysql | grep jgoulah
jgoulah  11128     1  0 13:48 pts/3    00:00:00 /bin/sh /mnt/mysql_sandboxes/tarballs/5.1.30/bin/mysqld_safe --defaults-file=/mnt/mysql_sandboxes//5.1.30_single/my.sandbox.cnf
jgoulah  11203 11128  0 13:48 pts/3    00:00:00 /mnt/mysql_sandboxes/tarballs/5.1.30/bin/mysqld --defaults-file=/mnt/mysql_sandboxes//5.1.30_single/my.sandbox.cnf --basedir=/mnt/mysql_sandboxes/tarballs/5.1.30 --datadir=/mnt/mysql_sandboxes//5.1.30_single/data --user=jgoulah --log-error=/mnt/mysql_sandboxes//5.1.30_single/data/msandbox.err --pid-file=/mnt/mysql_sandboxes//5.1.30_single/data/mysql_sandbox10000.pid --socket=/tmp/mysql_sandbox10000.sock --port=10000

And we can connect to it on the port 10000:

$  mysql -u jgoulah --protocol=TCP  -P 10000 -pgoulah

You can also go into the directory where we’ve installed this and there are some convenience scripts:

$ cd /mnt/mysql_sandboxes/5.1.30_single/

You can run the use script to connect into mysql (same thing we just did above except we don’t have to remember our port, user, or pass):

$ ./use
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.1.30 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql [localhost] {jgoulah} ((none)) >

Stop the instance:

$  ./stop

Or start it back up again:

$  ./start
. sandbox server started

There are a few other scripts in this directory that you can experiment with; they are documented on the MySQL Sandbox site.

Setting up a Replicated Instance

The nice thing about this tool is it will also setup replicated instances of MySQL with a single command. This allows you to test your application under a replicated environment, or even test different replication topologies including multiple slaves or multi-master replication.

We’ll use similar options as for the single instance above, except this time we’ll use port 11000 (the slaves get port + 1, …, port + n, where n is the number of slaves). We’ll put the install into /mnt/mysql_sandboxes/5.1.30_replicated. Note that this time we use the make_replication_sandbox script:

mysql_sandbox_2.0.12 $  ./make_replication_sandbox \
/mnt/mysql_sandboxes/tarballs/mysql-5.1.30-linux-i686-icc-glibc23.tar.gz \
--upper_directory=/mnt/mysql_sandboxes/ \
--replication_directory=5.1.30_replicated --sandbox_base_port=11000
installing and starting master
installing slave 1
installing slave 2
starting slave 1
. sandbox server started
starting slave 2
. sandbox server started
initializing slave 1
initializing slave 2
replication directory installed on /mnt/mysql_sandboxes//5.1.30_replicated

Now we have a master and two slaves running. Note that the command to set up replicated sandboxes will not let us specify the user as we did for the single instance, but two users are created by default:

User: root@localhost Password: msandbox
User: msandbox@% Password: msandbox

You can run the use script as shown above, or connect directly to the master:

$  mysql -u msandbox --protocol=TCP  -P 11000 -pmsandbox

Create a database:

mysql> create database jg_repl_test;
mysql> exit;

Connect to one of the slaves:

$  mysql -u msandbox --protocol=TCP  -P 11001 -pmsandbox
mysql> show databases like '%jg_%';
+------------------+
| Database (%jg_%) |
+------------------+
| jg_repl_test     |
+------------------+

And we can see that it has replicated across.
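The port layout follows the rule mentioned above (master on the base port, slave n on base port + n), which this quick sketch spells out:

```shell
# --sandbox_base_port=11000 with two slaves
base=11000
slaves=2
echo "master: $base"
for n in $(seq 1 "$slaves"); do
    echo "slave $n: $((base + n))"
done
```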

There are various options you can give the make_replication_sandbox command; for example, the --master_master option sets up a multi-master instance.


We’ve seen how to create a single instance of MySQL as well as a replicated master with two slaves. Each of these takes only a few seconds to set up once you have decided on the directory layout. There really isn’t a much easier way to set up scratch instances of MySQL for all kinds of different purposes.


Jan 22 2009

MySQL Snapshots using LVM

Category: Databases | jgoulah @ 2:50 PM


LVM snapshots are a quick and easy way to take a backup of your MySQL server’s data files. The idea is to acquire a read lock on all tables, flush all server caches to disk, create a snapshot of the volume containing the MySQL data directory, and then unlock the tables again. This gives an atomic snapshot that takes only milliseconds. A MySQL instance can then be launched against this data directory, and you can perform a mysqldump to get data for setting up slaves, or take a completely destroyed DB and use the data directory to restore it to this point in time.
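That sequence can be sketched as a dry run; the run helper below only echoes each step instead of executing it, and the exact flags are illustrative rather than mylvmbackup's actual invocation:

```shell
# Dry-run sketch of an LVM-backed MySQL backup: every step is echoed,
# nothing is executed, so the ordering is easy to see
run() { echo "+ $*"; }

run mysql -e 'FLUSH TABLES WITH READ LOCK'   # lock tables, flush caches to disk
run lvcreate -s --size=190G --name=maindb_snapshot /dev/mysql/maindb
run mysql -e 'UNLOCK TABLES'                 # writes resume almost immediately
run mount /dev/mysql/maindb_snapshot /var/tmp/mylvmbackup/mnt   # mount frozen copy
run tar czf backup.tar.gz -C /var/tmp/mylvmbackup/mnt .         # archive at leisure
run lvremove -f /dev/mysql/maindb_snapshot   # drop the snapshot when done
```

The mylvmbackup tool installed later in this post automates this cycle, adding an InnoDB recovery pass on the mounted snapshot before archiving it.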

Setup the LVM

Figure out the partition you want to put the LVM on, in our case we’ll use /data2

$ df -h | grep data2
/dev/sda2             371G  195M  352G   1% /data2

We’ll need to run fdisk on it. If you have data on here, save it somewhere else first!

$ sudo fdisk /dev/sda2
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 49932.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
Partition number (1-4): 1
First cylinder (1-49932, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-49932, default 49932):
Using default value 49932

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

You can review the change with the ‘p’ command:

Command (m for help): p

Disk /dev/sda2: 410.7 GB, 410704680960 bytes
255 heads, 63 sectors/track, 49932 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

     Device Boot      Start         End      Blocks   Id  System
/dev/sda2p1               1       49932   401078758+  8e  Linux LVM

You might need to reboot here, so go ahead and run sudo shutdown -r now

Now unmount the disk we want to create the LVM on, and create a physical volume:

$ sudo umount /dev/sda2
$ sudo pvcreate /dev/sda2
Physical volume "/dev/sda2" successfully created

You can see our new physical volume with pvdisplay

$ sudo pvdisplay
  "/dev/sda2" is a new physical volume of "382.50 GB"
  --- NEW Physical volume ---
  PV Name               /dev/sda2
  VG Name
  PV Size               382.50 GB
  Allocatable           NO
  PE Size (KByte)       0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               myxfgQ-alfA-y8sX-af4a-ifoA-3S29-YD3tuK

Now create a volume group on the physical volume; since we’re using this for MySQL we’ll call it mysql:

 $ sudo vgcreate mysql /dev/sda2
  Volume group "mysql" successfully created

And you can view it

$ sudo vgdisplay
  --- Volume group ---
  VG Name               mysql
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               382.50 GB
  PE Size               4.00 MB
  Total PE              97919
  Alloc PE / Size       0 / 0
  Free  PE / Size       97919 / 382.50 GB
  VG UUID               0VZxkR-873N-E6Vj-H6SV-FCWO-3jHF-ykHRiX

Create a logical volume on the volume group; we’ll put it on our mysql volume group and call it maindb. We want it to be about half (or less) of the entire amount of space, so there is room left over for the snapshot when we make a backup. In this case that’s about half of 382GB, so we’ll go with 190:

$ sudo lvcreate --name maindb --size 190G mysql
  Logical volume "maindb" created

And you can view it:

$ sudo lvdisplay
  --- Logical volume ---
  LV Name                /dev/mysql/maindb
  VG Name                mysql
  LV UUID                4hlcMf-cs5a-xjYU-yxjE-Fx2f-Pngv-z0RAEH
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                190.00 GB
  Current LE             48640
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

Now we have to put an actual filesystem on it:

$ sudo mkfs.ext3 /dev/mysql/maindb
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
24903680 inodes, 49807360 blocks
2490368 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1520 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

And now we can remount it on the /data2 folder:

$ sudo mount /dev/mysql/maindb /data2

And add it to fstab, making sure to comment out the old entry

#LABEL=/data2            /data2                  ext3    defaults        1 2
/dev/mysql/maindb       /data2                  ext3       rw,noatime   0 0

Note that we do not have to create the second logical volume on the volume group (the one used for the snapshot), as the mylvmbackup tool will take care of that for us.

Move MySQL to the LVM Partition

Now we want to move our mysql installation into this folder:

$ sudo mkdir -p /data2/var/lib
$ sudo mv /var/lib/mysql/ /data2/var/lib/

We need to ensure the my.cnf has settings pointing to this new directory:

datadir = /data2/var/lib/mysql
log-bin = /data2/var/lib/mysql/bin.log

We can keep the log and pid outside of the LVM if we want:

log-error = /var/log/mysqld.log
pid-file  = /var/run/mysqld/mysqld.pid
Now you can start mysqld:

sudo /usr/bin/mysqld_safe &

Install mylvmbackup

Grab the latest mylvmbackup utility:

$ wget http://www.lenzg.org/mylvmbackup/mylvmbackup-current.tar.gz

Extract and install; it’s just a script, so there is no compilation necessary:

$ tar xzvf mylvmbackup-0.11.tar.gz
$ cd mylvmbackup-0.11
$ sudo make install

You’ll also need a few Perl modules installed, such as Config::IniFiles, DBI, and DBD::mysql.

Now we need to set up the config file for mylvmbackup, located at /etc/mylvmbackup.conf.

There are lots of options, but at the very least we need to set the credentials and the names of the volume group and logical volume we created earlier:

user=root
password={your root pass here}
vgname=mysql
lvname=maindb
You may also want to change the backupdir parameter: this is where the tarball of your database backup is stored, so you’ll need enough space for it.

You’ll want to make sure this backup directory exists, so if you are using the default:

$ sudo mkdir -p /var/tmp/mylvmbackup/backup

And also the mount directory (this is where the LVM snapshot is mounted so a mysqldump can be taken):

$ sudo mkdir -p /var/tmp/mylvmbackup/mnt

Create the Backup

Now you can simply run the mylvmbackup command as root. Here’s what the output looks like:

$ sudo mylvmbackup
20090122 13:26:47 Info: Connecting to database...
20090122 13:26:47 Info: Flushing tables with read lock...
20090122 13:26:47 Info: Taking position record...
20090122 13:26:47 Info: Running: /usr/sbin/lvcreate -s --size=190G 
--name=maindb_snapshot /dev/mysql/maindb
File descriptor 3 (socket:[101615]) leaked on lvcreate invocation.
 Parent PID 7209: /usr/bin/perl
  Logical volume "maindb_snapshot" created
20090122 13:26:47 Info: DONE: taking LVM snapshot
20090122 13:26:47 Info: Unlocking tables...
20090122 13:26:47 Info: Disconnecting from database...
20090122 13:26:47 Info: Mounting snapshot...
20090122 13:26:47 Info: Running: /bin/mount -o rw /dev/mysql/maindb_snapshot 
20090122 13:26:47 Info: DONE: mount snapshot
20090122 13:26:47 Info: Running: /bin/mount -o bind,ro 
20090122 13:26:47 Info: DONE: bind-mount position directory
20090122 13:26:47 Info: Recovering InnoDB...
20090122 13:26:47 Info: Running: echo 'select 1;' | mysqld_safe 
--skip-grant --bootstrap --skip-ndbcluster --skip-slave-start
Starting mysqld daemon with databases from /var/tmp/mylvmbackup/mnt/backup
STOPPING server from pid file /var/tmp/mylvmbackup_recoverserver.pid
090122 13:26:47  mysqld ended
20090122 13:26:47 Info: DONE: InnoDB recovery on snapshot
20090122 13:26:47 Info: Copying my.cnf...
20090122 13:26:47 Info: Taking actual backup...
20090122 13:26:47 Info: Creating tar archive 
20090122 13:26:47 Info: Running: cd '/var/tmp/mylvmbackup/mnt' ;
'/bin/tar' cvf - backup/  
 /bin/gzip --stdout --verbose 
--best ->
/bin/tar: backup/var/lib/mysql/mysql.sock: socket ignored
20090122 13:26:48 Info: DONE: create tar archive
20090122 13:26:48 Info: Cleaning up...
20090122 13:26:48 Info: LVM Usage stats:
20090122 13:26:48 Info:LV VG Attr LSize Origin Snap% Move Log Copy% Convert
20090122 13:26:48 Info:   maindb_snapshot mysql swi-a- 190.00G maindb   0.00
  Logical volume "maindb_snapshot" successfully removed

And now you should have a tarball in your backup directory, with all the files needed to restore:

$ ls /var/tmp/mylvmbackup/backup