Slides from my talk at Percona 2013
Apr 23 2013
Mar 25 2013
One of my favorite things to do (other than stare at a computer all day) is to cook. I recently bought a new smoker grill, and quickly realized that I was running outside the entire day to tune the vents to keep the temperature in the range I wanted for a 12-hour slow cook. I did a little bit of research, and it turns out there are some decent devices that use a fan to keep the temperature consistent. Even better, there was one equipped with a web server that can be accessed over Wi-Fi, called the CyberQ.
It's not cheap, but life is short, and it will hold whatever temperature you set, up to 475 degrees Fahrenheit. I already have Ganglia set up in my house for basic server monitoring (you run Linux servers in your house too, right?), and since the device connects to the home network I figured it would be easy enough to query and graph. Luckily the people who made this device were nice enough to provide an XML endpoint that is easy to parse for all the information you want. There are a few different endpoints, but I request config.xml since it gives you the most information.
So I wrote a gmetric script to query the box and throw the data into a few Ganglia graphs. As of this writing I graph the fan speed, the pit temperature, and the three food temperatures it can report. A bunch of other data is returned, but these were the most important to me initially.
While this device would normally be used for low-and-slow cooks, I did a faster high-temperature grill cook just to try it out. I set the temperature to the max of 475, and you can see how long it took to bring the grill up to that temperature, along with the fan output it took to get there. Once it got there, you can see the temperature drop when I opened the lid to put the food on. The rest of the fluctuation is from opening the lid to flip the food:
And here is the food; two steaks cook rather quickly at such a high temperature:
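A minimal sketch of that kind of gmetric script, in Python. It assumes the device answers at http://<host>/config.xml as described above; the XML element names in TAG_MAP, the Ganglia metric names, and the hostname you pass in are hypothetical placeholders, so check them (and whether the raw values need scaling) against your own device's output. It only shells out to gmetric with its standard --name, --value, and --type options.

```python
import subprocess
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical element names: verify them against your device's config.xml.
TAG_MAP = {
    "COOK_TEMP": "cyberq_pit_temp",
    "FOOD1_TEMP": "cyberq_food1_temp",
    "FOOD2_TEMP": "cyberq_food2_temp",
    "FOOD3_TEMP": "cyberq_food3_temp",
    "OUTPUT_PERCENT": "cyberq_fan_output",
}


def fetch_config(host, timeout=5):
    """Download the raw XML from http://<host>/config.xml."""
    with urllib.request.urlopen(f"http://{host}/config.xml", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_readings(xml_text, tag_map=TAG_MAP):
    """Return {metric_name: float} for each mapped element that holds a number."""
    root = ET.fromstring(xml_text)
    readings = {}
    for tag, metric in tag_map.items():
        elem = root.find(f".//{tag}")  # search anywhere below the root element
        if elem is None or elem.text is None:
            continue  # element not present in this output
        try:
            readings[metric] = float(elem.text.strip())
        except ValueError:
            continue  # non-numeric value: skip it rather than fail the whole run
    return readings


def gmetric_commands(readings):
    """Build one gmetric argument list per reading, in a stable (sorted) order."""
    return [
        ["gmetric", f"--name={name}", f"--value={value}", "--type=float"]
        for name, value in sorted(readings.items())
    ]


def publish(host):
    """Fetch, parse, and push every reading into Ganglia via gmetric."""
    for cmd in gmetric_commands(parse_readings(fetch_config(host))):
        subprocess.run(cmd, check=True)
```

Calling publish() with your device's hostname from cron every minute or so would keep the graphs fed.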
Jan 21 2013
There are a lot of private cloud solutions out there with great things already built in to complete a full cloud stack: networking, dashboards, storage, and a framework that puts all the pieces together, among other things. But there is also a fair amount of overhead in getting these frameworks set up, and maybe you want more flexibility over some of the components, or just something a little more homegrown. What might a lightweight cloud machine bootstrapping process look like if it were implemented from scratch?
We can use libvirt and KVM/QEMU to put something reasonably robust together. Start by installing those packages:
apt-get install qemu-kvm libvirt libvirt-bin virtinst virt-viewer
The next important thing is to set up a bridge for proper networking on this host. This lets the guests use the bridge to communicate on the same network as the host. There are a few articles out there that can help you set this up, but the basics are that you assign the bridge the IP that your eth0 interface previously had, and then add the eth0 interface to the bridge. In this example 192.168.1.101 is the IP of the host machine:
# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eth0 inet manual

auto br0
iface br0 inet static
    address 192.168.1.101
    netmask 255.255.255.0
    gateway 192.168.1.1
    network 192.168.1.0
    bridge_ports eth0

# ifup br0
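If you would rather not work out the network line by hand, a small Python sketch can render the eth0/br0 part of that stanza, deriving the network address from the address and netmask with the standard ipaddress module. It also refuses a malformed netmask or a gateway outside the subnet. The function name and its defaults are hypothetical; the output mirrors the layout above.

```python
import ipaddress


def bridge_stanza(address, netmask, gateway, bridge="br0", port="eth0"):
    """Render the eth0/br0 part of /etc/network/interfaces for a static bridge."""
    # IPv4Interface raises ValueError for malformed addresses or non-contiguous netmasks
    iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
    gw = ipaddress.IPv4Address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is outside {iface.network}")
    lines = [
        f"iface {port} inet manual",
        "",
        f"auto {bridge}",
        f"iface {bridge} inet static",
        f"    address {iface.ip}",
        f"    netmask {iface.netmask}",
        f"    gateway {gw}",
        f"    network {iface.network.network_address}",
        f"    bridge_ports {port}",
    ]
    return "\n".join(lines) + "\n"
```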
The first step is setting up a base template to create your instances from. Grab an ISO to start from; we'll use Debian, but this process works with any distro:
% wget http://cdimage.debian.org/debian-cd/6.0.6/amd64/iso-cd/debian-6.0.6-amd64-netinst.iso
Then allocate a file on disk of the size you'd like your template to be. I created one here at 8GB; it can always be expanded later, so it only needs to be big enough to hold the initial base image that all instances will start from. Generally smaller is better, because the image gets copied every time an instance is created later.
% dd if=/dev/zero of=/var/lib/libvirt/images/debbase.img bs=1M count=8192
Now you can start the Linux installation; note the --graphics arguments, which let you connect with VNC. The installation target disk is the one we created above, debbase.img, and we are giving it 512MB of RAM and 1 CPU.
% virt-install --name=virt-base-deb --ram=512 --graphics vnc,listen=0.0.0.0 --network=bridge=br0 \
    --accelerate --virt-type=kvm --vcpus=1 --cpuset=auto --cpu=host --disk /var/lib/libvirt/images/debbase.img \
    --cdrom debian-6.0.6-amd64-netinst.iso
Once that's started, you can use VNC on your client machine to connect to this instance graphically and run through the normal install. There are plenty of clients out there, but a decent one is Chicken of the VNC. It's also possible at this step to create the image from a PXE boot or a similar bootstrapping mechanism.
Here we take advantage of QEMU's ability to load Linux kernels and init ramdisks directly, bypassing bootloaders such as GRUB. The guest can then be launched with the partition containing the root filesystem as its virtual disk.
There are two steps to make this work. First you’ll need the vmlinuz and initrd files, and the easiest way to get those is to copy them from the base image we setup above:
% scp BASEIP:/boot/vmlinuz-2.6.32-5-amd64 /var/lib/libvirt/kernels/
% scp BASEIP:/boot/initrd.img-2.6.32-5-amd64 /var/lib/libvirt/kernels/
The next step is to extract the root partition from that same base image. We want to take a look at how those partitions are laid out so that we can get the right numbers to pass to the dd command.
% sfdisk -l -uS /var/lib/libvirt/images/debbase.img

Disk /var/lib/libvirt/images/debbase.img: 1044 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/var/lib/libvirt/images/debbase.img1   *      2048  15988735   15986688  83  Linux
/var/lib/libvirt/images/debbase.img2      15990782  16775167     784386   5  Extended
/var/lib/libvirt/images/debbase.img3             0         -          0   0  Empty
/var/lib/libvirt/images/debbase.img4             0         -          0   0  Empty
/var/lib/libvirt/images/debbase.img5      15990784  16775167     784384  82  Linux swap / Solaris
We are going to pull out the first partition; note how the numbers line up with the debbase.img1 row. We start at sector 2048 and copy 15986688 sectors of 512 bytes each:
% dd if=/var/lib/libvirt/images/debbase.img of=/var/lib/libvirt/debian-tmpl skip=2048 count=15986688 bs=512
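Reading the Start and #sectors columns by hand is easy to get wrong, so here is a small Python sketch that pulls them out of the sfdisk listing, sanity-checks them against the End column, and builds the same dd command. It assumes the older sfdisk column layout shown above (newer util-linux releases format this output differently), and the function names are hypothetical. With 512-byte sectors, a start of 2048 means the partition begins 1 MiB into the image.

```python
SECTOR_SIZE = 512  # from "Units = sectors of 512 bytes" in the listing


def partition_extent(sfdisk_output, device):
    """Return (start_sector, sector_count) for one row of `sfdisk -l -uS` output."""
    for line in sfdisk_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != device:
            continue
        if len(fields) > 1 and fields[1] == "*":
            fields = [fields[0]] + fields[2:]  # drop the boot flag column
        start, end, count = int(fields[1]), int(fields[2]), int(fields[3])
        # End is inclusive, so the row must satisfy end - start + 1 == count
        if end - start + 1 != count:
            raise ValueError(f"inconsistent row for {device}: {line.strip()}")
        return start, count
    raise KeyError(device)


def dd_command(image, out, start, count):
    """dd invocation that copies exactly the sectors of that partition."""
    return f"dd if={image} of={out} skip={start} count={count} bs={SECTOR_SIZE}"
```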
Now we have a disk file that serves as our image template. There are a few things we want to change directly on this template. Note that we are using a few all-caps placeholders ending in -TMPL that we'll replace later with sed. We can edit the template's files by mounting the disk:
% mkdir -p /tmp/newtmpl
% mount -t ext3 -o loop /var/lib/libvirt/debian-tmpl /tmp/newtmpl
% chroot /tmp/newtmpl
Note at this point we are chrooted and these commands are acting against our template disk file.
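Clear out the udev rule that tied the base image's NIC (by MAC address) to eth0 when its networking was set up; otherwise each new instance, with its own MAC, would come up as eth1: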
% echo "" > /etc/udev/rules.d/70-persistent-net.rules
We’re going to put a placeholder for our hostname in /etc/hostname:
% echo "HOSTNAME-TMPL" > /etc/hostname
Set a nameserver template in /etc/resolv.conf:
% echo "nameserver NAMESERVER-TMPL" > /etc/resolv.conf
Template the interfaces in /etc/network/interfaces:
# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address ADDRESS-TMPL
    netmask NETMASK-TMPL
    gateway GATEWAY-TMPL
This will give us console access when we boot it. Make sure /etc/inittab has this line (usually just uncomment it):
T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100
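The template now carries the ADDRESS-, NETMASK-, GATEWAY-, HOSTNAME-, and NAMESERVER-TMPL placeholders. As an illustration of how they get filled in, here is a minimal Python sketch of the substitution the build script performs with sed, with one difference worth noting: it raises an error when a placeholder has no value, instead of silently leaving it in the guest's config. The function name and the NAME-TMPL regex are assumptions based on the naming convention used here.

```python
import re

# NAME-TMPL tokens: an uppercase name followed by the -TMPL suffix
PLACEHOLDER = re.compile(r"\b([A-Z][A-Z0-9_]*)-TMPL\b")


def fill_placeholders(text, values):
    """Replace every NAME-TMPL token with values[NAME]; fail loudly on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for {name}-TMPL")
        return values[name]  # returned literally, so no regex escaping is needed

    return PLACEHOLDER.sub(substitute, text)
```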
Now we have all the pieces together to launch an instance from our image. This script creates the instance given an IP and a hostname. It does no error checking, for readability, and is well commented so that you know what's going on:
#!/bin/bash
# read the IP address and short hostname from the command line
virt_ip=$1
virt_host=$2

# build the fqdn based off the short host name
virt_fqdn=${virt_host}.linux.bogus

# fill in your network defaults
virt_gateway=192.168.1.1
virt_netmask=255.255.255.0
virt_nameserver=192.168.1.101

# how the disk/ram/cpu is sized
virt_disk=10G
virt_ram=512
virt_cpus=1

# random mac address under the 52:54:00 prefix libvirt uses for KVM guests
# (a fully random first octet can produce a multicast address)
virt_mac=$(printf '52:54:00:%s' "$(openssl rand -hex 3 | sed 's/\(..\)/\1:/g; s/:$//')")

# copy the template we extracted with dd earlier
cp /var/lib/libvirt/debian-tmpl /var/lib/libvirt/images/${virt_host}-disk0

# optionally resize the disk
qemu-img resize /var/lib/libvirt/images/${virt_host}-disk0 ${virt_disk}
loopback=$(losetup -f --show /var/lib/libvirt/images/${virt_host}-disk0)
fsck.ext3 -fy $loopback
resize2fs $loopback ${virt_disk}
losetup -d $loopback

mountbase=/tmp/${virt_host}
mkdir -p ${mountbase}
mount -o loop /var/lib/libvirt/images/${virt_host}-disk0 ${mountbase}

# replace our template vars
sed -i -e "s/ADDRESS-TMPL/$virt_ip/g" \
       -e "s/NETMASK-TMPL/$virt_netmask/g" \
       -e "s/GATEWAY-TMPL/$virt_gateway/g" \
       -e "s/HOSTNAME-TMPL/$virt_fqdn/g" \
       -e "s/NAMESERVER-TMPL/$virt_nameserver/g" \
       ${mountbase}/etc/network/interfaces \
       ${mountbase}/etc/resolv.conf \
       ${mountbase}/etc/hostname

# unmount and remove only our temporary mount point
umount ${mountbase}
rmdir ${mountbase}

# run a file system check on the disk
fsck.ext3 -pv /var/lib/libvirt/images/${virt_host}-disk0

# specify the kernel and initrd (these we copied with scp earlier)
vmlinuz=/var/lib/libvirt/kernels/vmlinuz-2.6.32-5-amd64
initrd=/var/lib/libvirt/kernels/initrd.img-2.6.32-5-amd64

# install the new domain with our specified parameters for cpu/disk/memory/network
virt-install --name=$virt_host --ram=$virt_ram \
  --disk=path=/var/lib/libvirt/images/${virt_host}-disk0,bus=virtio,cache=none \
  --network=bridge=br0 --import --accelerate --vcpus=$virt_cpus --cpuset=auto --mac=${virt_mac} --noreboot --graphics=vnc \
  --cpu=host --boot=kernel=$vmlinuz,initrd=$initrd,kernel_args="root=/dev/vda console=ttyS0 _device=eth0 \
  _ip=${virt_ip} _hostname=${virt_fqdn} _gateway=${virt_gateway} _dns1=${virt_nameserver} _netmask=${virt_netmask}"

# start it up
virsh start $virt_host
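A fully random six-byte MAC (what `openssl rand -hex 6` produces) sets the multicast bit in the first octet about half the time, and libvirt or the kernel may refuse such an address for a NIC. A safer pattern, sketched here in Python, is to keep the 52:54:00 prefix that libvirt uses for the MACs it generates and randomize only the last three octets. The function names are hypothetical, and with only 24 random bits, collisions are unlikely but not impossible across many hosts.

```python
import secrets

KVM_PREFIX = (0x52, 0x54, 0x00)  # prefix libvirt uses for the MACs it generates


def random_guest_mac(rng=secrets.token_bytes):
    """Return a MAC like 52:54:00:xx:xx:xx with three random trailing octets."""
    tail = rng(3)
    return ":".join(f"{b:02x}" for b in (*KVM_PREFIX, *tail))


def is_unicast(mac):
    """True if the multicast (I/G) bit, the lowest bit of the first octet, is clear."""
    return (int(mac.split(":")[0], 16) & 0x01) == 0
```

The rng parameter is only there so the octets can be pinned in a test; normal callers just use random_guest_mac().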
Assuming we saved the script as buildserver, run it like this:
% buildserver 192.168.1.197 jgoulah
This is really just the first step, but now that you can bring up a templated disk you can decide a little more about how you'd like networking to work for your cloud. You can either keep using static IP assignment as shown here and use nsupdate to insert DNS entries when new guests come up, or set things up so that the base image uses DHCP and configure your DHCP server to update DNS records when clients come online. You may also want to bake your favorite config management system into the template so that you can bootstrap the nodes and maintain their configurations. Have fun!
Jan 01 2013
I'd been using GNOME's Vino as a VNC server for years on my media computer, so that I can control it from my iPad with a touchpad app. It works, but it's a little clunky and badly documented, plus it's tied directly to GNOME. The other day I woke up to a drive (~2TB) filled with Vino errors. I killed it off, cleaned up the error log file, and tried to start it again. No go this time, due to some startup errors. This isn't the first time I've fought with it; surely something better exists.
This led me to x11vnc. It's a breath of fresh air: fairly easy to set up and quick to get going. I'll first show how to do it manually and then present my open-sourced chef cookbook.
First thing is to install it:
sudo apt-get install x11vnc
Set up a password file:
sudo x11vnc -storepasswd YOUR_PASS_HERE /etc/x11vnc.pass
Create an upstart config:
sudo touch /etc/init/x11vnc.conf
Open it and put this into it:
start on login-session-start
script
    x11vnc -xkb -noxrecord -noxfixes -noxdamage -display :0 -rfbauth /etc/x11vnc.pass \
        -auth /var/run/lightdm/root/:0 -forever -bg -o /var/log/x11vnc.log
end script
and start it up:
sudo service x11vnc start
Simple, easy, and just works.
Naturally, I ported this to a chef cookbook, which you can find here. I have to admit I'm not totally happy with it yet, mainly because it doesn't restart x11vnc on config changes. In some cases, such as our production servers, we actually prefer this, so that we don't accidentally roll out a broken config change and have an auto-restart bring everything to its knees (there are ways to avoid this, but that's another post). On my home computer, though, I'd prefer it to restart when I make changes, but I'm struggling to get upstart to stop the process correctly due to what seems to be some disassociation with its pid file. The other thing is that the recipe currently assumes you are using Ubuntu or something similar, but it can easily be extended. Hope this helps someone else out there!
Dec 27 2012
The other day we were getting messages from our network switch that a host was flapping between ports. It turned out we had two virtual hosts on different machines using the same MAC address (not good). We had the interface and MAC information and wanted to find which VM domains they mapped to. It's easy if you know the domain name of the VM; you can get back the associated information like so:
% sudo virsh domiflist goulah
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet2      bridge     br0        -           52:54:00:19:40:18
But if you only have the virtual interface, how would you get the domain? I remembered that virsh dumpxml shows all of the dynamic properties of the VM, including those that are not in its static XML definition in /etc/libvirt/qemu/foo.xml. Which vnetX interface is attached to which VM is one of these dynamic properties, so we can grep for it! I concocted a simple (not very elegant) function that, given the vnet interface, returns the domain associated with it:
function find-vnet() {
    for vm in $(sudo virsh list | grep running | awk '{print $2}'); do
        # match dev='vnetN' exactly, so vnet2 doesn't also match vnet20
        sudo virsh dumpxml "$vm" | grep -q "dev='$1'" && echo "$vm"
    done
}
It just looks through all the running domains and prints the domain name if it finds the interface you’re looking for. So now you can do:
% find-vnet vnet2
goulah
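One gotcha with grepping the raw XML is that a plain substring search for vnet2 also matches vnet20. As an alternative sketch, this Python version parses each running domain's live XML and compares each interface's target dev attribute exactly. It assumes a virsh that supports `virsh list --name` and enough privileges to run it; the function names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def interfaces_of(domain_xml):
    """Return the target device names (vnetN) listed in one domain's XML."""
    root = ET.fromstring(domain_xml)
    return [t.get("dev") for t in root.findall("./devices/interface/target") if t.get("dev")]


def find_vnet(vnet, domain_xmls):
    """Return the domains whose interfaces include exactly `vnet`."""
    return [name for name, xml_text in domain_xmls.items() if vnet in interfaces_of(xml_text)]


def running_domain_xmls():
    """Collect the live XML of each running domain via virsh."""
    listing = subprocess.run(["virsh", "list", "--name"], capture_output=True, text=True, check=True)
    return {
        name: subprocess.run(["virsh", "dumpxml", name], capture_output=True, text=True, check=True).stdout
        for name in listing.stdout.split()
    }
```

On a host, find_vnet("vnet2", running_domain_xmls()) would return the matching domain names.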
I wonder if others out there have a clever way of doing this or if this is really the best way? If you know of a better way leave a comment. Perhaps the problem is not common enough that a libvirt built-in command exists.
Oct 27 2012
StatsD is a network daemon that runs on the Node.js platform and listens for statistics, like counters and timers. Stats are then flushed to one or more pluggable backend services, by default Graphite. Every 10 seconds the stats sent to StatsD are aggregated and forwarded on to this backend service. It can be useful to see what stats are going through both sides of the connection: from the client to StatsD, and then from StatsD to Graphite.
The first thing to know is there is a simple management interface built in that you can interact with. By using either telnet or netcat you can find information directly from the command line. By default this is listening on port 8126, but that is configurable in StatsD.
The simplest thing to do is send the stats command:
% echo "stats" | nc statsd.domain.com 8126
uptime: 365
messages.last_msg_seen: 0
messages.bad_lines_seen: 0
graphite.last_flush: 5
graphite.last_exception: 365
This tells us a bit about the current state of the server, including the uptime and how long ago the last flush was sent to the backend. Our server has only been running for 365 seconds. It also tells us how long it has been since StatsD received its last message, how many bad lines have been sent to it, and when the last exception occurred. Things look pretty normal.
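If you want to poll those numbers from a script instead of eyeballing nc output, here is a minimal Python sketch. It sends one admin command over TCP to port 8126 and turns the "key: value" lines into a dictionary. The function names are hypothetical; the reader stops at an END marker if the server sends one, and otherwise falls back to the socket timeout.

```python
import socket


def parse_admin_stats(text):
    """Turn 'key: value' lines from the admin 'stats' command into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # a line with no colon, such as a bare END marker
        try:
            stats[key.strip()] = int(value.strip())
        except ValueError:
            continue  # not an integer value; skip it
    return stats


def query_admin(host, command="stats", port=8126, timeout=5):
    """Send one admin command and read the reply."""
    buf = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        try:
            while b"END" not in buf:
                data = sock.recv(4096)
                if not data:
                    break  # server closed the connection
                buf += data
        except socket.timeout:
            pass  # no END marker arrived; use whatever was received
    return buf.decode("utf-8", errors="replace")
```

For example, parse_admin_stats(query_admin("statsd.domain.com"))["uptime"] would give the uptime in seconds.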
You can also get a dump of the current timers:
(echo "timers" | nc statsd.domain.com 8126) > timers
As well as a dump of the current counters:
(echo "counters" | nc statsd.domain.com 8126) > counters
Take a look at the files generated to get an idea of the metrics StatsD is currently holding.
Beyond that, it's fairly simple to debug certain StatsD or Graphite issues by watching what's going on in real time on the connection itself. On the StatsD host, make sure you're looking at traffic on the default StatsD listen port (8125); here I'm grepping for the stat I'm about to send, which will be called test.goulah.myservice:
% sudo tcpdump -t -A -s0 dst port 8125 | grep goulah
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
Then we fake a simple client on the command line to send a sample statistic to StatsD like so:
echo "test.goulah.myservice:1|c" | nc -w 1 -u statsd.domain.com 8125
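The same test counter can be sent from code. This Python sketch builds the name:value|c line (with the optional |@rate sample-rate suffix from the StatsD protocol) and sends it as a single UDP datagram, like the nc -u call above. The function names are hypothetical; pass host="statsd.domain.com" (or your own server) instead of the loopback default.

```python
import socket


def format_counter(name, value=1, sample_rate=None):
    """Build a StatsD counter line: name:value|c, plus |@rate when sampling."""
    line = f"{name}:{value}|c"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_stat(line, host="127.0.0.1", port=8125):
    """Fire-and-forget one stat over UDP; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```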
Back on the StatsD host, you can see the metric come through:
e......."A.test.goulah.myservice:1|c
There is also the line of communication from StatsD to the Graphite host. Every 10 seconds it flushes its metrics. Start up another tcpdump command, this time on port 2003, which is the port carbon is listening on the Graphite side:
% sudo tcpdump -t -A -s0 dst port 2003 | grep goulah
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
Every 10 seconds you should see a bunch of stats go by. This is what you are flushing into the Graphite backend. In our case I'm grepping for goulah, which shows the data aggregated for the metric we sent earlier. Notice there are two metrics here that look slightly different from the one we sent. StatsD sends two lines for every counter. The first is the aggregated metric, prefixed with the stats namespace. The second is the raw data, prefixed with stats_counts. The difference is that the first is a calculated per-second value and the second is the raw count. In our case they are identical:
stats.test.goulah.myservice 0 1351355521
stats_counts.test.goulah.myservice 0 1351355521
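To make the two namespaces concrete, here is a small Python sketch of the arithmetic, assuming the default 10-second flush interval and the stats./stats_counts. layout described above. The function name is hypothetical, and StatsD's own number formatting may differ. A counter incremented once in an interval comes out as 0.1 per second under stats. and 1 under stats_counts.; when no increments arrived in the interval, both lines read 0.

```python
def counter_flush_lines(name, count, timestamp, flush_interval_ms=10000):
    """Render the two Graphite plaintext lines a StatsD counter produces per flush.

    stats.<name>        -> the count normalised to a per-second rate
    stats_counts.<name> -> the raw count seen during the interval
    """
    per_second = count / (flush_interval_ms / 1000)
    return [
        f"stats.{name} {per_second:g} {timestamp}",
        f"stats_counts.{name} {count:g} {timestamp}",
    ]
```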
Now we can get a better understanding of what StatsD is doing under the covers on our system. If metrics don’t show up on the Graphite side it helps to break things into digestible pieces to understand where the problem lies. If the metrics aren’t even getting to StatsD, then of course they can’t make it to Graphite. Or perhaps they are getting to StatsD but you are not seeing the metrics you would expect when you look at the graphs. This is a good start on digging into those types of problems.
Oct 06 2012
A question that comes up often is how to connect to our office OpenVPN network using an iPad or iPhone. On OS X it's pretty simple: use Viscosity or Tunnelblick. But to my knowledge there is nothing like that for iDevices. However, it's possible to connect them using a SOCKS proxy. The SOCKS server lives on your laptop connected to the VPN, and the iPhone/iPad is set up to connect through it. Obviously you should only do this on a secured wireless network and/or lock down the SOCKS server so that only you have access. I wrote these notes a couple of years ago and figured they were worth sharing, since this comes up once in a while.
Setting up the server is really easy with ssh; just run this command on your laptop while it is connected to your VPN:
ssh -N -D 0.0.0.0:1080 localhost
If you want it to run in the background, also use the -f option. You may also want to set up some access control with iptables, which is a bit out of scope for this article, but more information can be found here.
The only way to configure the iPhone/iPad to use SOCKS is to set up a PAC file. Create a file with the .pac extension and put this into it, using the port you gave ssh -D:
function FindProxyForURL(url, host)
{
    return "SOCKS 192.168.X.XXX:1080";
}
Make sure to use the IP address of the laptop running the SOCKS server. Now put this file in any web-accessible location. It doesn't matter whether it's internal to your network or external, as long as the iPad can reach it. How to actually serve a page is beyond the scope of this article, but if you've gotten this far you probably know how to do this.
Now you just have to tell the iPad to use the PAC file so that it proxies web requests through the laptop's VPN connection.
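For a quick test setup, here is a minimal Python sketch that renders the PAC body for a given laptop IP and port and serves it with the standard http.server module, using the application/x-ns-proxy-autoconfig MIME type that some clients expect for PAC files. The /myproxy.pac path, the function names, and the port you bind to are assumptions for illustration.

```python
import http.server

PAC_MIME = "application/x-ns-proxy-autoconfig"


def render_pac(laptop_ip, port=1080):
    """PAC body that sends every request through the SOCKS server on the laptop."""
    return (
        "function FindProxyForURL(url, host)\n"
        "{\n"
        f'    return "SOCKS {laptop_ip}:{port}";\n'
        "}\n"
    )


def make_handler(pac_body, path="/myproxy.pac"):
    """Build a request handler that serves only the PAC file, with its MIME type."""
    body = pac_body.encode("ascii")

    class PacHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != path:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", PAC_MIME)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return PacHandler
```

Running http.server.HTTPServer(("0.0.0.0", 8000), make_handler(render_pac("192.168.X.XXX"))).serve_forever(), with your laptop's real IP, would then serve the file at http://<server>:8000/myproxy.pac.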
Click: Settings -> WiFi
Then click the blue arrow to the right of your access point and under HTTP Proxy choose Auto. In the URL field, put the full URL to the PAC file that we setup. Make sure to put the http:// protocol in this URL line. For example this may look something like: http://yourserver.com/myproxy.pac
Sometimes getting this setting to stick is tricky. I recommend clicking out of the text field into another field and letting the iPhone spinner in the upper left finish.
If you did everything right, you should be able to hit websites behind your VPN connection. One way to check that it's working is to start ssh with the -vvv option. When you request pages through the proxy you will see a bunch of output; if there is no output, you're not using the proxy.
Jul 20 2012
I was at OSCON this week, and my friend Erik Kastner and I did a talk about development environments. Specifically what to avoid and how to keep environments consistent across development and production. As usual the slides are not fully explanatory without seeing the accompanying talk but here they are anyway:
May 28 2012
If you are developing chef recipes it really helps to use the command line tool called shef. Shef is just a REPL to run chef in an interactive ruby session. If you haven’t ever tried it, you can find some nice instructions over here to get you going.
Shef gives you an easy way to iterate on your recipes, so that you can make small changes and see their effects. However, I found that the include_recipe function would only load a recipe once, and would complain that it had seen the recipe before on subsequent tries. I added a small patch implementing a new function called load_recipe that lets you reimport the recipe. The problem is that once the recipe is loaded again, its resources are added to the list again, giving us the same set of resources twice.
You can see the list of resources that are loaded up like so
chef:recipe > puts run_context.resource_collection.all_resources
package[php]
package[php-common]
If you were to call load_recipe again the list would double, and the new code would be run second when calling run_chef
chef:recipe > puts run_context.resource_collection.all_resources
package[php]
package[php-common]
package[php]
package[php-common]
The trick is that you can clear this list with this command
run_context.resource_collection = Chef::ResourceCollection.new
So to use load_recipe you should call the above before it to clear the current list. This can be done in one line like so
run_context.resource_collection = Chef::ResourceCollection.new; load_recipe "php"
Hopefully I’ll be able to patch things to add a reload_recipe that overwrites the old resources so you don’t have to use this trick, but for now this will work to get quick iterations going.
Apr 15 2012
I attended Percona Conf in Santa Clara last week. It was a great 3 days of lots of people dropping expert MySQL knowledge. I learned a lot and met some great people.
I gave a talk about the Etsy shard architecture and got a lot of good feedback and questions. A lot of people run active/passive, but we run everything active/active, master-master. This helps us keep a warm buffer pool on both sides, and if one side goes out we only lose about half of the queries until it's pulled and they start hashing to the other side. Some details on how this works are in the slides below.
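To make the "about half" concrete, here is a toy Python model of hashing keys between the two sides of an active/active pair. It is not Etsy's actual routing code: the side names, the SHA-1 key hash, and the `healthy` set (standing in for whatever pulls a dead side out of rotation) are all assumptions for illustration. While a dead side is still in rotation, every key that hashes to it fails, which is roughly half of them; once it is pulled, those keys fall through to the surviving master, and keys already on that side do not move.

```python
import hashlib


def pick_side(shard_key, sides, healthy):
    """Pick one master of an active/active pair for a key.

    Keys hash evenly across the sides; if the preferred side is not healthy,
    the key falls through to the next healthy side.
    """
    digest = hashlib.sha1(str(shard_key).encode("utf-8")).digest()
    preferred = int.from_bytes(digest[:8], "big") % len(sides)
    for offset in range(len(sides)):
        candidate = sides[(preferred + offset) % len(sides)]
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy master available")
```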