Mar 21 2016

Configuring Chef for Provisioning

Category: Cloud Computing, Configuration Management
jgoulah @ 10:23 PM

If you’re working with infrastructure, it’s good practice to describe it using code so that it is reproducible and consistent across servers and development environments. I’ve used Chef for quite some time and feel it is a pretty natural way to represent the source of truth for your servers, the packages installed on them, and their configuration. Chef can also be used as a provisioning tool, to bring your servers to life configured exactly to your specifications. You can use it with services like AWS or tools like Docker.

I started out using chef local mode to test my provisioning recipes, but also wanted to get things working with chef-client running as a daemon. Because of the ACLs in place when Chef is run this way, you need to grant the right groups permission to do things such as create nodes.

This is hinted at with an error that looks like:

Net::HTTPServerException: 403 "Forbidden"

Or, if you go digging with debug mode on (chef-client -l debug), you might see something like this buried in a ton of output:

[2016-03-21T10:13:05-04:00] DEBUG: ---- HTTP Response Body ----
[2016-03-21T10:13:05-04:00] DEBUG: {"error":["missing update permission"]}

The default Chef ACLs don’t allow nodes’ API clients to modify other nodes, so we have to create a group whose permissions let your provisioning node (the one that kicks off the new instance/machine to be provisioned) create the machines’ nodes and clients. This is similarly explained in this slightly outdated post here, but unfortunately the commands aren’t quite right, so here it is using the most current version of the tooling.

Setting up Permissions

First things first, install the ACL gem (assuming you’re using the Chef Development Kit):

chef gem install knife-acl

We can then create a group to give access to the permissions we need:

knife group create provisioners

Now, if you’re setting up a new node to be your provisioner, you would create the client key and node object:

knife client create -d chefconf-provisioner > ~/.chef/chefconf-provisioner.pem
knife node create -d chefconf-provisioner

Or you may already have a client that you run chef-client from. Let’s say it’s called chefconf-provisioner, since that’s the client we created above, but your client can be named anything; it’s usually the hostname of the node you’re running from. Add your client to the group we just created like so:

knife group add client chefconf-provisioner provisioners

Chef server uses role-based access control (RBAC) to restrict access to objects—nodes, environments, roles, data bags, cookbooks, and so on. This ensures that only authorized user and/or chef-client requests to the Chef server are allowed.
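Before granting anything, it can be useful to inspect the ACL already on an object type. The knife-acl gem we installed above provides a show subcommand for this (a quick check; the output lists which groups and actors hold each permission):

knife acl show containers nodes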

In this case we need to grant read/create/update/grant/delete permissions for clients and nodes so that our provisioning node can create the new instance/machine:

for object in clients nodes
do
  for permission in read create update grant delete
  do
    knife acl add group provisioners containers $object $permission
  done
done

And now you should have the permissions to be able to provision new nodes using Chef!
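As a postscript, here is roughly what the provisioning recipe itself can look like once these permissions are in place. This is a minimal chef-provisioning sketch, not a recipe from this post; the machine name and run list are placeholders:

require 'chef/provisioning'

# the machine resource comes from chef-provisioning; the driver
# (AWS, Docker, Vagrant, etc.) is picked up from your configuration
machine 'app1' do
  recipe 'base'   # hypothetical cookbook for the new node
  converge true   # run chef-client on the machine after creating it
end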



Mar 07 2016

Running Strace in Docker

Category: Cloud Computing, Containers, Kernel
jgoulah @ 10:40 PM

I’ve been reverse engineering a new application setup, and it seemed like an appropriate place to try out Docker. Spinning up a lightweight and reproducible environment is the goal, and containerization is a reasonably efficient way to accomplish that. As I was looking into a problem getting some services running properly, with little debug output and sparse documentation, I reached for old trusty strace to see what was going on. But what do you know, strace is disabled by default on Docker. Here is the error that I got:

strace: test_ptrace_setoptions_for_all: PTRACE_TRACEME doesn't work: Operation not permitted
strace: test_ptrace_setoptions_for_all: unexpected exit status 1

This is admittedly an error I hadn’t seen before, and Google isn’t a ton of help on this one. As a Docker newbie, it would have been helpful to have a more detailed error message explaining why a tool as common as strace isn’t working.

Luckily some IRC logs came to the rescue in my quest through WTFed’ness. I learned that the security around this feature has apparently evolved a bit over time, with AppArmor being the older but still used security system, and seccomp being only available on newer distros (and OS X running boot2docker). Confusing things further, some of the articles out there refer to AppArmor (which uses different methods for modifying its security policy).

If you are using seccomp, there are a couple of things you can pass to docker run to loosen up this security policy. To allow strace specifically, you grant the capability covering the system call it relies on to get its information (ptrace):

--cap-add SYS_PTRACE
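For example, to get a throwaway container where strace works (a sketch; swap in whatever image and command you’re actually debugging):

docker run --rm -it --cap-add SYS_PTRACE ubuntu /bin/bash

Inside that shell, something like strace ls should now trace normally instead of dying with the PTRACE_TRACEME error above.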

This whole paradigm is in fact documented, but none of my original searches turned up these pages. Besides ptrace, there is a slew of other system calls that you may (or may not) need which aren’t on the Docker whitelist of allowed system calls. The list of calls can be adjusted very granularly by feeding Docker a JSON file defining your security options. Or if you are feeling up for it, you can re-enable all of them in one fell swoop with this option to docker run:

--security-opt seccomp:unconfined
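If unconfined is broader than you’d like, the JSON route lets you start from Docker’s default profile and whitelist just the calls you need. Here’s a sketch of the entry you’d add for ptrace, in the profile format of this era (the schema has changed across Docker versions, so treat it as illustrative):

{
    "name": "ptrace",
    "action": "SCMP_ACT_ALLOW",
    "args": []
}

You’d then point docker run at the modified profile with --security-opt seccomp:/path/to/profile.json.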

It’s definitely worth considering which system calls your container really needs access to, though, and strace is one of those that is quite useful for debugging purposes. There will always be that balance between security and usability, and decisions to make on which direction to swing the pendulum. It’s nice to see Docker supporting this kind of granular control over system-level calls, and it might be interesting to think about ways it could be highlighted to a surprised end user.



Jul 06 2014

Dynamically Update Hypervisor Guest Info in Chef With Rehai

Category: Cloud Computing, Configuration Management
jgoulah @ 11:31 AM

Introduction

It turns out this little tool is long overdue. It’s as simple a concept as it is easy to misunderstand the use cases for; ours at Etsy, however, was very targeted. Several years ago we were hammering out our internal cloud infrastructure, using a KVM/QEMU based solution that you can read about over here. We were populating our virtual machine frontend using Chef Ohai data as the canonical source of our system information. There is an ohai plugin that gathers KVM virtualization data using virsh commands, which are part of libvirt. It was a perfect way to capture which guests existed on a given system, along with other information about them.
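To give an idea of the underlying data, these are the kinds of virsh commands the plugin wraps (illustrative invocations, not the plugin’s exact code):

% virsh list --all
% virsh dominfo someguest

The first enumerates the guests defined on a hypervisor and their run state; the second returns CPU, memory, and state details for a single guest. The plugin rolls this sort of output up into ohai node attributes.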

Our problem

We were hitting a bottleneck in that our chef clients were set up to run about every ten minutes. Within that window a virtual machine could be added or deleted, which made it difficult to keep our interface in sync. Imagine creating a new virtual machine but not being able to display data about it for up to ten minutes; to make matters worse, the clients run with a splay interval, which means they don’t all run at the same time. So we started on a simple script that would let us run ohai quickly without needing a full chef run. While our chef runs are relatively quick (usually < 1 minute), trying to kick off another run while the client is already running would introduce problems.

Going to open source

It was supposed to be released a while ago, but it took some time for various reasons. It’s a shockingly tiny amount of code, yet there were some barriers to releasing it. The majority of the code was written by Mark Roddy, but he’d turned it over to me to open source. I went through the normal chef contribution process, which at the time required opening a Jira ticket that you can read if you’re interested in the details. In short, there were some questions about the use case, but when I explained that we weren’t trying to re-invent graphite in some horrible way, we were able to agree there could be some real world use cases. That said, it has not yet been accepted into core, because it does introduce a very small race condition: chef uses a read-modify-save model for changing attribute data. There is a proposal to fix this, which divides attributes into different levels so that automated updates can access them without causing the issue. In the wild, however, this has not actually posed a problem for us, even with several hundred nodes running it.

Installation

If you’re interested in this tool, you can install it with ruby gems using the command:

% gem install rehai

If you’d like to see the source, head on over to github.



Jan 21 2013

Building Your Own Cloud From Scratch

Category: Cloud Computing, Systems
jgoulah @ 8:50 PM

Intro

There are a lot of private cloud solutions out there with great things built into them already to complete a full cloud stack – networking, dashboards, storage, and a framework that puts all the pieces together, amongst other things. But there is also a decent amount of overhead to getting these frameworks set up, and maybe you want more flexibility over some of the components, or even just something a little more homegrown. What might a lightweight cloud machine bootstrapping process look like if it were implemented from scratch?

Getting Started

We can use libvirt and KVM/QEMU to put something reasonably robust together. Start by installing those packages:

apt-get install qemu-kvm libvirt-bin virtinst virt-viewer

The next important thing is to set up a bridge for proper networking on this host, which allows the guests to communicate on the same network. There are a few articles out there that can help you set this up, but the basics are that you want your bridge assigned the IP that your eth0 interface previously had, and then you add the eth0 interface to the bridge. In this example 192.168.1.101 is the IP of the host machine:

# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eth0 inet manual

auto br0
iface br0 inet static
  address 192.168.1.101
  netmask 255.255.255.0
  gateway 192.168.1.1
  network 192.168.1.0
  bridge_ports eth0

Then bring the bridge up:

# ifup br0
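Note that the bridge_ports stanza relies on the bridge-utils package; if ifup complains about it, install that and retry:

apt-get install bridge-utils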

Building the Image

The first step is setting up a base template that you create your instances from. So grab an ISO to start from; we’ll use Debian here, but this process works with any distro:

% wget http://cdimage.debian.org/debian-cd/6.0.6/amd64/iso-cd/debian-6.0.6-amd64-netinst.iso

And allocate a file on disk at the size you’d like your template to be. I created one here at 8GB; it can always be expanded later, so it only needs to be big enough to hold the initial base image that all instances start from. Generally smaller is better, because of the copy step when instances get created later.

% dd if=/dev/zero of=/var/lib/libvirt/images/debbase.img bs=1M count=8192

Now you can start the Linux installation, noting the --graphics args for the ability to connect with VNC. Our installation target disk is the one we created above, debbase.img, and we are giving it 512M RAM and 1 CPU.

% virt-install --name=virt-base-deb --ram=512 --graphics vnc,listen=0.0.0.0  --network=bridge=br0 \
--accelerate --virt-type=kvm --vcpus=1 --cpuset=auto --cpu=host --disk /var/lib/libvirt/images/debbase.img \
--cdrom debian-6.0.6-amd64-netinst.iso

Once that’s started up, you can use VNC on your client machine to connect to this instance graphically and run through the normal install setup. There are plenty of clients out there, but a decent one is Chicken of the VNC. It’s also possible at this step to create the image off a PXE boot or similar bootstrapping mechanism.
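If you’re not sure which display the guest’s VNC server landed on, libvirt can tell you; the display number maps to TCP port 5900 plus that offset:

% virsh vncdisplay virt-base-deb

So if that prints :0, point your VNC client at the host’s port 5900.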

Extract the Partition

Here we take advantage of QEMU’s ability to load Linux kernels and init ramdisks directly, thereby circumventing bootloaders such as GRUB. The guest can then be launched with the partition containing the root filesystem as its virtual disk.

There are two steps to make this work. First you’ll need the vmlinuz and initrd files, and the easiest way to get those is to copy them from the base image we set up above:

% scp BASEIP:/boot/vmlinuz-2.6.32-5-amd64 /var/lib/libvirt/kernels/
% scp BASEIP:/boot/initrd.img-2.6.32-5-amd64 /var/lib/libvirt/kernels/

The next step is to extract the root partition from that same base image. We want to take a look at how those partitions are laid out so that we can get the right numbers to pass to the dd command.

% sfdisk -l -uS /var/lib/libvirt/images/debbase.img

Disk /var/lib/libvirt/images/debbase.img: 1044 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/var/lib/libvirt/images/debbase.img1   *      2048  15988735   15986688  83  Linux
/var/lib/libvirt/images/debbase.img2      15990782  16775167     784386   5  Extended
/var/lib/libvirt/images/debbase.img3             0         -          0   0  Empty
/var/lib/libvirt/images/debbase.img4             0         -          0   0  Empty
/var/lib/libvirt/images/debbase.img5      15990784  16775167     784384  82  Linux swap / Solaris

We are going to pull the first partition out. Note how the numbers line up with the debbase.img1 line: we start at sector 2048 and take 15986688 sectors of 512 bytes each:

% dd if=/var/lib/libvirt/images/debbase.img of=/var/lib/libvirt/debian-tmpl skip=2048 count=15986688 bs=512

Templatize the Image

Now we have a disk file that serves as our image template. There are a few things we want to change directly on this template. Note that we are using all-caps placeholders ending in -TMPL, which we’ll replace later with sed. We can edit the template’s files by mounting the disk:

% mkdir -p /tmp/newtmpl
% mount -t ext3 -o loop /var/lib/libvirt/debian-tmpl /tmp/newtmpl
% chroot /tmp/newtmpl

Note at this point we are chrooted and these commands are acting against our template disk file.

Clear out the udev rules that bound the NIC to the base image’s MAC address when its networking was set up:

% echo "" > /etc/udev/rules.d/70-persistent-net.rules

We’re going to put a placeholder for our hostname in /etc/hostname:

% echo "HOSTNAME-TMPL" > /etc/hostname

Set a nameserver template in /etc/resolv.conf:

% echo "nameserver NAMESERVER-TMPL" > /etc/resolv.conf 

In the file /etc/network/interfaces:


# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address ADDRESS-TMPL
netmask NETMASK-TMPL
gateway GATEWAY-TMPL

Make sure /etc/inittab has this line (usually you just need to uncomment it); it will give us serial console access when we boot the guest:

T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100
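Once the template edits are done, remember to back out of the chroot and unmount the disk file so it’s in a clean state before we copy it:

% exit
% umount /tmp/newtmpl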

Creating an Instance

Now we have all the pieces together to launch an instance from our image. This script will create the instance given an IP and hostname. It does no error checking, for readability reasons, and is well commented so that you know what’s going on:

#!/bin/bash

# read in the IP and hostname from the command line
virt_ip=$1
virt_host=$2

# build the fqdn based off the short host name
virt_fqdn=${virt_host}.linux.bogus

# fill in your network defaults
virt_gateway=192.168.1.1
virt_netmask=255.255.255.0
virt_nameserver=192.168.1.101

# how the disk/ram/cpu is sized
virt_disk=10G
virt_ram=512
virt_cpus=1

# random mac address
virt_mac=$(openssl rand -hex 6 | sed 's/\(..\)/\1:/g; s/.$//')

cp /var/lib/libvirt/debian-tmpl /var/lib/libvirt/images/${virt_host}-disk0

# optionally resize the disk
qemu-img resize /var/lib/libvirt/images/${virt_host}-disk0 ${virt_disk}
loopback=`losetup -f --show /var/lib/libvirt/images/${virt_host}-disk0`
fsck.ext3 -fy $loopback
resize2fs $loopback ${virt_disk}
losetup -d $loopback

mountbase=/tmp/${virt_host}
mkdir -p ${mountbase}
mount -o loop /var/lib/libvirt/images/${virt_host}-disk0 ${mountbase}

# replace our template vars
sed -i -e "s/ADDRESS-TMPL/$virt_ip/g" \
       -e "s/NETMASK-TMPL/$virt_netmask/g" \
       -e "s/GATEWAY-TMPL/$virt_gateway/g" \
       -e "s/HOSTNAME-TMPL/$virt_fqdn/g" \
       -e "s/NAMESERVER-TMPL/$virt_nameserver/g" \
  ${mountbase}/etc/network/interfaces \
  ${mountbase}/etc/resolv.conf \
  ${mountbase}/etc/hostname

# unmount and remove the tmp files
umount /tmp/${virt_host}
rm -rf /tmp/${virt_host}*

# run a file system check on the disk
fsck.ext3 -pv /var/lib/libvirt/images/${virt_host}-disk0

# specify the kernel and initrd (these we copied with scp earlier)
vmlinuz=/var/lib/libvirt/kernels/vmlinuz-2.6.32-5-amd64
initrd=/var/lib/libvirt/kernels/initrd.img-2.6.32-5-amd64

# install the new domain with our specified parameters for cpu/disk/memory/network
virt-install --name=$virt_host --ram=$virt_ram \
--disk=path=/var/lib/libvirt/images/${virt_host}-disk0,bus=virtio,cache=none \
--network=bridge=br0 --import --accelerate --vcpus=$virt_cpus --cpuset=auto --mac=${virt_mac} --noreboot --graphics=vnc \
--cpu=host --boot=kernel=$vmlinuz,initrd=$initrd,kernel_args="root=/dev/vda console=ttyS0 _device=eth0 \
_ip=${virt_ip} _hostname=${virt_fqdn} _gateway=${virt_gateway} _dns1=${virt_nameserver} _netmask=${virt_netmask}"

# start it up
virsh start $virt_host

Assuming we named the script buildserver, run it like so:

% buildserver 192.168.1.197 jgoulah

Conclusion

This is really just the first step, but now that you can bring a templated disk up, you can decide a bit more about how you’d like networking to work for your cloud. You can either continue to use static IP assignment as shown here and use nsupdate to insert DNS entries when new guests come up, or you can set up the base image to use DHCP and configure your DHCP server to update records in DNS when clients come online. You may also want to bake your favorite config management system into the template so that you can bootstrap the nodes and maintain configurations on them. Have fun!
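To flesh out the nsupdate option a bit, the dynamic DNS update for a new guest is tiny. A minimal sketch, assuming a DNS server configured to accept dynamic updates for the zone (in practice you’d sign these with a TSIG key):

% nsupdate <<EOF
server 192.168.1.1
update add jgoulah.linux.bogus. 86400 A 192.168.1.197
send
EOF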



Dec 13 2009

Using Amazon Relational Database Service

Category: Cloud Computing
jgoulah @ 2:09 PM

Intro

Amazon recently released a new service that makes it easier to set up, operate, and scale a relational database in the cloud, called Amazon Relational Database Service. The service, based on MySQL for now, has its pluses and minuses, and you should decide whether it fits your needs. The advantages are that it has an automated backup system that lets you restore to any point in time within your retention window (up to about the last five minutes), and it also allows you to easily take a snapshot at any time. It is also very easy to “scale up” your box. This is more in the realm of vertical scaling, but if you find you are hitting limits you can upgrade to a more powerful server with little to no effort. It also gives you monitoring via Amazon CloudWatch and automatically patches your database during your self-defined maintenance windows. The downsides are that you don’t have direct access to the box itself, so you can’t ssh in. You also cannot, at this point, use replication for a master-slave style setup; Amazon promises to have more high availability options forthcoming. Since you can’t ssh in, you adjust mysql parameters via their db parameter group API. I’ll go over an example of this.

Creating a Database Instance

The first thing to do is install the RDS command line API, which you can grab from here. I’m not going over the details of setting it up; it’s basically as simple as putting it into your path, and Amazon has plenty of documentation on this.

Once you have the command line tools set up, you can create a database like so:

rds-create-db-instance \
        mydb-instance \
        --allocated-storage 20 \
        --db-instance-class db.m1.small \
        --engine MySQL5.1 \
        --master-username masteruser \
        --master-user-password mypass \
        --db-name mydb --headers \
        --preferred-maintenance-window 'Mon:06:15-Mon:10:15' \
        --preferred-backup-window '10:15-12:15' \
        --backup-retention-period 1 \
        --availability-zone us-east-1c

This should be fairly self-explanatory. I’m creating an instance called mydb-instance. The master (basically root) user is called masteruser, with password mypass. It also creates an initial database called mydb; you can add more databases and permissions later. This also sets up the maintenance and backup windows, which are required and are defined in UTC. The backup retention period is how long backups are held on to, which I’ve set to 1 day. If you set this to 0 it disables automated backups entirely, which is not advised.

The next thing to do is set up your security groups so that your EC2 instances (or your hosted servers) have access to your database. There is good documentation on this, so I will just go over a basic use case:

rds-authorize-db-security-group-ingress \
        default \
        --ec2-security-group-name webnode \
        --ec2-security-group-owner-id XXXXXXXXXXXX 

In the case above I’m authorizing my ec2 security group webnode to access the db security group called default. The group-owner-id parameter is your AWS account id.

You can find your database’s DNS name via the rds-describe-db-instances command:

rds-describe-db-instances 
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  default.mysql5.1  in-sync

So we can see our hostname is mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com

Now you can log in to your instance the way you usually access mysql on the command line, then set up your users and import your database:

mysql -u masteruser -h mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com -pmypass
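From there, adding a database and an application user is plain MySQL. For example (the names and host pattern are placeholders):

mysql> CREATE DATABASE myapp;
mysql> GRANT ALL PRIVILEGES ON myapp.* TO 'appuser'@'%' IDENTIFIED BY 'apppass';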

Using Parameter Groups to View the MySQL Slow Queries Log

As I mentioned earlier, you don’t have ssh access to the instance, so you need to use db parameter groups to tweak your configuration rather than editing the my.cnf file. You can’t see the mysql slow query log on the box either, but there is still a way to access it, and I’ll go over that process.

Amazon won’t let you edit the default group, so the first thing to do is create a parameter group to hold your custom parameters:

rds-create-db-parameter-group my-custom --description='My Custom DB Param Group' --engine=MySQL5.1

Then set the parameter that turns the slow query log on:

rds-modify-db-parameter-group my-custom  --parameters="name=slow_query_log, value=ON, method=immediate"

The instance is still using the default configuration, so you have to tell it to use your custom parameter group:

rds-modify-db-instance mydb-instance --db-parameter-group-name=my-custom

The first time you apply a new custom group you have to reboot the instance, as pending-reboot indicates here:

$ rds-describe-db-instances 
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  my-custom  pending-reboot

So we can reboot it immediately like so:

$ rds-reboot-db-instance mydb-instance

When it comes back up, it will show that it’s in-sync:

$ rds-describe-db-instances 
DBINSTANCE  mydb-instance  2009-11-06T02:19:59.160Z  db.m1.small  mysql5.1  20  masteruser  available  mydb-instance.cvjb75qirgzk.us-east-1.rds.amazonaws.com  3306  us-east-1d  1
      SECGROUP  default  active
      PARAMGRP  my-custom  in-sync

We can log in to the instance and see that our parameter was set correctly:

mysql> show global variables like 'log_slow_queries';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| log_slow_queries | ON    | 
+------------------+-------+
1 row in set (0.00 sec)

Since you don’t have access to the filesystem, slow queries are logged to a table in the mysql database:

mysql> use mysql;

mysql> describe slow_log;
+----------------+--------------+------+-----+-------------------+-----------------------------+
| Field          | Type         | Null | Key | Default           | Extra                       |
+----------------+--------------+------+-----+-------------------+-----------------------------+
| start_time     | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | 
| user_host      | mediumtext   | NO   |     | NULL              |                             | 
| query_time     | time         | NO   |     | NULL              |                             | 
| lock_time      | time         | NO   |     | NULL              |                             | 
| rows_sent      | int(11)      | NO   |     | NULL              |                             | 
| rows_examined  | int(11)      | NO   |     | NULL              |                             | 
| db             | varchar(512) | NO   |     | NULL              |                             | 
| last_insert_id | int(11)      | NO   |     | NULL              |                             | 
| insert_id      | int(11)      | NO   |     | NULL              |                             | 
| server_id      | int(11)      | NO   |     | NULL              |                             | 
| sql_text       | mediumtext   | NO   |     | NULL              |                             | 
+----------------+--------------+------+-----+-------------------+-----------------------------+
11 rows in set (0.11 sec)
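To actually read the slow queries, query that table like any other. For example, the ten slowest:

mysql> SELECT start_time, query_time, rows_examined, sql_text
    -> FROM slow_log ORDER BY query_time DESC LIMIT 10;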

We may also want to lower the slow query time threshold, since the default of 10 seconds is pretty high:

$ rds-modify-db-parameter-group my-custom  --parameters="name=long_query_time, value=3, method=immediate"

The rds-describe-events command keeps a log of what you’ve been doing:

$ rds-describe-events 
db-instance         2009-12-12T17:44:19.546Z  mydb-instance  Updated to use a DBParameterGroup my-custom
db-instance         2009-12-12T17:45:51.636Z  mydb-instance  Database instance shutdown
db-instance         2009-12-12T17:46:09.380Z  mydb-instance  Database instance restarted
db-parameter-group  2009-12-12T17:56:02.568Z  my-custom        Updated parameter long_query_time to 3 with apply method immediate

And again you can check in mysql that your parameter was set properly. Note that this time we didn’t have to reboot anything, since our parameter group is already active on this instance:

mysql> show global variables like 'long_query_time';
+-----------------+----------+
| Variable_name   | Value    |
+-----------------+----------+
| long_query_time | 3.000000 | 
+-----------------+----------+
1 row in set (0.00 sec)

Summary

In this article we went over some basics of Amazon RDS and why you may or may not want to use it. If you are just starting out, it’s a really easy way to get a working mysql setup going. However, if you are porting from an architecture with multiple slaves or other HA options, this may not be for you just yet. We also went over some basic use cases for tweaking parameters, since there is no command line access to the box itself. There is command line access to mysql, though, so you can use your favorite tools there.

References

http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2935&categoryID=295
