I’ll be totally honest: even though the Dockerfile is the supported way to build Docker images, I have not quite bought into building my containers with that format. I believe the main thing we lose with Dockerfiles is a platform-agnostic tool for building machine images, which is useful if you don’t always want or need the container format but want to keep the same configuration across different types of hosts or environments. Defining infrastructure in code, so that we can use logic, templates, and reusable patterns, gives us many more opportunities to write intelligent provisioning processes that span development and production. Given that, I’ve turned to Chef as a tool that can represent my configuration at both the application and machine level.
My original use case was to spin up a Docker container to host our development environment, and then bring this same configuration into production. I’ve recently inherited an environment that didn’t use containers or really have any reproducible method for spinning up new hosts. And even though this is basically what Docker does best, we weren’t quite ready to mix containers into production while we were still trying to solidify the ground the current system stands on and make small, careful improvements. So the idea was to use Chef to provision Docker containers for developing the application and for testing Chef itself, and then use those same recipes to build EC2 instances that stand up alongside the current system. The great thing about this is that we can run any Chef recipe on a container and test changes before moving them into production.
This Chef container provisioning recipe is pretty straightforward and can be run with chef-solo (for example, to easily provision a development instance on a laptop):
This gives us a container with the home directory mounted as a volume with correct user permissions, running a role we have defined as Dev that contains several recipes. These exact same recipes might be put in a Prod role that is used in production. It doesn’t really matter whether production is a container, an EC2 instance, or something else like a bare-metal server. Using this pattern can lay the groundwork for creating platform-agnostic images for our infrastructure that are identical in configuration.
It turns out this little tool is long overdue. As simple a concept as it is, its use cases are easy to misunderstand; ours at Etsy, however, was very targeted. Several years ago we were hammering out our internal cloud infrastructure, using a KVM/QEMU-based solution that you can read about over here. We were populating our virtual machine frontend using Chef Ohai data as the canonical source of our system information. There is an Ohai plugin that gathers KVM virtualization data using virsh commands, which are part of libvirt. It was a perfect way to capture which guests existed on a given system, along with other information about them.
We were hitting a bottleneck in that our Chef clients were set up to run about every ten minutes. Within those ten minutes a virtual machine could be added or deleted, so it was difficult to keep our interface in sync. Imagine creating a new virtual machine but not being able to display data about it until you had waited around ten minutes. To make matters worse, these clients run at a splay interval, which means they don’t all run at the same time. Therefore, we started on a simple script that would let us run ohai quickly without needing to do a full Chef run. While our Chef runs are relatively quick (usually < 1 minute), forcing a full run would introduce problems if we tried to start one while the client was already running.
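To make that concrete, here is a minimal sketch of the idea in Ruby: run ohai on its own, swap the fresh data into the node’s automatic attributes, and save. This is not the tool’s actual source. The helper name `refresh_automatic_attributes`, the in-memory `FakeOhai` and `FakeNode` stand-ins, and the guest data are all assumptions for illustration, so the example runs without a Chef server.

```ruby
# Hypothetical helper: refresh only the ohai-derived (automatic) attributes
# of a node, without converging any recipes.
def refresh_automatic_attributes(node, collector)
  collector.all_plugins                  # gather fresh system data
  node.automatic_attrs = collector.data  # replace the automatic attributes
  node.save                              # write the node back
  node
end

# In-memory stand-in for an ohai run; the data shape is made up.
FakeOhai = Struct.new(:data) do
  def all_plugins
    self.data = { 'virtualization' => { 'guests' => ['vm01', 'vm02'] } }
  end
end

# In-memory stand-in for a node; `saved` just records that save was called.
FakeNode = Struct.new(:automatic_attrs, :saved) do
  def save
    self.saved = true
  end
end

node = refresh_automatic_attributes(FakeNode.new({}, false), FakeOhai.new)
puts node.automatic_attrs['virtualization']['guests'].inspect
```

Against a real Chef server, I’d expect the same helper to be handed an `Ohai::System` instance and a node loaded with `Chef::Node.load`, but treat that wiring as an assumption rather than a description of what the gem does.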
Going to open source
It was supposed to be released a while ago, but it has taken some time for various reasons. It’s a shockingly tiny amount of code, but there were some barriers to releasing it. The majority of the code was written by Mark Roddy, but he turned it over to me to open source. I went through the normal Chef contribution process, which at the time required opening a Jira ticket that you can read if you’re interested in some of the details. In short, there were some questions about the use case, but once I explained that we weren’t trying to re-invent Graphite in some horrible way, we were able to agree there could be some real-world use cases. That being said, it has not been accepted into core yet, because it does introduce a very small window for race conditions: Chef uses a read-modify-save model for changing attribute data, so two clients saving the same node at nearly the same time can overwrite each other. There is a proposal to fix this by dividing attributes into different levels, so that automated updates can write to their own level without causing this issue. However, in the wild this has not actually posed a problem for us, even with several hundred nodes running it.
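The read-modify-save problem is easier to see in a toy example. The sketch below models the node as a plain Ruby hash with made-up data, not Chef’s real node object, and `snapshot` and `save_level` are hypothetical helpers. It shows how a whole-object save from a Chef run can wipe out a guest count that a quick ohai refresh just wrote, and one way to read the per-level proposal, where each writer only saves the attribute level it owns.

```ruby
# Deep copy, so each writer works on its own snapshot of the node.
def snapshot(obj)
  Marshal.load(Marshal.dump(obj))
end

# Hypothetical per-level save: overwrite one attribute level, leave the rest.
def save_level(server, copy, level)
  server.merge({ level => snapshot(copy[level]) })
end

# The node as stored on the server (made-up data).
original = {
  'normal'    => { 'owner' => 'ops' },
  'automatic' => { 'guests' => 1 }
}

# 1. Both writers read the node at about the same time.
chef_run     = snapshot(original)
ohai_refresh = snapshot(original)

# 2. Each modifies its own copy.
chef_run['normal']['owner']         = 'infra'
ohai_refresh['automatic']['guests'] = 2

# 3a. Whole-object saves: the last writer wins, so the new guest count is lost.
whole_object = snapshot(ohai_refresh)
whole_object = snapshot(chef_run)

# 3b. Per-level saves: each writer only touches its own level, so both survive.
per_level = save_level(original, ohai_refresh, 'automatic')
per_level = save_level(per_level, chef_run, 'normal')

puts "whole-object save: #{whole_object.inspect}"
puts "per-level save:    #{per_level.inspect}"
```

The proposal itself may well work differently. The point is only that scoping a save to one level of attributes removes the overlap that lets the last writer win.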
If you’re interested in this tool, you can install it with RubyGems using the command:
% gem install rehai
If you’d like to see the source, head on over to GitHub.