Our Path to Kubernetes

We recently flipped the switch on migrating 100% of our traffic to Kubernetes. This was a long process for us for a number of reasons, including a supreme intolerance for faults. Some of our customers have to make all of their revenue for the year in quite a small number of hours, and their load profile makes configuration of load balancers and failovers more than a little problematic. As a result, we had several false starts in this conversion. Today, I am going to discuss how we got from Here to There.

Starting Points

When I joined Bypass just 15 months ago, we served a fraction of the traffic we do today, had about a third of the server inventory we have today, and had none of the sophistication we have today.

The typical application looked like this:

  • a pet EC2 instance, hand crafted from artisanal bits
  • a Ruby on Rails project
  • a bunch of completely custom-for-that-application Capistrano recipes
  • Samson to glue it all together

There were some bits of Capistrano that could be reused between projects. We did not do this consistently, though, so there were maybe three ways to migrate schemas across a dozen applications. Some apps had code to ensure zero-downtime deployments, while others did not.

Worst of all, horizontal scale required one to manually provision and configure a new, artisanal EC2 instance, then shepherd a PR through git to add it to the Capistrano inventory, and then deploy once done.

We had two nascent services running on AWS’s EC2 Container Service (ECS), but our developers found it very challenging to troubleshoot these services due to ECS’s arcane (some might say profane) user experience.

Our Rails Monolith

The biggest challenge to date has been migrating our Ruby on Rails monolith. This application is responsible for a very, very large fraction of our business logic and is integral to our business operations. I hesitate to say that it is “in” our critical path, as in many ways it is the critical path.

If you have ever run a Rails application in production (or, truthfully, any app of the rails/django webdev generation), it follows a pattern you will recognize: a big monolithic codebase with an incredibly messy many-to-many object model that will survive as technical debt for another decade. While we are working to “break up the monolith” into something approaching microservices, this monolith is our reality today.

We have traditionally run it using Phusion Passenger, backed with MySQL, and running async jobs using Sidekiq and redis. It’s a stack you have probably seen a million times by now.

The Goal State

As of this weekend, something very close to 100% of our traffic is served by projects running in Kubernetes. We have a few low-traffic endpoints left on the long tail before we can officially retire our pet server fleet, and we hope to complete that process this quarter. The most important straggler, which we transferred over this weekend, was our main Ruby monolith.

We aren’t doing anything particularly interesting in Kubernetes. No custom schedulers, for instance. We use the base APIs, and we use them effectively. In fact, our production stack is still running Kubernetes 1.4.

We utilize the Deployment API to define each service and, because we’re in AWS, we use LoadBalancer services to hook up to the outside world. The HorizontalPodAutoscaler API allows us to react to traffic spikes almost instantly. This turned out to be a big deal, and a huge win, because our 100% Rollout Weekend was not as quiet as it should have been.
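A minimal sketch of that pattern, with made-up names and thresholds (our real manifests carry far more metadata, environment, and resource limits):

cat <<'EOF' > monolith-api-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: monolith-api
spec:
  type: LoadBalancer        # provisions an AWS ELB for us
  selector:
    app: monolith-api
  ports:
    - port: 443
      targetPort: 8080
EOF
kubectl create -f monolith-api-svc.yaml

# let the HorizontalPodAutoscaler scale the (hypothetical) monolith-api
# deployment between 4 and 40 replicas based on CPU utilization
kubectl autoscale deployment monolith-api --min=4 --max=40 --cpu-percent=70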

Pre-Gaming It

Our initial Kubernetes-based services were a small handful of non-critical services. We started with a timeclock microservice used by a subset of our customers, an integration to a specific, major partner, and a couple of internal tools.

As we were wrapping up our evaluation of Kubernetes in production with these services, we had an outage in an unrelated system. The practical “upshot”, as it were, was that our background Sidekiq queues grew to about 2 orders of magnitude beyond “completely out of control.”

Kif Kroker: Captain, may I have a word with you?

Captain Zapp Brannigan: No.

Kif Kroker: It’s an emergency, sir.

Captain Zapp Brannigan: Come back when it’s a catastrophe.

[a huge rumbling is heard]

Captain Zapp Brannigan: Oh, very well.

As it happened, we had already begun testing the monolith in our integration and dev Kubernetes environments. We had it Dockerized already, though quite inelegantly. As the outage progressed, I started putting together the checklist I needed to run through to add inventory to the fleet to bring on more Sidekiq instances. If you look back up there, earlier in this very post, you’ll see why I was unenthused by this prospect.

But I had a container image, and I had a podspec. I could literally launch as many Sidekiq workers as needed at a moment’s notice. So I made the executive decision, cross-checked all the relevant environment variables, and spun up the deployment.

This was our first big – and completely unplanned – win with Kubernetes. Before this, it was still seen as a bit of a tech experiment. Afterwards, we hadn’t just drunk the Kool-aid; we were practically swimming in it.

Making the Move

It took us about 6 months to fully make the move to Kubernetes. We had a lot of trouble predicting load and usage levels, tweaking memory and CPU limits on our deployments, generating test loads to verify those limits, and even validating the results from the tests we could run.

Nighthawks

Our first attempt to get a handle on load profiles was to have a cross-functional team work night shifts for about 3 weeks and load test the production infrastructure during off-peak hours. This coincided with our need to perform a handful of breaking infrastructure changes unrelated to the Kubernetes initiative.

AWS Route53 was a real godsend here. We generated three Route53 weighted-DNS configurations (the mixed one is sketched after the list):

  • An A alias record pointing only to the pet servers
  • An A alias record pointing to a mix of pets and k8s
  • An A alias record pointing only to k8s
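Each configuration was just a change batch against the hosted zone. Here is a sketch of the mixed configuration, with the zone ID, hostnames, ELB names, and weights invented for illustration:

cat > mixed.json <<'EOF'
{ "Changes": [
  { "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "api.example.com.", "Type": "A",
      "SetIdentifier": "pets", "Weight": 50,
      "AliasTarget": { "HostedZoneId": "ZELBZONEID",
                       "DNSName": "pets-elb.us-east-1.elb.amazonaws.com.",
                       "EvaluateTargetHealth": false } } },
  { "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "api.example.com.", "Type": "A",
      "SetIdentifier": "k8s", "Weight": 50,
      "AliasTarget": { "HostedZoneId": "ZELBZONEID",
                       "DNSName": "k8s-elb.us-east-1.elb.amazonaws.com.",
                       "EvaluateTargetHealth": false } } }
]}
EOF
aws route53 change-resource-record-sets --hosted-zone-id ZEXAMPLE --change-batch file://mixed.json

Because the two record sets share a name, Route53 answers lookups in proportion to the weights; shifting the mix later is just another UPSERT with new weights.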

In the end, however, this specific work mostly served to build the knowledge base we would need in the long run; the results themselves were not terribly useful. We were unable to generate a truly organic load test from which we could make infrastructure decisions.

Testing in Production

Weighted DNS really made a difference here as well. In fact, this whole blog post could be summed up as “Route53 Is Amazing.” We had a couple of interesting pieces of information to work with.

  • We had 6 pet monolith servers
  • We had 2 hostnames
    • api, used by integrations and other services
    • ingest, used by critical flow traffic
  • Both hostnames pointed to the same 6 servers

This gave me a self-selecting split along feature lines with which to phase in Kubernetes.

api switchover

We started with the api hostname, as it would be more tolerant of failure. Weighted DNS initially directed 5% of traffic to k8s and 95% to the pets.[fn1]

After observing this configuration for a few weeks and performing additional tweaks, we elevated the mix to 75/25 and got stuck there for several months. Many of our problems at this stage were matters of instrumentation rather than issues with our software or with Kubernetes.

We solved this problem by changing the NEW_RELIC_APP_NAME variable for this deployment, allowing us to more readily separate out performance profiles between the pets and the k8s deployments. In fact, we wound up with the same software reporting as several different applications in New Relic. Once we did this, we were able to plainly see that performance in k8s was actually much better than we’d thought, and pulled the lever to move to 100% k8s.
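The change itself was tiny. A sketch of one way to express it, with a hypothetical deployment and container name, is a strategic merge patch that sets the variable on the k8s deployment only:

kubectl patch deployment monolith-api -p \
  '{"spec":{"template":{"spec":{"containers":[{"name":"monolith-api","env":[{"name":"NEW_RELIC_APP_NAME","value":"Monolith API (k8s)"}]}]}}}}'

The pets kept the stock app name, so the two halves of the fleet showed up side by side as separate applications in New Relic.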

ingest switchover

Moving ingest endpoints over was a much more arduous affair. This is our most critical traffic, the kind of business processes where our customers start blowing up our phones when things go wrong. As a result, we were much more conservative in migrating this traffic. The work here ran in parallel to the work on api, just at a slower pace.

Then something unexpected happened.

We had been talking about retiring the ingest hostname for quite some time and I lost track of the status of that initiative. Shortly after we reached a 50/50 mix on ingest, our tablet engineers shipped a version that implemented this change, and they did so in the run-up to a product launch expected to deliver significantly more traffic than we were accustomed to receiving.

In fact, in expectation of this launch, I had placed my entire team on call for the day. As a result, we observed this delta between expectation and reality almost immediately. The k8s horizontal pod autoscaler kicked in and drove our api deployment up to its upper bound almost instantly.

We did some quick back-of-the-napkin math and wondered if we should drop api back to a mix with the pets, but ultimately decided to stay the course. Once the launch was over, we evaluated all available telemetry we had on the event and realized that we were safe to flip ingest over completely.

We entered the weekend with the original pet servers held in reserve, with Route53 health checks and failover configured to redirect traffic if anything hiccupped in k8s. Being in Sports and Entertainment, our highest-traffic hours are typically mid-evening on a Saturday night, and we’ve had a string of record-breaking Saturdays recently. Data Science reported that this weekend was projected to be a little lighter than usual, so we elected to stay the course yet again.

The 7 o’clock hour on Saturday, April 8, 2017, was not a little lighter than usual. It was our highest-traffic hour ever.

I posted this to Twitter the following morning before I got the full traffic results:

That is, indeed, a very pretty viz. The HPA once again kicked in and took care of business. Each block you see here represents a pod started in response to traffic.


fn1: Please note that this doesn’t actually load balance the incoming traffic, but rather the DNS lookups. Because we have enterprise customers, there was some risk here that one very large customer would get k8s in the lookup and the traffic pattern would have been skewed beyond expectations. To whatever degree this happened, it did not affect our operations.

Using Vault to Secure Etcd, Kubernetes

The amazing folks over at DigitalOcean recently posted a blog article detailing their usage of Vault as a Certificate Authority for Kubernetes. I had previously been struggling with how and why to use Vault in my infrastructure, in part because I hadn’t taken the time to get my head around Vault’s metaphors. This article pushed me over the edge and imparted comprehension upon me.

The only problem is that it’s woefully short on actual implementation details.

Infrastructure Allergy

I have a well-established allergy to infrastructure. I take the view that any added infrastructure must, in all circumstances, not introduce any more administration or training than the original problem presented. This is very often an insurmountable obstacle to adoption, and this is by design. If you’re implementing highly available, redundant Chef infrastructure, you’re often just dancing around the problem of distributing shell scripts, and just trading one administrivia chore for another. To get me to implement new infrastructure, you have to buy me something incredible.

Implementing Vault required I commit to two new pieces of infrastructure: Vault and its Consul backend. That’s steep, in my book, but it has been incredibly well worth it. I run CoreOS almost exclusively, so running these in Docker on a spare instance was practically trivial. My Consul installation is not (yet) highly available, but that’s a problem for another day.

Overcoming the Allergy

Certificate Authorities are hard. My present deployment routine for CoreOS and Kubernetes involves an EC2 AMI that has the CA’s private key baked in. On boot, the instance uses this CA to generate and sign its own TLS certificates, then deletes the CA’s private key. This works, but it carries a narrow threat vector: the incredibly sensitive CA private key might remain on the filesystem if the boot process fails. Key rotation and certificate revocation are also practically impossible. I accepted this trade-off, expecting to implement a proper CA in the future, in the interest of pushing forward Kubernetes adoption as a priority.

With Vault, I need only distribute a Vault token. I don’t even need to bake it into the AMI, but instead provide it via cloud-init, which is itself computed by AWS CloudFormation. If I need to revoke this token, I need only update a parameter in CloudFormation and my cattle are replaced quickly with instances bearing the new token.

The final piece is that the two tools necessary to fetch certificates are included in CoreOS, despite its lack of LSB userland. All I need are curl and jq.

curl -H 'X-Vault-Token: some-token-here' \
    -d '{"common_name": "some.hostname", "ip_sans": "10.10.10.10"}' \
    https://vault.server/etcd/somecluster/pki/issue/member | jq -r \
        .data.certificate > /etc/kubernetes/ssl/server.crt

This is incredibly powerful and genuinely worth taking my allergy shots for. And it’s why I say I’m allergic to infrastructure: DigitalOcean uses consul-template plugins to install these certificates, but as they point out:

Because consul-template will only write one file per template and we needed to split our certificate into its components (certificate, private key, and issuing certificate), we wrote a custom plugin that takes in the data, a file path, and a file owner.

There’s no need for that. curl and jq do the job just fine.
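A sketch of what that looks like in practice (the token lives in an environment variable; the path and role are the same assumptions as the example above): fetch the issue response once and let jq split it.

resp=$(curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" \
    -d '{"common_name": "some.hostname", "ip_sans": "10.10.10.10"}' \
    https://vault.server/etcd/somecluster/pki/issue/member)

echo "${resp}" | jq -r .data.certificate > /etc/kubernetes/ssl/server.crt
echo "${resp}" | jq -r .data.private_key > /etc/kubernetes/ssl/server.key
echo "${resp}" | jq -r .data.issuing_ca  > /etc/kubernetes/ssl/ca.crt
chmod 600 /etc/kubernetes/ssl/server.key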

Proper Namespacing

My first change from the article was to namespace my Vault endpoints a little differently. DO uses $CLUSTER_ID/pki/$COMPONENT, which I find a little confusing. I prefer to go with product/identifier/function – so for my etcd cluster, I go with etcd/$CLUSTER_ID/pki, and for kubernetes it’s k8s/$CLUSTER_ID/pki. I explicitly choose to end with pki/ because this makes it clear that this endpoint is the entrypoint for the Vault pki backend; everything from this point is Vault’s doing.
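Setting that up is just a pair of mounts. A sketch using the Vault CLI of that era, with the cluster ID, TTLs, and role parameters as placeholders:

vault mount -path=etcd/mycluster/pki pki
vault mount -path=k8s/mycluster/pki pki

# each mount is its own CA: tune its TTL, generate a root, and define a role
vault mount-tune -max-lease-ttl=87600h etcd/mycluster/pki
vault write etcd/mycluster/pki/root/generate/internal \
    common_name="mycluster etcd CA" ttl=87600h
vault write etcd/mycluster/pki/roles/member allow_any_name=true max_ttl=720h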

Reconfiguring Etcd

I had existing, unsecured and unauthenticated etcd clusters backing my Kubernetes clusters. While these clusters are managed with CloudFormation and cloud-init to be highly available and replaceable, the data they serve is critically important and cannot be lost. As a result, I began by doing something I rarely do: upgrading the existing servers in-place.

Luckily, the etcd documentation covers this very topic. On each machine, I ran the curl above to generate certificates, then followed the etcd documentation pretty much to the letter.

I’ll note that the biggest problem I had was that the etcd nodes talk to one another directly, but external connections arrive at the nodes through an AWS Elastic Load Balancer. As a result, it is imperative that you set both the common name and the IP SAN on your certificates.
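For reference, the flags in play look roughly like this (paths and addresses are illustrative; the etcd security guide is the canonical reference):

etcd2 \
  --cert-file=/etc/ssl/etcd/server.crt \
  --key-file=/etc/ssl/etcd/server.key \
  --trusted-ca-file=/etc/ssl/etcd/ca.crt \
  --client-cert-auth \
  --peer-cert-file=/etc/ssl/etcd/server.crt \
  --peer-key-file=/etc/ssl/etcd/server.key \
  --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
  --peer-client-cert-auth \
  --listen-client-urls=https://0.0.0.0:2379 \
  --advertise-client-urls=https://10.10.10.10:2379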

Reconfiguring k8s apiserver

The Kubernetes API server is a little trickier, as you need to generate two sets of certificates: one to secure its connection to the etcd cluster, and another to secure its connection to the other kubernetes components. Additionally, we never did (and still do not) connect kubernetes directly to the etcd cluster; instead, we configure the local etcd daemon to run in proxy mode, secure its connection upstream, and then have apiserver talk to the etcd proxy over localhost. All of this is also covered in the etcd doc above.
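A rough sketch of the shape of this (values illustrative; the proxy carries the same certificate flags as the members above):

# the local proxy owns the secured upstream hop...
etcd2 --proxy=on \
  --listen-client-urls=http://127.0.0.1:2379 \
  --initial-cluster=etcd0=https://etcd0.internal:2380,etcd1=https://etcd1.internal:2380
  # ...plus the cert/key/CA flags shown earlier

# ...so apiserver talks plain localhost for etcd and presents its own
# Vault-issued certificate to the rest of the cluster
kube-apiserver \
  --etcd-servers=http://127.0.0.1:2379 \
  --tls-cert-file=/etc/kubernetes/ssl/apiserver.crt \
  --tls-private-key-file=/etc/kubernetes/ssl/apiserver.key \
  --client-ca-file=/etc/kubernetes/ssl/ca.crt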

Reconfiguring k8s kubelets

While they don’t outright say this, I don’t believe DigitalOcean is using flanneld for their proxy layer. I gather this because they claim that their kubelets aren’t configured at all to talk to etcd, but flanneld requires this. I had to repeat the above configuration on the kubelets so that flanneld would work.

Configuring kubectl

This is the most beautiful thing to me. Previously, I was using my copy of the CA above to generate certificates for all of my users. No more. I just point them to Vault and say “Have at it, mate.”

Kubernetes at Box: All About Being Describable

Sam Ghods, writing on the Box Blog:

This is one key difference between Kubernetes and other orchestration solutions – while most solutions would require you to go to many different systems to manage these pieces (or to write your own glue to tie those systems together), Kubernetes believes that your infrastructure should fundamentally be describable through a set of Kubernetes objects

This might be the best description of the Kubernetes ethos I’ve seen yet, and it’s been my driving force in implementing Kubernetes. I love the idea of describable infrastructure, and I love the idea of disposable infrastructure. Kubernetes gives us both.

Git Protocol Behavior Change At Github.com

We recently saw practically all of our CI tests begin failing if certain tasks attempted to pull private repositories from Github. Specifically, if our Ruby Gemfiles or Go dependencies referenced private repositories, builds would fail with authentication failures.

These errors generally look like this when running bundle install for our project:

Fetching git://github.com/redacted-org/redacted-repo.git
fatal: remote error:
  Repository not found.
Retrying git clone 'git://github.com/redacted-org/redacted-repo.git' "/var/lib/gems/2.1.0/cache/bundler/git/redacted-repo-401a...807f" --bare --no-hardlinks --quiet due to error (2/4): Bundler::Source::Git::GitCommandError Git error: command `git clone 'git://github.com/redacted-org/redacted-repo.git' "/var/lib/gems/2.1.0/cache/bundler/git/redacted-repo-401a...807f" --bare --no-hardlinks --quiet` in directory /tmp has failed.
fatal: remote error:
  Repository not found.

A similar error occurred building a Go project.

The commonality here is that Bundler and godep both recorded the dependencies’ Github URLs as Git protocol. It should be noted that the Git protocol (git://) is unauthenticated and is a pull-only protocol from Github.

Previously, however, these URLs did actually work for authenticated pulls. We were pulling authenticated repositories as late as the evening of July 20, 2016. With no changes to our Ruby environment, our git client, Dockerfile, Gemfile, or Gemfile.lock, these Bundler installs began failing with this error on the 21st.

As a result of this, we engaged in a vicious search-and-destroy to add the following line to all of our build scripts and Dockerfiles:

git config --global url."git@github.com:".insteadOf "git://github.com/"

My working assumption is that, when presented with an attempt to pull a private repository over the Git protocol, Github was silently transforming these pulls into SSH protocol pulls. Sometime overnight between July 20 and July 21, they stopped performing this redirect.

[Update 2016/07/25]: Stacey @ Github Support responded with the following:

We did implement a change recently that corrected our logic to adhere to the correct git protocol.

This means URLs must start with git@github.com.

Thanks for confirming the change, Stacey.

Kubernetes for Developers. No, Really.

I’ve transitioned a lot of my production workload into a kubernetes cluster, and we’re shortly going to begin moving over some of our product too. I’ve been using fairly vanilla Docker for some of my local development toolset, though.

Once upon a time, I ran postgresql natively on my Mac, first by installing the official packages and later with brew install postgresql. Later, I started running my local postgresql in a Docker container. The same was roughly true for all of my other requirements, such as memcached or redis or rabbitmq.

If I just docker run ... these prerequisites, I now have to make sure they’re still running at any given point in time. While moving to Docker brings me a tremendous ease of provisioning that I cannot ignore, this aspect is a regression; no longer can I use something like launchd to keep the service running if it fails. I could, and have, used docker-compose to create services in each project that provide the necessary software, but this has often led me to running two or three copies of, say, postgresql at a time if I don’t remember to shut them down when I’m done.

And really, that’s more than a little overkill. My local databases aren’t performance tweaked. They aren’t (generally) set up with any custom configuration, and to whatever degree they are, they’re fairly generic tweaks that I apply to all of my databases. For dev, all that really matters is that I happen to have a postgres process running and the software has access to it. Why run three copies?

A few assumptions I make:

  • I prefer my VM to be at 172.31.255.254, as this private subnet is practically never used. By anyone. Ever.
  • ~/src is shared into the VM at ~core/src and, thus, available to containers with the -v flag.
  • Kubernetes services are NodePorts. I replace the thousands digit with 30; that is, the standard postgres port 5432 comes out as 30432.

Vagrantfile

My Vagrantfile is based on a great setup by Josh Butts over at offers.com. I added setup for etcd, fleet, and flannel, and added the units for all of the kubernetes components. You can find my setup here. vagrant up and you’re good to go. From Josh’s setup, you can expose your services to the world by adding ports to the services file.

Kubes

The four most common services I use are mysql, postgresql, rabbitmq, and redis. Since I don’t apply any tweaks on my dev box, I’m using the generic, official Docker images.

$ kubectl create -f <service>/rc.yaml
$ kubectl create -f <service>/service.yaml

The services store their state in /home/core/data/<service>. If I wind up with data I need, this is what I back up. You might be tempted to share this back out, but POSIX does stupid things sometimes and, chiefly, I’ve found it really messes with postgresql.
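As an example of the nodePort convention, here is roughly what the postgresql service half looks like (names and labels are assumptions; the rc.yaml pairs it with the official postgres image and a hostPath volume under /home/core/data/postgresql):

cat <<'EOF' > postgresql/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgresql
spec:
  type: NodePort
  selector:
    app: postgresql
  ports:
    - port: 5432
      targetPort: 5432
      nodePort: 30432    # "replace the thousands digit with 30"
EOF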

My example kubes are here.

Docker

Because this is based on Josh’s setup, Docker is exposed outside the VM and I can use docker run ... and other commands locally on my Mac with the correct environment, without necessarily having to rely on kubernetes.

$ export DOCKER_HOST=tcp://172.31.255.254:2375
$ docker run ...
    or
$ docker-compose up

Generally, I use docker-compose while developing, then test it by running it in my local kubelet before pushing it out to our production kubelets.

The Only True Shortcut: Do Things Right the First Time

Tiru Srikantha at Playfab:

Most postmortem blog posts are teary affairs explaining how something went wildly wrong and how this will not happen again, but we think postmortems are useful for any major change, whether a success or failure. It’s great to learn from mistakes, but it’s also useful to learn from successes that demonstrate due diligence to the only true shortcut: Do things right the first time.

A Docker Workloads Stack, End to End

We run a lot of background tasks where we don’t necessarily care where the code actually runs: it needs to have Internet access and access to a couple of S3 buckets, and that’s about it. Some of the jobs are appropriate for Hadoop and Spark, which we crank out through AWS ElasticMapReduce, but some of the work is more nuanced.

etcd Backed

To begin with, we back everything with etcd. We use this CloudFormation template to bootstrap an etcd cluster according to the CoreOS team’s recommendation for production workloads. These machines run only etcd and nothing else; everyone else just looks to them for configuration data.

Operations Tier

The operations tier of our stack provides essential services to the cluster: a Docker registry, Shipyard, and the Vulcand proxy. CloudFormation is used to stand up this cluster, after which we load fleet units to provide those services.

Vulcand’s documentation is lackluster, to say the least. While the docs give examples of how to set up your frontends and backends with etcdctl, I’ve found that for some operations you’re better off using either the REST API or vctl. The Docker registry runs on its host on port 5000 and Shipyard on 8080.

I set up a hostname for the vulcand host, vulcan.int.derr.me, and then set up registry.int.derr.me and shipyard.int.derr.me as CNAMEs.

# ssh to the machine running vulcand, and then enter the vulcan1 container.
$ fleetctl ssh -A -unit=vulcand.service
coreos-host$ docker exec -ti vulcan1 bash
# setup backends
vulcan1$ vctl backend upsert -id registry
vulcan1$ vctl backend upsert -id shipyard
# setup servers behind those backends
vulcan1$ vctl server upsert -b registry -id registry1 -url http://<<registry host IP>>:5000
vulcan1$ vctl server upsert -b shipyard -id shipyard1 -url http://<<shipyard host IP>>:8080
# setup frontends for each
vulcan1$ vctl frontend upsert -id registry -b registry -route 'Host(`registry.int.derr.me`) && PathRegexp(`.*`)'
vulcan1$ vctl frontend upsert -id shipyard -b shipyard -route 'Host(`shipyard.int.derr.me`) && PathRegexp(`.*`)'

Incoming requests are handled by the matching frontend, which forwards the request to the attached backend, which in turn is served by one or more servers. If we ever decide we’re pulling enough docker images that we should add another registry host, we would simply add a second server.

vulcan1$ vctl server upsert -b registry -id registry2 -url http://<<registry2 host IP>>:5000

Docker Tier

All other hosts in the cluster are just dumb CoreOS machines running Docker, hooked up to Shipyard.

Embracing Systemd

I’ve been struggling with the vehement opposition to systemd for a while now. I use systemd extensively in my work, but I also manage some systems where I am not using – and do not necessarily want – systemd. I had not put much thought into where the distinction lies, until I read this post by Leszek Urbanski.

[The author of systemd] goes on and on about how you can save 3 seconds here and 5 seconds there by parallel and delayed service startup – systemd actually has a feature to measure system boot time. The question is: who cares? Desktop users, yes. Embedded users, maybe. Server users? Nope. It doesn’t matter if a server comes up in 96.3 seconds instead of 33.1. What counts is if it stays up and is not too cumbersome to maintain.

And that’s where I stopped, because it’s clear to me now where to draw the distinction.

In the past, each Linux system I managed was painfully hand-crafted for its purpose. It was a perfect snowflake, tweaked and tuned and measured and monitored. It was inviolable, permanent, etched in stone. If something went wrong with it, I would spend hours – sometimes days or weeks – tracking down exactly what went wrong, carefully crafting a patch, testing it, and putting it into production. Some common tasks might ultimately make their ways into scripts so they could be automated across many machines, but oftentimes reproducibility really just amounted to keeping a copy of a handful of config files somewhere that they could be found easily.

That is not my work today. As my work has shifted evermore out of the “IT” realm and more into the “DevOps” realm, the number of machines I manage has ballooned from “a small handful” to “hundreds and hundreds.” Certainly, there are still a small handful of servers that I manage the old fashioned way, but this is not – by a wide margin – the bulk of my workload.

The “old way” servers are largely infrastructure servers such as databases, DNS, and VPN servers. These are my snowflakes, and they are few and far between. I don’t necessarily care whether they run sysvinit or systemd, but they all happen to run sysvinit. This mostly comes down to the fact that their operating systems ship with sysvinit (though some are transitioning to systemd). I know bash and know it well, and these software packages support bash init scripts, provide default bash init scripts, and they just work. Right out of the box, they just work.

These snowflakes are scarce, though. For every snowflake, every machine that just boots once and (hopefully) runs on into forever, I may have a hundred completely disposable worker instances. These are web servers and task runners, actual workhorses that get stuff done. And every single one of them is exactly like the others. If the infrastructure instances are snowflakes, each unique and distinguishable and perfect, then the worker instances are like grains of sand – each completely replaceable, fungible, indistinguishable from the previous one.

Systemd is for these grains of sand.

I don’t waste time troubleshooting workers. I don’t clean them up. I don’t worry about why one is not working. I don’t care, because all I have to do is terminate that instance and boot a new one, and suddenly all is right with the world. With tools like chef or puppet or ansible or saltstack, my configurations are perfectly repeatable. Idempotent. Deterministic. These instances house no data on themselves that must be preserved. There is nothing about them that is special in any way, at least as compared to their brethren in the Autoscaling Group. Kill them all, I say, and let God know his own.

And that is when I care that it only takes 33 seconds instead of 96 seconds, because that’s 63 seconds of productivity in my cloud that I’ve gained. That’s 63 seconds of users not complaining about a slow website, dropped requests, failed API calls. That’s 63 seconds of New Relic not complaining about my Appdex score, that’s 63 seconds I get back towards my 99.999% uptime guarantee.

It is for these machines that systemd is a godsend, and I embrace it fully.

CoreOS Paves a New Containerized Path

The CoreOS team today announced the first release of Rocket. As I’ve clearly gone swimming in the Docker Kool-aid, I’ve also knocked back a goodly amount of CoreOS’s as well.

As I read the above-linked blog post from the CoreOS team, I found myself agreeing with them more than I expected to. I’m a big fan of Fig and am quite pleased that Orchard was acquired by Docker and that development will continue, but the news that Fig would henceforth be built into Docker itself was a little troubling.

The common Unix-ism is to “do one thing and do it well.” In fact, the entire idea behind containers is that they should do one thing, and do it well. Our production stack has one specific container that runs our app, another that runs nginx, and one last one that handles service discovery. Each container does one thing. Hopefully, it does it well. When it does not, I just have to investigate that one container.

Docker is diverging from that maxim. /usr/bin/docker is your client for interacting with the Docker Hub. It’s your tool for managing running containers. It’s your tool for managing your local library of images. It’s your tool for building new images. Soon, it will be your tool for managing entire stacks of containers monolithically and your tool for deploying to production.

Some of those can arguably be considered part of the same one thing done well, but the path is clearly charted in the other direction. Rocket is designed to counter this and offer an alternative, swinging the pendulum back towards the Way of Unix.

I feel like perhaps I’m starting to swing towards being more beholden to CoreOS than specifically to Docker itself.

Letting Git Bisect Help You

Rodrigo Flores:

With this test in hand, you can start bisecting your code. Remember you must have two commits: one where you’re sure the bug exists and one where you’re sure that the bug does not exist. You can use hashes and tags to point to these commits. When you bisect you mark a commit as good (the test passes) or bad (the test fails).

I’ve fallen out of the habit of using git bisect. I can’t even tell you why, because it’s that awesome. After being reminded of it, I’m going to make an effort to use it more frequently.

Using Docker in Production

I aggressively switched Djed Studios to Docker earlier this year and couldn’t be happier with the result. It has been prominently front-and-center in our culture of continuous improvement and “failing fast”. Initially, I recreated the exact, pre-existing development environment into a monolithic Docker container. I then slowly evolved the workflow such that we now do everything “The Docker Way” – we have 1 service per container and build our service up atomically from there. The very same containers that we use for development are, ultimately, placed onto our production servers.

Python Onbuild

The official python onbuild containers proved invaluable. For a few reasons, though, I have forked them for our own purposes. You can see the base of this fork on github. The only notable change is that it is now based on ubuntu instead of debian.

Our private python-onbuild containers use this change to install a handful of apt packages that are integral to our site. In keeping with the purpose of onbuild containers, I wanted this container to be everything needed to build or run our app except for the actual python environment and code.

This gives us a really simple Dockerfile:

FROM jcderr/python:2.7.8-onbuild

RUN ln -s /usr/src/app /opt/app; mkdir /opt/logs
RUN mkdir -p /usr/src/app/static && python manage.py collectstatic --noinput --clear
VOLUME [ "/usr/src/app", "/var/run" ]
CMD [ "python", "manage.py", "runserver", "0.0.0.0:80"]

The ‘onbuild’ part of this construct is that the base image uses the ONBUILD declaration to grab our requirements.txt file and install a full python environment, and then injects our code into /usr/src/app. Doing this in two stages means that building the python environment can utilize Docker’s build cache if requirements.txt hasn’t changed, which results in a much faster build.

Fig

Fig 1.0 was released simultaneously with Docker 1.3. This was a key point for us, as it allowed us to retire Soos, my custom-designed shell glue for our developers’ environments. I had reviewed Fig at the time and found it lacking, but the fine folks at Orchard (and at Docker, after the acquisition) have done an incredible job in rapid iteration on the concept and produced an outrageously amazing product. The plan is to integrate it fully into a future release of Docker itself and I wholeheartedly endorse this effort. Fig is that awesome.

A shadow of Soos still persists in our project, mostly as a wrapper for common tasks in Fig, and we have two figfiles, fig.yml and fig-production.yml. By default, all bin/soos calls use fig.yml, which stands up a db, redis, and a Django dev server on port 8000, with all relevant links.

If someone calls bin/soos --prod [action], however, it will use the production figfile. For instance, bin/soos --prod up will stand up a copy of the entire production stack locally, including Nginx with SSL, a celery worker, and so on.

Production

We deploy entirely within AWS with CloudFormation, and all of our instances are CoreOS Stable machines. Each instance has EC2 metadata to define what environment it belongs to (prod, stage, or test) as well as its role (django, celery, etc).

Sharing Secrets

Securing and sharing secrets is one of the hard problems in operations. We chose to store these secrets in an S3 bucket and allow access via IAM profiles. Each environment has a matching environment file that defines a handful of necessary values, such as DATABASE_URL, in shell syntax.
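In our units this is handled by a small helper container (visible in the unit below), but the idea is simple enough to sketch with the plain AWS CLI; the bucket, key, and image names here are invented:

# the instance's IAM profile grants read access to the secrets bucket,
# so no access keys ever appear on the box or in the unit files
aws s3 cp s3://djed-secrets/production.env /home/core/production.env

# production.env is plain shell/env-file syntax, e.g.:
#   DATABASE_URL=postgres://user:pass@db.internal:5432/djed
#   DJANGO_SETTINGS_MODULE=djed.settings.production

docker run --env-file=/home/core/production.env djedstudios/app ...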

systemd

systemd isn’t for everyone, but it’s the init system in CoreOS, it gets the job done, and it solves my specific problems quite well, so I’ve embraced it wholeheartedly in our stack. We have global units that target all machines matching specific metadata (e.g. ‘role=django’, ‘role=celery’).

I abuse subshells pretty aggressively, though. One of our units is a fork of the CoreOS team’s example ELB Presence Notifier, altered to use IAM profiles instead of requiring access keys in the environment.

[Unit]
Description=ELB Presence Notifier
After=http.service
BindsTo=http.service

[Service]
User=core
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker stop elb-presence
ExecStartPre=-/usr/bin/docker rm elb-presence
ExecStartPre=/usr/bin/docker pull jcderr/s3fetch
ExecStartPre=-/usr/bin/docker run --rm -v /home/core:/root jcderr/s3fetch python /opt/app/s3fetch.py ************* .docker.cfg /root/.dockercfg
ExecStartPre=-/bin/sh -c '/usr/bin/docker run --rm -v /home/core:/root jcderr/s3fetch python /opt/app/s3fetch.py ************* $(/usr/bin/etcdctl get /djed-environment) /root/$(/usr/bin/etcdctl get /djed-environment)'
ExecStartPre=/usr/bin/docker pull djedstudios/elb-presence
ExecStart=/bin/sh -c '/usr/bin/docker run --name elb-presence --env-file=/home/core/$(/usr/bin/etcdctl get /djed-environment) djedstudios/elb-presence'
ExecStop=/usr/bin/docker stop elb-presence

[X-Fleet]
Global=true
MachineMetadata="role=django"

This service adds the instance to an AWS Elastic Load Balancer and, when the linked http.service process dies, removes it again. Process termination is low-hanging fruit: there’s no point in waiting for the AWS health checker to take its time removing the instance if we know the process isn’t even running.

As you can see, I have a key I fetch out of etcd to make this work. djed-environment holds the full path to the environment file we fetched from S3. Other units also rely on djed-version, which will be the tag name for the docker container that should be run.

With similarly constructed systemd units, I stand up the service like this:

  • Fetch the app container
  • Run a copy of the app to serve the website
  • Run another copy to serve the API
  • Both run uwsgi on a Unix socket
  • Fetch and run the nginx container
  • Nginx picks up the app and API sockets as upstreams
  • Fetch and run the presence notifier

We’re a Django shop. The only difference between the app container and the api container is that we pass in a different DJANGO_SETTINGS_MODULE to the api container.

Deploying Updates

To deploy a new version of the code, I simply update the etcd key /djed-version. We have a self-healing/self-updating service called Norm that watches this key and, upon a change, begins the update process.
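Mechanically, a deploy is nothing more than this (the tag value and image name are invented for the sketch):

# kick off a deploy by changing the watched key
etcdctl set /djed-version 1.4.2

# on every machine, Norm is effectively doing this:
etcdctl watch /djed-version          # blocks until the key changes
docker pull djedstudios/app:$(etcdctl get /djed-version)
# ...then the 'responsible adult' cycles the units, as described below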

Norm

Norm is a short bash script that has two modes of operation: maintenance and update.

In Maintenance Mode, Norm watches the fleet service to make sure that all systemd units in the stack are healthy. If they fail, it works through a regimen of stopping, unloading, and then relaunching said services.

In Update Mode, one machine is chosen at random to be the ‘responsible adult’ and the others are instructed to pull the targeted version of the container (as defined by the /djed-version key). As each machine concludes the download, it informs the responsible adult by updating a key that includes its machine ID. Once the entire fleet has the new container, the responsible adult stops the service, performs all maintenance required by the upgrade (e.g. database migrations, static file generation), and then brings the service back up.

In the near future, we’ll be moving to a database migration scheme that will allow this update to take place without downtime, but we’re not quite there yet.

Baseless “Copyright” Complaints: OrlandoEscape.com

My dad and I run a community forum over at Rat Rods Rule, a gathering place for hot rodders, rat rodders, petrolheads, and more. The site gets enough traffic that we occasionally have to chat with our hosting provider about all manner of issues, but otherwise it’s possibly one of the lowest-maintenance websites I’ve ever had the pleasure of running. Great bunch of guys.

I’ve gotten a handful of entirely spurious abuse complaints against RRR in the last year that leave me feeling that the only real recourse is to bring about the public shaming. Today’s baseless claim comes from Sunil Govind at OrlandoEscape.com, who apparently feels it’s my responsibility to safeguard his Google PageRank rankings.

Mr. Govind contacted my hosting provider with the following email:

Re: Copyright Claim

I work for the company: ORLANDOESCAPE.COM

Note: All correspondence regarding this matter should be via
sunil@orlandoescape.com

Our site has been penalized by Google because of duplicated content which
is hosted on domains within your control, as result of this we have lost
80% of our Google traffic and we need to get this content taken down ASAP!

HOW THIS AFFECTS YOU?

By knowingly linking to copyright infringing content, you could be liable
for ‘contributory copyright infringement’ see
http://www.dmlp.org/legal-guide/linking-copyrighted-materials for more
information!

We have isolated the pages causing our penalty which we need to have
removed,

Where Exactly Are These Links And Content Which Need Removing?

Specifically the following pages which are most likely operated by the same
person and located here:

http://ratrodsrule.com/forum/showthread.php?p=42598

On the following IPs:
192.241.212.51

I have a good faith belief that use of the copyrighted
materials described
above as allegedly infringing is not authorized by the copyright owner, its
agent, or the law and is being hosted by DIGITALOCEAN-6.

I swear, under penalty of perjury, that the information in the notification
is accurate and that I am the copyright owner or am authorized to act on
behalf of the owner of an exclusive right that is allegedly infringed.

Kindest Regards
Sunil Govind
http://www.ORLANDOESCAPE.COM
sunil@orlandoescape.com

If you visit the thread linked on RatRodsRule, you’ll notice that the “linked infringing content” is, in fact, a link directly to his own website. He’s alleging here that his own website is infringing his own copyright, and that we’re in violation for linking to his own infringement of his own content. It’s obvious, of course, that what he’s really concerned with is how our link to his site affects his PageRank.

Thankfully, there’s no provision of law, copyright or otherwise, which says that I am legally liable for any effects on his PageRank.

Nginx ONBUILD FTW

I needed a lightweight nginx setup that I could just throw config files into; thus was born docker-nginx-onbuild. The base container will add any files in the build context to /etc/nginx, and your CMD need only be arguments to nginx.

To build our nginx container here at Djed, my docker context has a few files:

  • conf.d/site.conf
  • ssl/certificate.crt
  • ssl/certificate.key

My Dockerfile is very simple and looks like this:

FROM jcderr/nginx-onbuild:latest
MAINTAINER Jeremy Derr my@email
CMD [ "-g", "daemon off;" ]

site.conf defines my upstreams (which are sockets exported by other containers) and my hosts, and that’s about it.

Link

Jon Hendren:

In the olden times you had your system admin people and your developers separate. [ … ] The reason this was previously a problem is because everyone is extremely used to staring at their computer all day and zoning out and thinking about orcs or elves or whatever Lord of the Rings stuff was popular at the time instead of talking to their coworker friends about work.

DevOps is mostly just a way of reorganizing all your nerds and making them talk to each other more.

Hear, hear!

Introducing Soos – A Docker Workflow Template for Devs

I was recently handed the devops reins over at Djed Studios. I was starting from a fresh slate and ready to do something cool. With all of the hype surrounding Docker, I was especially eager to give it a shot.

I reached out to Flux7’s Aater Suleman, who spoke at last month’s Devops Days Austin on “Using Docker to Improve Web Developer Productivity.” Unfortunately, the timing of my assumption of devops responsibilities did not give me enough lead time to get into Devops Days, but Aater was kind enough to meet me at a local Starbucks to discuss his presentation. I am forever indebted to him for the kindness he showed me.

Our Stack

Our development stack is fairly standard. MacBook Pros, VirtualBox, Vagrant, Django, and postgresql. Initially, we had rolled with one custom vagrant box (“djedbox”) which was monolithically built using the same SaltStack states that built our production website. This worked, but was very time intensive, and did not give us the flexibility of rebuilding fractions of the development environment.

Want to drop and rebuild the database, with test data, without having to rebuild the python virtual environment too? Well, too bad: our states were so intertwined as to make this infeasible. I hope you are a seasoned postgres admin, because that is what it took. Not all of our team is that comfortable with the unix command line, though, much less with the intricacies of the postgres dropdb and createdb tools, granting privileges, and so on. Initially, we wrapped some of these things in shell scripts, but the scripts had to evolve in sync with the salt states and it turned into a nightmare.

Docker

I threw pretty much everything away when we moved to docker. Salt went right out the window; if I was going to make this move, it was going to be docker on the dev, docker in testing, docker in production. End to end docker. The only difference between our dev environments and our production is that the database in production is its own instance, whereas in development it is another container.

Our devs regularly want to revert back to a known environment, something they can do by axing their current container and starting a new one from a known-good image. The first time a developer runs our scripts, they get a local image from which to base their work, and they can quickly and painlessly return to it at any time, regardless of what they do to their currently running container. This is the big win here. Everything else simply serves this goal.

ENTER SOOS

Soos is the gateway into dealing with docker. It is a collection of bash scripts that manage which containers are running, when, and how. It builds new images, runs, stops, and starts existing containers. Launches single-use containers to run tests. Launches long-lived containers for the database and a master web app instance.

I consider Soos more of a template than a drop-in solution. Everyone’s needs will be different; perhaps you’re using Rails or Node instead of Django, or some variety of NoSQL. The publicly released Soos uses sqlite, whereas internally we’re using postgres (in the near future, I will share our scripts for the db image).

Getting Started

  • bin/soos up

This is how you get started. soos-up will check to see if you have all the requirements installed and, if not, give you the opportunity to do so. It’ll boot a VirtualBox VM via Vagrant, install the docker provisioner, and pull the ubuntu:14.04 image that we use as our base.

Once this is done, you can visit http://10.1.2.4/ and see your django site. The django development server is in use here, meaning your changes to .py files will be reflected in real time. If you make changes to the db, you can run bin/soos migrate and it will run syncdb. If you use south, or something similar, you can edit this to run that as well.

You will see that there are docker scripts in bin/docker. The main soos-* scripts are to be run by the user, via bin/soos [command], but the scripts in bin/docker are explicitly for running inside of a container. Generally, these should only be run by the soos command.

Dockerfiles

We have a handful of docker files in our repository. The root Dockerfile is intended for AWS Beanstalk. Soos instead works with the docker files found within support/dockerfiles. bin/soos build defaults to building at support/dockerfiles/app/ (and copies in a requirements.txt file, if found). While there are no docker files for these yet, you can see that it could support, for instance, building a db server with bin/soos build --db.

Fork Me!

Please feel free to fork me and send PRs for any ideas you have in how this would help you do your job.

Link

Fairfield University psychologist Linda Henkel:

You’re just kind of mentally discounting it–thinking, ‘Well, the camera’s got it.’

I’ve noticed this in my own work, and that’s part of the reason I’ve neglected my photography for the last few years. I broke my Nikon a while back and, while waiting for its repair, noticed that I was enjoying travel a lot more when I wasn’t constantly looking at life like a photographer. I took a trip to Paris a while back, and all I really remember from it now is what I see in the pictures I took; I don’t want that to happen to my time with my son while he grows up.

Panning the Winter Classic

Yesterday’s snow at the Winter Classic produced slow-paced hockey. It slowed down skaters, it slowed down the puck, it interfered with passing, and clearly disrupted many players’ timing. The game looked more like mid-level beer-league hockey than what we’re accustomed to seeing at the NHL level. Outdoor hockey just doesn’t produce terribly good hockey.

Don’t get me wrong. I find the idea behind the Winter Classic to be amazingly romantic and just oodles of fun. It harkens back to how hockey players learn how to play hockey: on the pond. It goes back to when the NHL’s best and greatest were mere tykes, lacing up on park benches, learning their sport on frozen ponds while their parents watched from the sidelines, shivering to stay warm, drinking hot chocolate or coffee from a thermos. Hockey in its purest form.

It’s just not the game we should expect from the NHL.

Link

Chuck Gormley reports for CSNWashington:

Capitals general manager George McPhee said he thought Tom Wilson’s second period hit on Flyers center Brayden Schenn was “a great hit” and that Schenn was “too slow” to get out of the way.

Adam Oates added:

He hit him hard, yeah. To me, it’s a clean hit. I don’t think it’s a penalty at all.

Here’s the hit in question. It’s very clearly Boarding and Charging, and potentially Checking from Behind, all rolled into one. It’s not a clean hit, it’s not a good play, and Oates, McPhee, and other Caps seeking to excuse Wilson should be ashamed of themselves.

I can buy that maybe Checking from Behind isn’t a perfect fit, per the rule:

When a player intentionally turns his body to create contact with his back, no penalty shall be assessed.

However, the onus is on Wilson to deliver a legal body check, and he’s charging far too fast, especially given the proximity to the boards, to adjust to any changes in direction or facing by Schenn.

Wilson faces a phone interview with the DPS, so he won’t be suspended for more than 5 games. And that’s just a shame; this is exactly the kind of play that the NHL should be removing from the game entirely.

Link

Excellent roundup of Git best practices. If you’re doing it right, you should be doing something akin to these examples.

Link

One of my guilty pleasures is to follow crazy civil lawsuits. For the last year, I’ve been following the Prenda Saga, where an allegedly sleazy law firm (“Prenda”) allegedly stole someone’s identity to secure porn copyright assignments and then allegedly shook down improperly-identified random internet users for thousands of dollars to keep their names out of court records. Suffice it to say, Federal Courts across the country have not taken their shenanigans lightly.

As Prenda winds down, a new guilty pleasure arrives in the form of Palmer v KlearGear.com. The short version is that Palmer allegedly posted a negative review of KlearGear, so KlearGear allegedly sent a demand letter for $3500 while citing a more-than-a-little-sketchy clause in their Terms of Use that restrains users from posting negative reviews. KlearGear then trashed Palmer’s credit rating and hired a debt collector to go after them. The kicker?

Apparently, the no-negative-reviews clause wasn’t even in their ToS at the time Palmer used the site.

In addition to declaratory relief, the claims against KlearGear include violations of the federal Fair Credit Reporting Act, defamation, intentional interference with economic relations, and intentional infliction of emotional distress.

KlearGear refused to respond to Palmer’s requests to repair the situation. I don’t see this ending well for them. Public Citizen is on the case.

The NHL Has to Change

The NHL has been engaged in a battle against dangerous play for some time now, especially as the danger of repetitive head injuries has come to the forefront of sports consciousness. Brendan Shanahan has been doling out suspensions like candy, and it looks like he’ll be handing out two more for incidents just last night.

There’s a lot of argument about enforcers and fighting, and their role in cracking down on dangerous play. Shawn Thornton has famously opined about “The Code,” almost to the point of considering it to be a touch of chivalry in its own right. Thornton developed a bit of a credibility problem over the course of the last week, though.

Hockey is a beautiful game. Its penchant for bruising hits, its speed, its finesse, make it popular the world over – even where snow isn’t particularly common. The dangerous play we see in the NHL though, isn’t universal; USA Hockey, Hockey Canada, and the International Ice Hockey Federation rules all come down harshly on punitive hits and intimidation. In part due to the Olympic Ethos, there is no fighting in Olympic hockey.

Over the last several years, USA Hockey has instituted a “Standards of Play” initiative. This initiative has a defined goal of removing intimidation from the game and bringing the focus back to skating and stick-handling skills. A number of penalties now carry automatic 10-minute Misconduct (or even Game Misconduct) tack-ons because they are always aggressor penalties, showing either a blatant disregard for safety or an attempt to intimidate an opponent.

That does not make it a lesser game, and the NHL needs to follow suit.