Embracing Systemd

I’ve been struggling with the vehement opposition to systemd for a while now. I use systemd extensively in my work, but I also manage some systems where I am not using – and do not necessarily want – systemd. I had not put much thought into where the distinction lies, until I read this post by Leszek Urbanski.

[The author of systemd] goes on and on about how you can save 3 seconds here and 5 seconds there by parallel and delayed service startup – systemd actually has a feature to measure system boot time. The question is: who cares? Desktop users, yes. Embedded users, maybe. Server users? Nope. It doesn’t matter if a server comes up in 96.3 seconds instead of 33.1. What counts is if it stays up and is not too cumbersome to maintain.
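For the record, the measurement feature he's alluding to is systemd-analyze. On any systemd machine, a couple of commands will show you where the boot time actually goes:

    # total boot time, split into kernel and userspace
    systemd-analyze

    # per-unit startup times, slowest first: the "3 seconds here, 5 seconds there"
    systemd-analyze blame

    # the chain of units the boot actually had to wait on
    systemd-analyze critical-chain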

And that’s where I stopped, because it’s clear to me now where to draw the distinction.

In the past, each Linux system I managed was painfully hand-crafted for its purpose. It was a perfect snowflake, tweaked and tuned and measured and monitored. It was inviolable, permanent, etched in stone. If something went wrong with it, I would spend hours – sometimes days or weeks – tracking down exactly what went wrong, carefully crafting a patch, testing it, and putting it into production. Some common tasks might ultimately make their way into scripts so they could be automated across many machines, but oftentimes reproducibility really just amounted to keeping a copy of a handful of config files somewhere they could be found easily.

That is not my work today. As my work has shifted ever more out of the “IT” realm and into the “DevOps” realm, the number of machines I manage has ballooned from “a small handful” to “hundreds and hundreds.” Certainly, there are still a few servers that I manage the old-fashioned way, but they are not – by a wide margin – the bulk of my workload.

The “old way” servers are largely infrastructure servers such as databases, DNS, and VPN servers. These are my snowflakes, and they are few and far between. I don’t necessarily care whether they run sysvinit or systemd, but they all happen to run sysvinit. This mostly comes down to the fact that each of their operating systems ships with sysvinit (though some are transitioning to systemd). I know bash and know it well, and these software packages support bash init scripts, provide sensible default ones, and they just work. Right out of the box, they just work.

These snowflakes are scarce, though. For every snowflake, every machine that boots once and (hopefully) runs forever, I may have a hundred completely disposable worker instances. These are web servers and task runners, actual workhorses that get stuff done. And every single one of them is exactly like the others. If the infrastructure instances are snowflakes, each unique and distinguishable and perfect, then the worker instances are like grains of sand – each completely replaceable, fungible, indistinguishable from the last.

Systemd is for these grains of sand.
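To make that concrete, here is roughly what one of these workers’ service definitions looks like. The unit name, binary path, and arguments are made up for illustration; the point is the shape of it: a dozen declarative lines stamped identically onto every instance.

    # /etc/systemd/system/worker.service  (hypothetical worker; paths are illustrative)
    [Unit]
    Description=Disposable queue worker
    After=network-online.target
    Wants=network-online.target

    [Service]
    ExecStart=/opt/worker/bin/worker --queue jobs
    Restart=always
    RestartSec=5

    [Install]
    WantedBy=multi-user.target

Whatever provisions the instance drops that file in place and runs systemctl enable --now worker.service; systemd handles ordering, parallel startup, and restarting the process when it falls over.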

I don’t waste time troubleshooting workers. I don’t clean them up. I don’t worry about why one is not working. I don’t care, because all I have to do is terminate that instance and boot a new one, and suddenly all is right with the world. With tools like Chef or Puppet or Ansible or SaltStack, my configurations are perfectly repeatable. Idempotent. Deterministic. These instances hold no data that must be preserved. There is nothing about them that is special in any way, at least as compared to their brethren in the Auto Scaling group. Kill them all, I say, and let God know his own.

And that is when I care that it only takes 33 seconds instead of 96, because that’s 63 seconds of productivity in my cloud that I’ve gained. That’s 63 seconds of users not complaining about a slow website, dropped requests, failed API calls. That’s 63 seconds of New Relic not complaining about my Apdex score, and 63 seconds I get back towards my 99.999% uptime guarantee.

It is for these machines that systemd is a godsend, and I embrace it fully.