NixOS and Stateless Deployment
If I had my way, I would never deploy or administer a linux server that isn’t running NixOS.
I’m not exactly a prolific sysadmin - in my time, I’ve set up and administered servers numbering in the low tens. And yet every single time, it’s awful.
Firstly, you get out of the notion of doing anything manually, ever. Anytime you do something manually you create a unique snowflake, and then 3 weeks (or 3 years!) down the track you tear your hair out trying to recreate whatever seemingly-unimportant thing it is you did last time that must have made it work.
So you learn about automated deployment. There are no shortage of tools, and they’re mostly pretty similar. I’ve personally used these, and learned about many more in my quest not to have an awful deployment experience:
All of these work more or less as advertised, but all of them still leave me with a pretty crappy deployment experience.
The problem
Most of those are imperative, in that they boil down to a list of steps - “install X”, “upload file A -> B”, etc. This is the obvious approach to automating deployment, kind of like a shell script is the obvious approach to automating a process. It takes what you currently do, and turns it into one or more concrete files that you can modify and replay later.
And obviously, the entire problem of server deployment is deeply stateful - your server is quite literally a state machine, and each deployment attempts to modify its current state into (hopefully) the expected target state.
Unfortunately, in such a system it can be difficult to predict how the current state will interact with your deployment scripts. Performing the same deployment to two servers that started in different states can have drastically different results. Usually one of them failing.
Puppet is a little different, in that you don’t specify what you want to happen, but rather the desired state. Instead of writing down the steps required to install the package foo
, you simply state that you want foo
to be installed, and puppet knows what to do to get the current system (whatever its state) into the state you asked for.
Which would be great, if it weren’t a pretty big lie.
The thing is, it’s a fool’s errand to try and specify your system state in puppet. Puppet is built on traditional linux (and even windows) systems, with their stateful package managers and their stateful file systems and their stateful user management and their stateful configuration directories, and… well, you get the idea. There are plenty of places for state to hide, and puppet barely scratches the surface.
If you deploy a puppet configuration that specifies “package foo must be installed”, but then you remove that line from your config at time t, what happens? Well, now any servers deployed before t will have foo
installed, but new servers (after t) will not. You did nothing wrong, it’s just that puppet’s declarative approach is only a thin veneer over an inherently stateful system.
To correctly use puppet, you would have to specify not only what you do want to be true about a system, but also all of the possible things that you do not want to be true about a system. This includes any package that may have ever been installed, any file that may have ever been created, any users or groups that may have ever been created, etc. And if you miss any of that, well, don’t worry. You’ll find out when it breaks something.
So servers are deeply stateful. And deployment is typically imperative. This is clearly a bad mix for something that you want to be as reproducible and reliable as possible.
Puppet tries to fix the “imperative” part of deployment, but can’t really do anything about the statefulness of its hosts. Can we do better?
Well, yeah.