GFX::Monk Home

NixOS and Stateless Deployment

If I had my way, I would never deploy or administer a linux server that isn’t running NixOS.

I’m not exactly a prolific sysadmin - in my time, I’ve set up and administered servers numbering in the low tens. And yet every single time, it’s awful.

Firstly, you get out of the notion of doing anything manually, ever. Anytime you do something manually you create a unique snowflake, and then 3 weeks (or 3 years!) down the track you tear your hair out trying to recreate whatever seemingly-unimportant thing it is you did last time that must have made it work.

So you learn about automated deployment. There are no shortage of tools, and they’re mostly pretty similar. I’ve personally used these, and learned about many more in my quest not to have an awful deployment experience:

All of these work more or less as advertised, but all of them still leave me with a pretty crappy deployment experience.

The problem

Most of those are imperative, in that they boil down to a list of steps - “install X”, “upload file A -> B”, etc. This is the obvious approach to automating deployment, kind of like a shell script is the obvious approach to automating a process. It takes what you currently do, and turns it into one or more concrete files that you can modify and replay later.

And obviously, the entire problem of server deployment is deeply stateful - your server is quite literally a state machine, and each deployment attempts to modify its current state into (hopefully) the expected target state.

Unfortunately, in such a system it can be difficult to predict how the current state will interact with your deployment scripts. Performing the same deployment to two servers that started in different states can have drastically different results. Usually one of them failing.

Puppet is a little different, in that you don’t specify what you want to happen, but rather the desired state. Instead of writing down the steps required to install the package foo, you simply state that you want foo to be installed, and puppet knows what to do to get the current system (whatever its state) into the state you asked for.

Which would be great, if it weren’t a pretty big lie.

The thing is, it’s a fool’s errand to try and specify your system state in puppet. Puppet is built on traditional linux (and even windows) systems, with their stateful package managers and their stateful file systems and their stateful user management and their stateful configuration directories, and… well, you get the idea. There are plenty of places for state to hide, and puppet barely scratches the surface.

If you deploy a puppet configuration that specifies “package foo must be installed”, but then you remove that line from your config at time t, what happens? Well, now any servers deployed before t will have foo installed, but new servers (after t) will not. You did nothing wrong, it’s just that puppet’s declarative approach is only a thin veneer over an inherently stateful system.

To correctly use puppet, you would have to specify not only what you do want to be true about a system, but also all of the possible things that you do not want to be true about a system. This includes any package that may have ever been installed, any file that may have ever been created, any users or groups that may have ever been created, etc. And if you miss any of that, well, don’t worry. You’ll find out when it breaks something.

So servers are deeply stateful. And deployment is typically imperative. This is clearly a bad mix for something that you want to be as reproducible and reliable as possible.

Puppet tries to fix the “imperative” part of deployment, but can’t really do anything about the statefulness of its hosts. Can we do better?

Well, yeah.

Escaping an array of command-line arguments in C#

Let’s say you have an array of strings:

args = [ "arg1", "an argument with whitespace", 'even some "quotes"']

..and you want to pass them to a command, exactly as is. You don’t want it split on spaces, you don’t want quotes to disappear. You just want to pass exactly these strings to the command you’re running. In python, you would do something like:

subprocess.check_call(["echo"] + args)

In low-level C, it’s more effort, but it’s not really harder - you just use the execv* family of system calls, which takes an array of strings. At least on a UNIX-like OS.

But what if you’re using C# on Windows? Then it’s going to cost you a veritable screenful of code if you want to not screw it up. And you’ll probably screw it up. The internet has plenty of examples that happen to work well enough for simple data. But then they break when you add spaces, or double quotes, or backslashes, or multiple backslashes followed by a double quote. You don’t want that code. You want this code.

I’m honestly floored that nobody has published this code anywhere before (that I could find). So with the firm belief that it’s insane for anybody to have to implement this ridiculous escaping scheme for themselves, here it is:

Doing stuff when files change

There’s a common pattern in development tools to help with rapid feedback: you run a long-lived process that does a certain task. At the same time, it watches the filesystem and restarts or re-runs that task whenever a file that you’re monitoring changes.

This is an extremely useful tool for rapid feedback (which is why we’ve integrated nodemon into our Conductance app server), but is not very flexible - most tools are integrated into a web framework or other environment, and can’t easily be used outside of it. There are a few generic tools to do this kind of thing - I personally use watchdog a lot, but it’s sucky in various ways:

  • Configuring it to watch the right file types is hard
  • Configuring it to ignore “junk” files is hard, leading to infinite feedback loops if you get it wrong
  • It sees about 6 events from a single “save file” action, and then insists on running my build script 6 times in a row
  • It takes a bash string, rather than a list of arguments - so you have to deal with double-escaping all your special characters

And yet for all of those issues, I haven’t found a better tool that does what I need.

My build workflow

Lately, I’ve been making heavy use of my relatively-new build system, gup. It’s a bit like make, but way better in my humble-and-totally-biased-opinion. But going to a terminal window and typing up, enter (or the wrist-saving alternative crtl-p, ctrl-m) to rebuild things is tedious. But there’s no way I’m going to implement yet another watch-the-filesystem-and-then-re-run-something gup-specific tool, at least not until the lazy alternatives have been exhausted.

Obviously, my workflow isn’t just up, enter. It’s (frequently):

  • save file in vim
  • go to terminal
  • press up, enter
  • go to browser
  • refresh

And you know what? Monitoring every file is kind of dumb for this workflow. I don’t have gremlins running around changing files in my working tree at random (I hope), I almost always want to reload in response to me changing a file (with vim, of course). So why not just cooperate?

The simple fix

So I’ve written a stupid-dumb vim plugin, and a stupid-dumb python script. The vim plugin touches a file in $XDG_USER_DIR whenever vim saves a file. And then the script monitors just this file, and does whatever you told it to do each time the file is modified. The script automatically communicates with vim to enable / disable the plugin as needed, so it has no overhead when you’re not using it.

It’s called vim-watch, and I just gave you the link.

Addendum: restarting servers

While writing this post, I was a little disappointed that it still doesn’t quite replace tools that automatically restart a server when something changes, because it expects to run a build-style command that exits when it’s done - but servers run forever. Some unix daemons (like apache) restart themselves when you send them a HUP signal, but that’s not so common in app servers. So now huppy exists, too.

It’s a tiny utility that’ll run whatever long-lived process you tell it to, and when it receives a HUP signal it’ll kill that process (with SIGINT) if it’s still running, then restart it. It seems surprising that this didn’t exist before (maybe my google-fu is failing me) but on the other hand it’s less than 60 lines of code - hardly an expensive wheel to reinvent.

You can use it like:

$ # step 1: start your server
$ huppy run-my-server

$ # step 2: use vim-watch to reload the server on changes
$ vim-watch killall -HUP huppy

$ # Or, if you need to rebuild stuff before restarting the server,
$ vim-watch bash -c 'gup && killall -HUP huppy'

Oni Conductance

This past week, we (Oni Labs) announced Conductance, the next-generation web app server built on the StratifiedJS language (which we also built, and which has seen a number of steadily improving public releases over the past couple of years).

For a long time, I’ve been convinced that plan JavaScript is simply inappropriate for building large scale, reliable applications. That doesn’t mean it’s impossible, but the effort required to correctly write a heavily-asynchronous application in javascript involves a frankly exhausting amount of careful error checking and orchestration, and there are whole classes of confusing bugs you can get into with callback-based code which should not be possible in a well-structured language.

So I was extremely happy to join the Oni Labs team to help work on StratifiedJS, because it’s a much more extensive (and impressive) attempt to solve the same problems with asynchronous JavaScript that I was already trying to solve.

Conductance is a logical progression of this work: now that we have StratifiedJS, we’ve used its features to build a new kind of app server: one which maintains all the performance benefits of asynchronous JavaScript (it’s built on nodejs, after all), but which makes full use of the structured concurrency provided by StratifiedJS for both server and client-side code. And not just for nice, modular code with straightforward error handling - but actually new functionality, which would be impossible or extremely ungainly to achieve with normal JavaScript.

If you’re interested in building web apps (whether you already do, or would like to start), please do check out for more details, and plenty of resources to help you get started building Conductance applications.

direnv: Convenient project-specific environments

I’m pretty particular about my development tools, and I really dislike any tool that requires careful curation of global state - e.g. ruby gems, python packages, etc. In recent years, things have gotten better. Ruby has bundler, which keeps a project’s dependencies locally (avoiding any global state). Similarly, python has virtualenv, which does much the same thing. Tools like rvm and nvm allow you to manage multiple versions of the language itself. Notably, the npm package manager for nodejs fully embraces local dependencies - by default, packages are always installed locally (although the implementation itself is not particularly sane).

The inconvenience with most of these is that they require the developer to do something to “get into” a certain environment - if you try to run your python project’s tests without first activating the correct virtualenv, things will fail pretty badly. Some tools (e.g rvm) include shell hooks to automatically activate the appropriate environment when you change directories, but they are tool-specific - you’ll need to add a hook in your shell for each tool you use, and I have my doubts that they would cooperate well since they do awful things like overriding the cd command.

Enter direnv

I was very excited to find out about direnv (github: zimbatm/direnv) a few weeks ago, because I had just been looking for exactly such a tool, and considering writing one myself (I’m rather glad I didn’t have to). The idea is simple: extract all the messy stuff that rvm, virtualenv, etc do to manage per-directory environment variables, and put it into a single, general-purpose tool. You place an .envrc script in the root directory of your project, and you can use whatever tools you need to inside that script to set project-specific environment variables (via export statements, or by delegating to bundler, virtualenv, etc). direnv takes care of sandboxing these modifications so that all changes are reversed when you leave the project directory.

Aside from relieving other tools of the arduous work of reimplementing this particular wheel (including individual integration with each shell), direnv is much more extensible than existing language-specific tools - you can (for example) also export credentials like AWS_ACCESS_KEY, or add project-specific scripts to your $PATH so you can just run mk, rather than having to invoke an explicit path like ./tools/mk.

Of course, few tools get my blessing these days if they don’t play well with ZeroInstall (if I had my way, all of rvm/virtualenv/npm/pip would be replaced by just using ZeroInstall, but sadly I have yet to convince everyone to do that ;)). A while ago I wrote 0env as a tool for making ZeroInstall dependencies available in your shell, but unlike most tools it encourages you to work in a subshell, rather than altering your current shell session. Some people don’t like this approach, but the benefits (in code simplicity and lack of bugs) were well worth it. Thankfully, you can have your cake and eat it too if you use direnv. For example, a normal use of 0env looks like:

$ 0env myproject.xml
[myproject] $ # I'm in a subshell
[myproject] $ exit
$ # back in my original shell

But for convenience, you can make a trivial .envrc that defers all the logic to 0env:

$ cat .envrc
direnv_load 0env myproject.xml -- direnv dump

Now, every time you cd into this project directory, direnv will set up whatever environment variables 0env would have set in the subshell, but it applies them to your current session instead, making sure to revert them when you leave the project directory.

Security concerns:

Obviously, care should be taken when automatically running scripts, since just cloning some code to your computer should not imply that you trust it to run arbitrary code. direnv is pretty respectable here: an .envrc will only be loaded once you’ve explicitly allowed it (by calling direnv allow in the directory). An allow action records the full path to the .envrc as well as a hash of its current contents - direnv will refuse to run every .envrc that doesn’t have a matching allow rule for both of these properties (i.e if it’s changed or has been moved / copied).

There are still potential attacks - e.g if I add ./tools to $PATH, then someone could create a pull request with add a malicious ls script in ./tools. If I check it out locally, neither the .envrc nor the location has changed, so direnv would run the .envrc, and then I’d be in trouble then next time I run ls (I do that a lot). This is pretty hard to avoid in the general case, I think the best approach is to keep the .envrc simple and as specific as possible, so that there is as most one place where bad things could happen, which you just have to be mindful of (e.g I’d be very cautious of any change which added new files under tools/ in the above example).

Development and contributing

I’m using direnv 2.2.1, which is barely a week old. It includes both of the features I contributed, which I (obviously ;)) think are important:

The author (zimbatm) seems friendly and receptive to patches, which makes contributing to direnv pretty painless. It’s written in go, which I’ve never used before. I’m definitely not a fan of the language’s insistence that error conditions must be implemented by wrapping almost every single function call in an if block (but which doesn’t even warn you if you completely ignore a function’s returned error value), but aside from that the direnv code is quite simple and easy to work with. And it’s certainly a huge step up from bash, which is what it used to be written in, and which many similar tools are written in.

Announcing the gup build tool

gup is a software build tool. It is designed to be general purpose, and does not care:

  • what kind of project you are building
  • what language you are building
  • what language you write your build scripts in

It has (almost) no syntax, instead it defines a simple protocol for where build scripts are located. Instead of declaring dependencies up-front, build scripts declare dependencies as they use them. This allows your dependencies to be enumerated at runtime, while building, rather than existing in some separate, statically-declared list which has to be manually updated if you wish your build to Not Be Wrong.

It’s similar to djb’s redo, which has been implemented by Avery Pennarun. In fact, I even took a bunch of code from redo. If you’ve used it before, gup will seem pretty familiar.

Please check out the project on github for more details and documentation. It’s still young, and it may change. But I’ve been using it for both work and person projects for a few months now, and it’s already proven much more solid than redo in my own usage.

Why didn’t I just help make redo better?

I tried, but I believe redo's design is impossible (or at least very difficult) to implement in a way that does not Do The Wrong Thing silently (and somewhat often). That is absolutely not a property I want from my build system.

The core problem springs from the fact that redo relies on local file state to determine whether a file is actually a target. The only difference between a build target and a source file is that a target is one which didn’t exist when you first tried to build it - i.e if something looks like a target but it already exists, then it is actually a source, and will never be built.

There is quite a bit of state locked up in the above definition, and it turns out that it’s perilously difficult to manage that state correctly. The end result in many cases is that redo thinks a built file is actually a source file, and it silently ignores all requests to build it1. Remedying this situation is manual - it cannot easily be scripted, and the actions required depend entirely on the state of the local workspace.

gup fixes this problem by requiring you to be more explicit about your targets. In gup, something is a target if (and only if) you’ve told gup how to build it. It also means that the set of targets is defined by the part of your project that’s tracked by source control, rather than the state of your local filesystem.

  1. When updating from Fedora 19 -> 20 recently, this happened to every single file redo had ever built. This may not be redo’s fault, but it shows how fragile the mechanism is.