GFX::Monk Home


Escaping an array of command-line arguments in C#

Let’s say you have an array of strings:

args = [ "arg1", "an argument with whitespace", 'even some "quotes"']

…and you want to pass them to a command, exactly as is. You don’t want it split on spaces, you don’t want quotes to disappear. You just want to pass exactly these strings to the command you’re running. In python, you would do something like:

subprocess.check_call(["echo"] + args)

In low-level C, it’s more effort, but it’s not really harder - you just use the execv* family of system calls, which takes an array of strings. At least on a UNIX-like OS.
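
Python exposes the same execv-style interface as well, if you want to replace the current process rather than run a child; a quick sketch for comparison:

import os

args = ["arg1", "an argument with whitespace", 'even some "quotes"']
# execvp takes an argv array directly: the strings are passed through
# exactly as given (argv[0] is the program name, by convention).
os.execvp("echo", ["echo"] + args)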

But what if you’re using C# on Windows? Then it’s going to cost you a veritable screenful of code if you want to not screw it up. And you’ll probably screw it up. The internet has plenty of examples that happen to work well enough for simple data. But then they break when you add spaces, or double quotes, or backslashes, or multiple backslashes followed by a double quote. You don’t want that code. You want this code.

I’m honestly floored that nobody has published this code anywhere before (that I could find). So with the firm belief that it’s insane for anybody to have to implement this ridiculous escaping scheme for themselves, here it is:
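
The scheme in question is the quoting convention that CommandLineToArgvW (and the Microsoft C runtime) uses to turn a single command-line string back into an argv array: arguments containing whitespace or quotes get wrapped in double quotes, embedded quotes are escaped with a backslash, and any run of backslashes immediately before a quote (including the closing quote) has to be doubled. As a sketch of those rules - shown in Python for brevity, since it’s essentially what the standard library’s subprocess.list2cmdline does, rather than the C# listing itself:

def quote_argument(arg):
    # Anything non-empty without whitespace or quotes passes through untouched.
    if arg and not any(c in arg for c in ' \t"'):
        return arg
    result = ['"']
    backslashes = 0
    for c in arg:
        if c == '\\':
            backslashes += 1
        elif c == '"':
            # Backslashes preceding a quote must be doubled, and the quote
            # itself escaped with one more backslash.
            result.append('\\' * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            result.append('\\' * backslashes + c)
            backslashes = 0
    # Backslashes before the closing quote must also be doubled.
    result.append('\\' * (2 * backslashes) + '"')
    return ''.join(result)

print(quote_argument('even some "quotes"'))  # prints: "even some \"quotes\""
print(quote_argument('trailing slash \\'))   # prints: "trailing slash \\"

Joining the escaped arguments with spaces gives the single command-line string that CreateProcess expects, which is what you end up assigning to something like ProcessStartInfo.Arguments on the C# side.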

Doing stuff when files change

There’s a common pattern in development tools to help with rapid feedback: you run a long-lived process that does a certain task. At the same time, it watches the filesystem and restarts or re-runs that task whenever a file that you’re monitoring changes.

This is an extremely useful tool for rapid feedback (which is why we’ve integrated nodemon into our Conductance app server), but is not very flexible - most tools are integrated into a web framework or other environment, and can’t easily be used outside of it. There are a few generic tools to do this kind of thing - I personally use watchdog a lot, but it’s sucky in various ways:

  • Configuring it to watch the right file types is hard
  • Configuring it to ignore “junk” files is hard, leading to infinite feedback loops if you get it wrong
  • It sees about 6 events from a single “save file” action, and then insists on running my build script 6 times in a row
  • It takes a bash string, rather than a list of arguments - so you have to deal with double-escaping all your special characters

And yet for all of those issues, I haven’t found a better tool that does what I need.

My build workflow

Lately, I’ve been making heavy use of my relatively-new build system, gup. It’s a bit like make, but way better in my humble-and-totally-biased-opinion. But going to a terminal window and typing up, enter (or the wrist-saving alternative ctrl-p, ctrl-m) to rebuild things is tedious. And yet there’s no way I’m going to implement yet another watch-the-filesystem-and-then-re-run-something gup-specific tool, at least not until the lazy alternatives have been exhausted.

Obviously, my workflow isn’t just up, enter. It’s (frequently):

  • save file in vim
  • go to terminal
  • press up, enter
  • go to browser
  • refresh

And you know what? Monitoring every file is kind of dumb for this workflow. I don’t have gremlins running around changing files in my working tree at random (I hope); I almost always want to reload in response to me changing a file (with vim, of course). So why not just cooperate?

The simple fix

So I’ve written a stupid-dumb vim plugin, and a stupid-dumb python script. The vim plugin touches a file in $XDG_USER_DIR whenever vim saves a file. And then the script monitors just this file, and does whatever you told it to do each time the file is modified. The script automatically communicates with vim to enable / disable the plugin as needed, so it has no overhead when you’re not using it.

It’s called vim-watch, and I just gave you the link.
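
The watcher half is small enough to sketch. This isn’t the vim-watch source - just an illustration of the idea, with a hypothetical trigger path and a dumb polling loop standing in for the real thing:

import os, subprocess, sys, time

trigger = os.path.expanduser("~/.cache/vim-watch/trigger")  # hypothetical path
cmd = sys.argv[1:]  # the command to run on each change

last_mtime = None
while True:
    try:
        mtime = os.stat(trigger).st_mtime
    except OSError:
        mtime = None
    if mtime is not None and mtime != last_mtime:
        if last_mtime is not None:
            subprocess.call(cmd)  # re-run the command; keep watching even if it fails
        last_mtime = mtime
    time.sleep(0.2)

Since there’s exactly one file to watch, polling it a few times a second is plenty - no inotify machinery required.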

Addendum: restarting servers

While writing this post, I was a little disappointed that it still doesn’t quite replace tools that automatically restart a server when something changes, because it expects to run a build-style command that exits when it’s done - but servers run forever. Some unix daemons (like apache) restart themselves when you send them a HUP signal, but that’s not so common in app servers. So now huppy exists, too.

It’s a tiny utility that’ll run whatever long-lived process you tell it to, and when it receives a HUP signal it’ll kill that process (with SIGINT) if it’s still running, then restart it. It seems surprising that this didn’t exist before (maybe my google-fu is failing me) but on the other hand it’s less than 60 lines of code - hardly an expensive wheel to reinvent.
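
The core of such a tool really is tiny; here’s a rough sketch of the idea (not the actual huppy code):

import signal, subprocess, sys, time

restart_requested = False

def on_hup(signum, frame):
    global restart_requested
    restart_requested = True

signal.signal(signal.SIGHUP, on_hup)

cmd = sys.argv[1:]  # the long-lived command to supervise
while True:
    restart_requested = False
    proc = subprocess.Popen(cmd)
    # Wait for either a HUP to arrive or the child to exit on its own.
    while proc.poll() is None and not restart_requested:
        time.sleep(0.2)
    if proc.poll() is None:
        proc.send_signal(signal.SIGINT)  # ask the old instance to shut down
        proc.wait()
    if not restart_requested:
        sys.exit(proc.returncode)  # the child exited by itself; propagate that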

You can use it like:

$ # step 1: start your server
$ huppy run-my-server

$ # step 2: use vim-watch to reload the server on changes
$ vim-watch killall -HUP huppy

$ # Or, if you need to rebuild stuff before restarting the server,
$ vim-watch bash -c 'gup && killall -HUP huppy'

Oni Conductance

This past week, we (Oni Labs) announced Conductance, the next-generation web app server built on the StratifiedJS language (which we also built, and which has seen a number of steadily improving public releases over the past couple of years).

For a long time, I’ve been convinced that plain JavaScript is simply inappropriate for building large-scale, reliable applications. That doesn’t mean it’s impossible, but the effort required to correctly write a heavily-asynchronous application in JavaScript involves a frankly exhausting amount of careful error checking and orchestration, and there are whole classes of confusing bugs you can get into with callback-based code which should not be possible in a well-structured language.

So I was extremely happy to join the Oni Labs team to help work on StratifiedJS, because it’s a much more extensive (and impressive) attempt to solve the same problems with asynchronous JavaScript that I was already trying to solve.

Conductance is a logical progression of this work: now that we have StratifiedJS, we’ve used its features to build a new kind of app server: one which maintains all the performance benefits of asynchronous JavaScript (it’s built on nodejs, after all), but which makes full use of the structured concurrency provided by StratifiedJS for both server and client-side code. And not just for nice, modular code with straightforward error handling - but actually new functionality, which would be impossible or extremely ungainly to achieve with normal JavaScript.

If you’re interested in building web apps (whether you already do, or would like to start), please do check out conductance.io for more details, and plenty of resources to help you get started building Conductance applications.

direnv: Convenient project-specific environments

I’m pretty particular about my development tools, and I really dislike any tool that requires careful curation of global state - e.g. ruby gems, python packages, etc. In recent years, things have gotten better. Ruby has bundler, which keeps a project’s dependencies locally (avoiding any global state). Similarly, python has virtualenv, which does much the same thing. Tools like rvm and nvm allow you to manage multiple versions of the language itself. Notably, the npm package manager for nodejs fully embraces local dependencies - by default, packages are always installed locally (although the implementation itself is not particularly sane).

The inconvenience with most of these is that they require the developer to do something to “get into” a certain environment - if you try to run your python project’s tests without first activating the correct virtualenv, things will fail pretty badly. Some tools (e.g rvm) include shell hooks to automatically activate the appropriate environment when you change directories, but they are tool-specific - you’ll need to add a hook in your shell for each tool you use, and I have my doubts that they would cooperate well since they do awful things like overriding the cd command.

Enter direnv

I was very excited to find out about direnv (github: zimbatm/direnv) a few weeks ago, because I had just been looking for exactly such a tool, and considering writing one myself (I’m rather glad I didn’t have to). The idea is simple: extract all the messy stuff that rvm, virtualenv, etc do to manage per-directory environment variables, and put it into a single, general-purpose tool. You place an .envrc script in the root directory of your project, and you can use whatever tools you need to inside that script to set project-specific environment variables (via export statements, or by delegating to bundler, virtualenv, etc). direnv takes care of sandboxing these modifications so that all changes are reversed when you leave the project directory.

Aside from relieving other tools of the arduous work of reimplementing this particular wheel (including individual integration with each shell), direnv is much more extensible than existing language-specific tools - you can (for example) also export credentials like AWS_ACCESS_KEY, or add project-specific scripts to your $PATH so you can just run mk, rather than having to invoke an explicit path like ./tools/mk.

Of course, few tools get my blessing these days if they don’t play well with ZeroInstall (if I had my way, all of rvm/virtualenv/npm/pip would be replaced by just using ZeroInstall, but sadly I have yet to convince everyone to do that ;)). A while ago I wrote 0env as a tool for making ZeroInstall dependencies available in your shell, but unlike most tools it encourages you to work in a subshell, rather than altering your current shell session. Some people don’t like this approach, but the benefits (in code simplicity and lack of bugs) were well worth it. Thankfully, you can have your cake and eat it too if you use direnv. For example, a normal use of 0env looks like:

$ 0env myproject.xml
[myproject] $ # I'm in a subshell
[myproject] $ exit
$ # back in my original shell

But for convenience, you can make a trivial .envrc that defers all the logic to 0env:

$ cat .envrc
direnv_load 0env myproject.xml -- direnv dump

Now, every time you cd into this project directory, direnv will set up whatever environment variables 0env would have set in the subshell, but it applies them to your current session instead, making sure to revert them when you leave the project directory.

Security concerns

Obviously, care should be taken when automatically running scripts, since just cloning some code to your computer should not imply that you trust it to run arbitrary code. direnv is pretty respectable here: an .envrc will only be loaded once you’ve explicitly allowed it (by calling direnv allow in the directory). An allow action records the full path to the .envrc as well as a hash of its current contents - direnv will refuse to run any .envrc that doesn’t have a matching allow rule for both of these properties (i.e if it’s changed or has been moved / copied).

There are still potential attacks - e.g if I add ./tools to $PATH, then someone could create a pull request which adds a malicious ls script in ./tools. If I check it out locally, neither the .envrc nor its location has changed, so direnv would run the .envrc, and then I’d be in trouble the next time I run ls (I do that a lot). This is pretty hard to avoid in the general case; I think the best approach is to keep the .envrc simple and as specific as possible, so that there is at most one place where bad things could happen, which you just have to be mindful of (e.g I’d be very cautious of any change which added new files under tools/ in the above example).

Development and contributing

I’m using direnv 2.2.1, which is barely a week old. It includes both of the features I contributed, which I (obviously ;)) think are important.

The author (zimbatm) seems friendly and receptive to patches, which makes contributing to direnv pretty painless. It’s written in go, which I’ve never used before. I’m definitely not a fan of the language’s insistence that error handling be done by wrapping almost every single function call in an if block (and yet it doesn’t even warn you if you completely ignore a function’s returned error value), but aside from that the direnv code is quite simple and easy to work with. And it’s certainly a huge step up from bash, which is what it used to be written in, and which many similar tools are written in.

Announcing the gup build tool

gup is a software build tool. It is designed to be general purpose, and does not care:

  • what kind of project you are building
  • what language your project is written in
  • what language you write your build scripts in

It has (almost) no syntax; instead it defines a simple protocol for where build scripts are located. Instead of declaring dependencies up-front, build scripts declare dependencies as they use them. This allows your dependencies to be enumerated at runtime, while building, rather than existing in some separate, statically-declared list which has to be manually updated if you wish your build to Not Be Wrong.
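
For example, to build a (hypothetical) target output.txt, gup looks for an executable buildscript named output.txt.gup alongside it, runs it with a temporary output path, and moves that file into place once the script succeeds; running gup -u from inside the script is how a dependency gets declared (and built, if necessary). A buildscript can be written in anything - here’s a sketch in Python:

#!/usr/bin/env python
# output.txt.gup: a hypothetical buildscript for the target output.txt.
# gup passes the path of a temporary output file as the first argument
# (and the target name as the second): write the result there, exit
# successfully, and gup moves it into place.
import subprocess, sys

output = sys.argv[1]

# Declare dependencies as we use them: this builds input.txt if needed,
# and records it so output.txt is rebuilt whenever input.txt changes.
subprocess.check_call(["gup", "-u", "input.txt"])

with open("input.txt") as src, open(output, "w") as dest:
    dest.write(src.read().upper())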

It’s similar to djb’s redo, which has been implemented by Avery Pennarun. In fact, I even took a bunch of code from redo. If you’ve used it before, gup will seem pretty familiar.

Please check out the project on github for more details and documentation. It’s still young, and it may change. But I’ve been using it for both work and personal projects for a few months now, and it’s already proven much more solid than redo in my own usage.

Why didn’t I just help make redo better?

I tried, but I believe redo's design is impossible (or at least very difficult) to implement in a way that does not Do The Wrong Thing silently (and somewhat often). That is absolutely not a property I want from my build system.

The core problem springs from the fact that redo relies on local file state to determine whether a file is actually a target. The only difference between a build target and a source file is that a target is one which didn’t exist when you first tried to build it - i.e if something looks like a target but it already exists, then it is actually a source, and will never be built.

There is quite a bit of state locked up in the above definition, and it turns out that it’s perilously difficult to manage that state correctly. The end result in many cases is that redo thinks a built file is actually a source file, and it silently ignores all requests to build it [1]. Remedying this situation is manual - it cannot easily be scripted, and the actions required depend entirely on the state of the local workspace.

gup fixes this problem by requiring you to be more explicit about your targets. In gup, something is a target if (and only if) you’ve told gup how to build it. It also means that the set of targets is defined by the part of your project that’s tracked by source control, rather than the state of your local filesystem.

  1. When updating from Fedora 19 -> 20 recently, this happened to every single file redo had ever built. This may not be redo’s fault, but it shows how fragile the mechanism is. 

Passing arrays as arguments in bash

I tend to avoid bash wherever possible for scripting, because it has dangerously bad defaults and will happily munge your data unless you take great care to wrap it up properly (particularly whitespace). But sometimes you have no choice, so you might as well know how to do it safely.

Here’s how to capture argv as a bash array, and pass it on to another command without breaking if some argument contains a space:

args=("$@")
echo "${args[@]}"

You can also just pass "$@" directly, but the above syntax works for any array.

Don’t forget any of those quotes, or bash will silently ruin everything (until you have data with spaces, at which point it might loudly ruin everything).

Here’s how to convert a line-delimited string (e.g a list of files in the current directory) into an array and pass that on:

mapfile -t arr <<<"$(ls -1)"
echo "${arr[@]}"

Note that a sensible-looking:

ls -1 | mapfile -t arr

will not work, as a builtin on the receiving end of a pipe gets run in a subshell.

If you don’t have mapfile (added in bash v4), you’ll have to resort to:

oldIFS="$IFS"; IFS=$'\n' read -d '' -r -a arr <<< "$(ls -1)"; IFS="$oldIFS"; unset oldIFS
echo "${arr[@]}";

I look forward to the day when I don’t have to know that.