Posts tagged: "programming" - page 4

direnv: Convenient project-specific environments

I’m pretty particular about my development tools, and I really dislike any tool that requires careful curation of global state - e.g. ruby gems, python packages, etc. In recent years, things have gotten better. Ruby has bundler, which keeps a project’s dependencies locally (avoiding any global state). Similarly, python has virtualenv, which does much the same thing. Tools like rvm and nvm allow you to manage multiple versions of the language itself. Notably, the npm package manager for nodejs fully embraces local dependencies - by default, packages are always installed locally (although the implementation itself is not particularly sane).

The inconvenience with most of these is that they require the developer to do something to “get into” a certain environment - if you try to run your python project’s tests without first activating the correct virtualenv, things will fail pretty badly. Some tools (e.g. rvm) include shell hooks to automatically activate the appropriate environment when you change directories, but they are tool-specific - you’ll need to add a hook in your shell for each tool you use, and I have my doubts that they would cooperate well, since they do awful things like overriding the cd command.

Enter direnv

I was very excited to find out about direnv (github: zimbatm/direnv) a few weeks ago, because I had just been looking for exactly such a tool, and considering writing one myself (I’m rather glad I didn’t have to). The idea is simple: extract all the messy stuff that rvm, virtualenv, etc do to manage per-directory environment variables, and put it into a single, general-purpose tool. You place an .envrc script in the root directory of your project, and you can use whatever tools you need to inside that script to set project-specific environment variables (via export statements, or by delegating to bundler, virtualenv, etc). direnv takes care of sandboxing these modifications so that all changes are reversed when you leave the project directory.

Aside from relieving other tools of the arduous work of reimplementing this particular wheel (including individual integration with each shell), direnv is much more extensible than existing language-specific tools - you can (for example) also export credentials like AWS_ACCESS_KEY, or add project-specific scripts to your $PATH so you can just run mk, rather than having to invoke an explicit path like ./tools/mk.
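For instance, a minimal .envrc along those lines might look like the following sketch (the credential value is a placeholder and tools/ is a hypothetical project directory; PATH_add is a helper from direnv’s standard library):

$ cat .envrc
export AWS_ACCESS_KEY=...   # placeholder: project-specific credentials
PATH_add tools              # prepend ./tools to $PATH while inside this project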

Of course, few tools get my blessing these days if they don’t play well with ZeroInstall (if I had my way, all of rvm/virtualenv/npm/pip would be replaced by just using ZeroInstall, but sadly I have yet to convince everyone to do that ;)). A while ago I wrote 0env as a tool for making ZeroInstall dependencies available in your shell, but unlike most tools it encourages you to work in a subshell, rather than altering your current shell session. Some people don’t like this approach, but the benefits (in code simplicity and lack of bugs) were well worth it. Thankfully, you can have your cake and eat it too if you use direnv. For example, a normal use of 0env looks like:

$ 0env myproject.xml
[myproject] $ # I'm in a subshell
[myproject] $ exit
$ # back in my original shell

But for convenience, you can make a trivial .envrc that defers all the logic to 0env:

$ cat .envrc
direnv_load 0env myproject.xml -- direnv dump

Now, every time you cd into this project directory, direnv will set up whatever environment variables 0env would have set in the subshell, but it applies them to your current session instead, making sure to revert them when you leave the project directory.

Security concerns

Obviously, care should be taken when automatically running scripts, since just cloning some code to your computer should not imply that you trust it to run arbitrary code. direnv is pretty respectable here: an .envrc will only be loaded once you’ve explicitly allowed it (by calling direnv allow in the directory). An allow action records the full path to the .envrc as well as a hash of its current contents - direnv will refuse to run any .envrc that doesn’t have a matching allow rule for both of these properties (i.e. if it has changed or has been moved / copied).
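In practice, the flow looks something like this (the commands are real; the comments just describe the behaviour):

$ cd ~/src/myproject   # direnv notices the .envrc but declines to load it
$ direnv allow         # record the path and a hash of its contents as trusted
                       # from now on, the .envrc loads (and unloads) automatically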

There are still potential attacks - e.g. if I add ./tools to $PATH, then someone could create a pull request which adds a malicious ls script in ./tools. If I check it out locally, neither the .envrc nor its location has changed, so direnv would run the .envrc, and then I’d be in trouble the next time I run ls (I do that a lot). This is pretty hard to avoid in the general case; I think the best approach is to keep the .envrc as simple and specific as possible, so that there is at most one place where bad things could happen, which you just have to be mindful of (e.g. I’d be very cautious of any change which added new files under tools/ in the above example).

Development and contributing

I’m using direnv 2.2.1, which is barely a week old. It includes both of the features I contributed, which I (obviously ;)) think are important.

The author (zimbatm) seems friendly and receptive to patches, which makes contributing to direnv pretty painless. It’s written in go, which I’ve never used before. I’m definitely not a fan of the language’s insistence that errors be handled by wrapping almost every single function call in an if block (especially since it doesn’t even warn you if you completely ignore a function’s returned error value), but aside from that the direnv code is quite simple and easy to work with. And it’s certainly a huge step up from bash, which is what direnv used to be written in, and which many similar tools are still written in.

Announcing the gup build tool

gup is a software build tool. It is designed to be general purpose, and does not care:

  • what kind of project you are building
  • what language you are building in
  • what language you write your build scripts in

It has (almost) no syntax; instead, it defines a simple protocol for where build scripts are located. Instead of declaring dependencies up-front, build scripts declare dependencies as they use them. This allows your dependencies to be enumerated at runtime, while building, rather than existing in some separate, statically-declared list which has to be manually updated if you wish your build to Not Be Wrong.
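To give a flavour, here’s a minimal sketch of a build script (assuming gup’s documented convention that a builder named default.o.gup builds any .o file in its directory, and is invoked with the temporary output path as $1 and the target name as $2):

$ cat default.o.gup
#!/bin/bash
set -eu
src="${2%.o}.c"
gup -u "$src"            # declare the dependency as we use it (building it if needed)
gcc -c "$src" -o "$1"    # write output to the temp file gup gave us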

It’s similar to djb’s redo, which has been implemented by Avery Pennarun. In fact, I even took a bunch of code from redo. If you’ve used it before, gup will seem pretty familiar.

Please check out the project on github for more details and documentation. It’s still young, and it may change. But I’ve been using it for both work and personal projects for a few months now, and it’s already proven much more solid than redo in my own usage.

Why didn’t I just help make redo better?

I tried, but I believe redo's design is impossible (or at least very difficult) to implement in a way that does not Do The Wrong Thing silently (and somewhat often). That is absolutely not a property I want from my build system.

The core problem springs from the fact that redo relies on local file state to determine whether a file is actually a target. The only difference between a build target and a source file is that a target is one which didn’t exist when you first tried to build it - i.e. if something looks like a target but it already exists, then it is actually a source, and will never be built.

There is quite a bit of state locked up in the above definition, and it turns out that it’s perilously difficult to manage that state correctly. The end result in many cases is that redo thinks a built file is actually a source file, and it silently ignores all requests to build it [1]. Remedying this situation is manual - it cannot easily be scripted, and the actions required depend entirely on the state of the local workspace.

gup fixes this problem by requiring you to be more explicit about your targets. In gup, something is a target if (and only if) you’ve told gup how to build it. It also means that the set of targets is defined by the part of your project that’s tracked by source control, rather than the state of your local filesystem.

  [1] When updating from Fedora 19 -> 20 recently, this happened to every single file redo had ever built. This may not be redo’s fault, but it shows how fragile the mechanism is.

Passing arrays as arguments in bash

I tend to avoid bash wherever possible for scripting, because it has dangerously bad defaults and will happily munge your data unless you take great care to wrap it up properly (particularly whitespace). But sometimes you have no choice, so you might as well know how to do it safely.

Here’s how to capture argv as a bash array, and pass it on to another command without breaking if some argument contains a space:

args=("$@")
echo "${args[@]}"

You can also just pass "$@" directly, but the above syntax works for any array.

Don’t forget any of those quotes, or bash will silently ruin everything (until you have data with spaces, at which point it might loudly ruin everything).
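To see what’s at stake, compare quoted and unquoted expansion (printf prints each word it receives on its own line):

$ args=(one "two words")
$ printf '<%s>\n' "${args[@]}"
<one>
<two words>
$ printf '<%s>\n' ${args[@]}
<one>
<two>
<words>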

Here’s how to convert a line-delimited string (e.g. a list of files in the current directory) into an array and pass that on:

mapfile -t arr <<<"$(ls -1)"
echo "${arr[@]}"

Note that a sensible-looking:

ls -1 | mapfile -t arr

will not work, as a builtin on the receiving end of a pipe gets run in a subshell.
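If you want something pipe-like that does work, process substitution keeps mapfile in the current shell:

mapfile -t arr < <(ls -1)
echo "${arr[@]}"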

If you don’t have mapfile (added in bash v4), you’ll have to resort to:

oldIFS="$IFS"; IFS=$'\n' read -d '' -r -a arr <<< "$(ls -1)"; IFS="$oldIFS"; unset oldIFS
echo "${arr[@]}";

I look forward to the day when I don’t have to know that.

StratifiedJS 0.14 released

Today we (Oni Labs) released StratifiedJS 0.14. This is the first release since I started working here full-time, and it’s a big one: loads of useful new syntax, as well as a thoroughly kitted-out standard library.

StratifiedJS is a Javascript-like language that compiles to Javascript, but which supports advanced syntax and semantics, like:

  • blocking-style code for asynchronous operations (no callbacks!)
  • try/catch error handling works even for asynchronous code
  • a structured way of managing concurrent code (waitfor/or, waitfor/and, waitforAll, waitforFirst, etc).
  • ruby-style blocks
  • lambda expressions (arrow functions)
  • quasi-quote expressions

Check it out at onilabs.com/stratifiedjs.

Module resolution with npm / nodejs

NodeJS’ require() method is special. npm is special. Some of that is good - its efforts to dissuade people from installing anything globally are commendable, for a start. But some of it is bad. It’s probably better to be aware of the bad parts than to learn them when they bite you.

Let’s run through what happens when I install a package. For example, installing the bower package will:

  • install bower’s code under node_modules/bower
  • under node_modules/bower, install each of bower’s direct dependencies.

Of course, this is recursive - for each of bower’s direct dependencies, it also installs all of its dependencies. But it does so individually, so you end up with paths like (this is a real example):

node_modules/
  bower/
    node_modules/
      update-notifier/
        node_modules/
          configstore/
            node_modules/
              yamljs/
                node_modules/
                  argparse/
                    node_modules/
                      underscore

Unlike pretty much every package manager I’ve encountered, npm makes no attempt to get just one copy of a given library. After installing bower, npm has unpacked the graceful-fs package into 4 different locations under bower. I’ve also installed the karma test runner recently, which indirectly carries with it another 10 copies of graceful-fs. My filesystem must be exceedingly graceful by now.
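You can count the copies in your own project with something like:

$ find node_modules -type d -name graceful-fs | wc -l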

0env: Using ZeroInstall feeds interactively

I’ve just released version 1.0 of 0env. Its purpose is similar to the “interactive” mode of operation of tools like rvm, virtualenv, etc. That is, “entering” some environment with an interactive shell. But instead of being part of some language-specific development tool, it works with any ZeroInstall feed (feeds are language-agnostic and cross-platform). The project’s readme pretty much explains it all, but I’ll summarize the important features here (with a quick example below):

  • You can try out one (or more) ZeroInstall feeds interactively
  • All work happens in a subshell, with a modified shell prompt to clarify what context you’re in
  • There is nothing to roll back, modify or undo - it’s completely stateless
  • It works for published feeds (URLs) as well as unpublished or development local feeds
  • It works cross-platform
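For example (using a hypothetical feed URL; a local feed file works exactly the same way):

$ 0env http://example.com/myproject.xml
[myproject] $ # try things out...
[myproject] $ exit
$ # back to normal; nothing to clean up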

I really feel this is an important tool for helping people adopt and use ZeroInstall feeds. ZeroInstall is a great way to publish software, but until now it has been awkward to try one or more feeds out interactively, partly because there is nothing to install.
