I’ve long considered fluency in Nix to be a superpower that pays off way more than you might imagine, if you haven’t experienced it yourself.

Sure, it helps with the obvious practical things you’d expect - my system setup is declarative, reproducible, and suffers from vanishingly few chaotic state-based issues that tend to plague less reproducible systems (like brew in particular).

There are plenty of downsides too - I run into nix-specific issues that my colleagues with more normal setups don’t suffer. Interactions with tools like bundler building native extensions can be frustrating at best.

Those are the unsurprising, surface-level tradeoffs when using a good-but-novel package manager. The real superpower comes through the staggering number of things that are not just possible, but downright straightforward due to the reliable, principled way that nix works. Here’s a good example from last week:

Do niche things, encounter niche bugs

Firstly, I’ve got this rust project which I build on MacOS, but which cross-compiles to arm & x86 for both Mac & Linux. Without nix, I think I got a cross-compiling toolchain working once in my life, and that was mostly thanks to colleagues who had sorted it all out before I got there.

So cross-compilation is impossible thing number one that nix makes practical (I’ve talked about this before). But then when it breaks, nix also makes it easy to diagnose, fix and reproduce. When I updated nixpkgs in this project, cross-compilation stopped working with this error, which has nothing to do with my code:

cannot read symbolic link '/nix/store/fgkznmnz1swzp8ck75fa2zvj62pkjgvq-musl-x86_64-unknown-linux-musl-1.2.3/lib/ld-musl-x86_64.so.1': Permission denied

And indeed, I can see that it lacks permissions:

$ ls -l /nix/store/fgkznmnz1swzp8ck75fa2zvj62pkjgvq-musl-x86_64-unknown-linux-musl-1.2.3/lib
ls: cannot read symbolic link '/nix/store/fgkznmnz1swzp8ck75fa2zvj62pkjgvq-musl-x86_64-unknown-linux-musl-1.2.3/lib/ld-musl-x86_64.so.1': Permission denied
total 3796
-r--r--r-- 1 root wheel    1000 Jan  1  1970 crti.o
-r--r--r-- 1 root wheel     776 Jan  1  1970 crtn.o
lrwx------ 1 root wheel       7 Jan  1  1970 ld-musl-x86_64.so.1
-r--r--r-- 1 root wheel 3016118 Jan  1  1970 libc.a
lrwxr-xr-x 1 root wheel       7 Jan  1  1970 libc.musl-x86_64.so.1 -> libc.so*
-r-xr-xr-x 1 root wheel  811384 Jan  1  1970 libc.so*
# ...

Note that ls prints an error while listing this directory, which is unusual. Looks like only root will be able to read that file, though all the other files have read access for all users.

I don’t know much about musl, and I don’t know why it would create a symlink that I can’t read. But nix gives me an amazingly useful toolkit for diagnosing and fixing these kinds of issues.

First, because there’s a 1:1 mapping between outputs and derivations (recipes), I can ask nix to tell me which derivation made this path:

$ nix-store --query --deriver '/nix/store/fgkznmnz1swzp8ck75fa2zvj62pkjgvq-musl-x86_64-unknown-linux-musl-1.2.3/lib/ld-musl-x86_64.so.1'

That’s the recipe for building the outputs I’m looking at, in a canonical, serialized format. I can drop into a shell with that derivation’s environment, i.e. all dependencies and environment variables set up:

$ nix-shell /nix/store/wv79kvgc5sdjxjqjbfi4sjhzd8s8fa47-musl-x86_64-unknown-linux-musl-1.2.3.drv
$ echo $src

Often when a derivation fails to build it’s useful to drop into its shell and try various commands to figure out a fix interactively. I can also pretty-print the derivation in JSON, using nix derivation show.

But first, I want to find where this derivation actually is in nixpkgs.

For maximum reproducibility, I check out nixpkgs at the git commit my project is pinned to, from the release-23.11 branch. Using guesses as well as some helpfully-unique strings in the derivation’s postInstall phase, I locate pkgs/os-specific/linux/musl/default.nix.
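The search for a unique string can be as simple as a recursive grep over the checkout. This is a minimal sketch rather than the exact command used: `find_nix_files_containing` is a hypothetical helper name, the marker string is made up, and the demo runs against a throwaway directory so it works without a nixpkgs checkout.

```shell
# Hypothetical helper: list .nix files under a directory that contain a
# literal string (e.g. a distinctive fragment copied from postInstall).
find_nix_files_containing() {
  # $1 = directory to search, $2 = literal string to look for
  find "$1" -type f -name '*.nix' -exec grep -lF -- "$2" {} +
}

# Demo against a stand-in tree (not real nixpkgs content).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/pkgs/a" "$demo_root/pkgs/b"
printf 'postInstall = "echo a-very-unique-marker";\n' > "$demo_root/pkgs/a/default.nix"
printf 'postInstall = "echo something-else";\n' > "$demo_root/pkgs/b/default.nix"

# Prints only the path of the file containing the marker.
find_nix_files_containing "$demo_root/pkgs" 'a-very-unique-marker'
```

Against a real checkout, the first argument would be the `pkgs` directory and the second a fragment taken from the derivation.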

Now that I’ve oriented myself, it’s time to make a minimal reproduction. I’m not doing anything funky with musl, so it should be easy to reproduce outside my project’s rather large nix expression:

$ cat musltest.nix
with import /Users/tcuthbertson/dev/nix/nixpkgs {
  crossSystem.config = "x86_64-unknown-linux-musl";
};
musl
$ nix-build musltest.nix

OK, it printed the exact same path (and didn’t need to build anything). We’ve reproduced the problematic nix expression in 4 lines, without any noise related to the real project.

Now I’d like to see the build output. I can ask nix to check the build (i.e. build it again):

$ nix-build --check musltest.nix
# lots of build output

Here we get a hint. The build output includes the problematic filename in a list of file installations:

(I’ve replaced the long /nix/store path with $out in this output for brevity)

./tools/install.sh -D -m 644 lib/libdl.a $out/lib/libdl.a
./tools/install.sh -D -m 644 lib/musl-gcc.specs $out/lib/musl-gcc.specs
./tools/install.sh -D -l libc.so $out/lib/ld-musl-x86_64.so.1 || true
# ... many more

This tells me two things:

  • this file doesn’t have an explicit permission mode like the others (-m 644).
  • it’s using a custom ./tools/install.sh to do the installation
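Spotting the odd one out by eye works for three lines, but the real build log is much longer. A rough sketch of filtering for install commands that pass no explicit mode follows; the input is the excerpt quoted above, and `installs_without_mode` is a hypothetical helper name.

```shell
# Hypothetical helper: read build output on stdin and print the
# install.sh invocations that do not pass an explicit mode (-m).
installs_without_mode() {
  grep -F 'install.sh' | grep -v -e ' -m '
}

# The quoted heredoc keeps $out literal, as in the abbreviated log.
installs_without_mode <<'EOF'
./tools/install.sh -D -m 644 lib/libdl.a $out/lib/libdl.a
./tools/install.sh -D -m 644 lib/musl-gcc.specs $out/lib/musl-gcc.specs
./tools/install.sh -D -l libc.so $out/lib/ld-musl-x86_64.so.1 || true
EOF
```

Only the `-l libc.so` line survives the filter, which is the symlink installation.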

Sometime during this process I remembered a relevant question: do symlinks even have permissions?

No, they don’t:

The symbolic notation, lrwxrwxrwx, is the only set of access permissions a symbolic link can have. Additionally, these permissions are only representative as they are never used for any operation.

But… Huh?

Oh, Macs are different: symlinks can have permissions on a Mac. So now we’re firmly in niche territory, having Mac-only issues cross-building a linux-only program.

OK, time to look at this now-suspicious ./tools/install.sh. It has some chmod code; could that be the problem? Unlikely: remember the problematic file didn’t specify any permissions. Why would it? Symlinks don’t have permissions. But just a few lines above the chmod, a smoking gun:

umask 077

Gotcha. It’s hard to accidentally chmod a symlink, as you need to pass in the nonstandard -h flag to operate on the link instead of the thing it points to. But umask is used to restrict default permissions for any kind of file creation. Including, presumably, symlinks on MacOS.
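The theory is easy to check in isolation. This sketch creates a symlink under the same umask in a scratch directory and prints the mode of the link itself; the file names are arbitrary. On Linux the result should be lrwxrwxrwx whatever the umask is, while on macOS, if the explanation above is right, the group and other bits should be missing.

```shell
# Create a symlink under a restrictive umask and print its own mode.
workdir=$(mktemp -d)
(
  cd "$workdir" || exit 1
  umask 077          # same setting as in the install script
  : > target         # empty file for the link to point at
  ln -s target link
)

# ls -ld on a symlink describes the link itself, not its target; the
# first field of the output is the mode string.
link_mode=$(ls -ld "$workdir/link" | awk '{ print $1 }')
echo "$link_mode"
```

The subshell keeps the umask change from leaking into the rest of the session.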

There are a couple of ways to fix this; the easiest is just to change this umask line so that it allows read access. So my fix is to relax the umask to 022, i.e. prevent write access for group and others, but not read or execute.
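The arithmetic behind the two values is worth spelling out: the resulting mode is the requested mode with the umask bits cleared. The sketch below assumes that symlinks are requested with mode 777 and regular files with 666, which is the usual convention but is not taken from the musl script, and `apply_umask` is a hypothetical helper name.

```shell
# Hypothetical helper: apply a umask to a requested mode (both octal)
# and print the resulting mode in octal.
apply_umask() {
  # $1 = requested mode, $2 = umask; the leading 0 marks them as octal
  printf '%o\n' "$(( 0$1 & ~0$2 ))"
}

apply_umask 777 077   # symlink under the original umask
apply_umask 777 022   # symlink under the relaxed umask
apply_umask 666 022   # regular file under the relaxed umask
```

This prints 700, 755 and 644. Mode 700 is the lrwx------ seen on the broken link in the listing, and 755 is the lrwxr-xr-x of the other symlink in the same directory.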

Testing changes

Now that I think I have a fix, can I test it?

Of course. The steps I used are:

1. Make the change

I check out the musl git repository and make changes locally (in this case, modifying the umask line in install.sh).

2. Integrate it into nix

To integrate my changes I could alter the nix expression’s src to point to a fork of the source code. But for simple changes it’s usually easier to iterate by adding a patch. So first I generated a patch based on my changes in git (git diff > musl.patch).

Then I added that to the existing list of patches in the nix expression. For testing out changes, I just use an (absolute) file path to reference the patch file on disk. This will only work on my system, but it allows me to make changes and have them picked up immediately, without needing to update any digest / commit IDs.
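The patch-generation step can be rehearsed end to end in a scratch repository. This is only a sketch under stated assumptions: the repository holds a one-line stand-in for install.sh rather than the real musl sources, and the commit identity is a placeholder.

```shell
# Stand-in repository with a one-line script; not the real musl sources.
repo=$(mktemp -d)
patch_file=$(mktemp)
git -C "$repo" init -q
printf 'umask 077\n' > "$repo/install.sh"
git -C "$repo" add install.sh
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
  commit -q -m 'import'

# Make the change, then capture it as a patch file.
printf 'umask 022\n' > "$repo/install.sh"
git -C "$repo" diff > "$patch_file"

# Restore the pristine file and confirm the patch applies cleanly to it.
git -C "$repo" checkout -q -- install.sh
git -C "$repo" apply --check "$patch_file" && echo "patch applies"
```

The absolute path in `$patch_file` is the kind of path that can then be referenced from the nix expression while iterating locally.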

3. Test it

With that patch added, I rebuilt my musltest.nix test case, and success: the resulting symlink has read access. Looks like I’ve fixed it!

Integrating changes

Now that I’m confident the fix works, I want to integrate it into my actual project. After all, this all started with me wanting to build some software, not muck about in musl’s custom installer.

In order for the fix to be usable outside of my machine, I need to:

  • get the patch online. I push my changes to github and use that to serve a patch file
  • update the nix expression in my fork of nixpkgs to include the patch
  • pin my project to use my nixpkgs fork (there are plenty of ways to manage this, I like using niv)

And… that’s it!

Eliminating the Software Distribution Chasm

Of course, all of these things can be (and are) routinely done by people without nix. You don’t need superpowers to fix bugs in software.

But nix dramatically widens the range of things I feel empowered to fix, as well as giving me confidence that I will be able to reproduce or modify anything I need to. And not just on my machine - any changes I can make locally, I can also ship to my users, no matter how deep in the stack the fix is.

Mentally I think of this as a distribution chasm. Outside of nix, there’s often this huge gap between the workflow and capabilities of what you can do on your machine, vs the very different workflow and capabilities there are for distributing software to others.

For example: say you use Ubuntu, and you find that musl is broken in some obscure way. I’m sure there are plenty of debian tools I don’t know about which allow you to get a deb package, set up a build environment, make changes to the source code, and then rebuild a modified package against those sources.

But then once you have a fix, how do you use it? Do you send it to the Ubuntu or upstream maintainers, and wait for your users to get the update via official channels? That’s a lot of waiting, possibly years depending on when your users upgrade their OS. Maybe you can set up your own deb repository (ugh!) so you can at least run your modifications in CI, but it’s a horrible thing to ask of your users. Not to mention you’re modifying the global version of e.g. musl they have installed, which is a pretty invasive thing to be doing.

So the user-friendly option would be to eject from package management and leap over the distribution chasm - you have to distribute your musl fork yourself, perhaps by committing it into your project (vendoring), or writing a script to build it from source as part of your project’s build. But wow, now your project takes on all the complexity of building musl.

And for what? The fix was embarrassingly simple. It’s just a single line of bash. (Aside: there are workarounds for this particular bug; but those wouldn’t work if it were a change to some C code).

Anecdotally, I tried building musl outside of nix, but I immediately ran into unrelated compile errors. That’s not unusual when first compiling an unfamiliar project, and it’s easy to give up at this stage. But using nix? After a few hours I’d diagnosed, fixed, tested and submitted a fix upstream. And while I wait, I’ve integrated that patch into my software, so I can keep building. It works on my machine, in CI, and for other contributors, with no need for manual setup. Feels like a superpower to me.