Module resolution with npm / nodejs
The require() function is special.
npm is special. Some of that is good - its efforts to dissuade people from installing anything globally are commendable, for a start. But some of it is bad. It’s probably better to be aware of the bad parts than to learn them when they bite you.
Let’s run through a quick example of what happens when I install a package. For example, installing the bower package will:

- install bower’s code under node_modules/bower
- install each of bower’s direct dependencies

Of course, this is recursive - for each of bower’s direct dependencies, it also installs all of its dependencies. But it does so individually, so you end up with paths like (this is a real example):

```
node_modules/
  bower/
    node_modules/
      update-notifier/
        node_modules/
          configstore/
            node_modules/
              yamljs/
                node_modules/
                  argparse/
                    node_modules/
                      underscore
```
Unlike pretty much every package manager I’ve encountered, npm makes no attempt to get just one copy of a given library. After installing bower, npm has unpacked the graceful-fs package into 4 different locations under bower. I’ve also installed the karma test runner recently, which indirectly carries with it another 10 copies of graceful-fs. My filesystem must be exceedingly graceful by now.
Having 14 different copies of graceful-fs after installing only two packages should make you wonder: how on earth does node figure out which one to load? The trick to this is that the require function is not really a global - it’s crafted specially for each module. For a given module executing at /path/to/module/index.js, require('foo') will look for the foo module in the node_modules folder next to the current file, then in the node_modules folder of each successive parent directory, all the way up to the root (it later tries other locations, but we’ll get to those shortly).
The actual attempts are far more numerous than this - observing one instance, node ended up looking for 13 different filenames in each of the above folders (foo/index.coffee, etc …). So that multiplies the number of files node tries to find by an order of magnitude, but it’s not really important to our understanding of the algorithm.
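For a rough idea of what those per-folder attempts look like (the exact set depends on which extensions are registered - .coffee only shows up when coffee-script has registered itself), each candidate folder expands to something like:

```
node_modules/foo.js
node_modules/foo.json
node_modules/foo.node
node_modules/foo/package.json   (then whatever its "main" entry points at)
node_modules/foo/index.js
node_modules/foo/index.json
node_modules/foo/index.node
```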
So this is how node gives each module its own local version of graceful-fs - it always resolves to the “closest” version it can find inside a node_modules folder. Presumably, this means it could load 10 identical instances of the module in a single process. Or indeed 9 identical versions plus one older version with an unpatched security vulnerability. Or two versions which will corrupt data when code using one version of a library talks to code that’s unknowingly using another version. It’s not a likely scenario, but it’s worth noting that npm will allow it without so much as a warning.
After trying what seems like every conceivable path based on the current file’s location up to and including the root of your filesystem, the second part of the search algorithm kicks in - node searches what it refers to as the “global folders”. A better word for this might be “absolute”, since they are influenced by local (individual process-level) environment variables - there is nothing necessarily global about them.
This type of module resolution is a more common idea, and is found in most dynamic loader systems. You can modify it by setting the $NODE_PATH environment variable, and it’s exactly what you would expect if you’re familiar with $RUBYLIB, etc. For each path in this colon-separated list, node will look for the module foo inside the given path. Node (like others) has some additional builtin paths that are searched even though they don’t appear in $NODE_PATH.
npm breaks itself
One interesting problem I’ve recently encountered (on the first module I try to create, what luck I have) is that npm does not exploit the “global” search-path based loading, but relies on the relative lookup behaviour instead. This can get extremely confusing in the case of plugin systems:

- I write a plugin, foo-adapter, for the fabulous node tool foo.
- I tell foo about my plugin in its configuration.
The obvious way for foo to load my plugin is by calling require("foo-adapter"). This works. Sometimes. In the case where both modules have been installed by npm install, my directory structure will have foo and foo-adapter sitting side by side at the top level of node_modules. When foo calls require("foo-adapter"), the lookup walks up from node_modules/foo, reaches my project’s top-level node_modules folder … and finds it.
However, there’s a catch. I’m currently working on adding a new feature to foo. I don’t want to install it from npmjs.org - it doesn’t have the changes I need. I could npm install it from its on-disk location, but that would copy it into my node_modules/ folder. And then each time I modified something (it’s still in development), I’d have to remember to install it again, or else I’d be using stale code and wondering why nothing had changed.
npm has a solution for this - npm link! Instead of copying the development version, it makes a symbolic link, ensuring that I’m always referencing the current version, not a copy. Now, npm link is a bit weird. Despite npm’s (admirable) distaste for installing anything globally, npm link requires you to install your development version globally before you can use it. I have found no rationale for this decision.
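For reference, npm link is a two-step affair - the global install is step one, and the paths below are illustrative, for a typical global prefix:

```
# step 1, run inside the development checkout of foo:
#   creates  <global prefix>/lib/node_modules/foo -> /path/to/dev/foo
npm link

# step 2, run inside the consuming project:
#   creates  ./node_modules/foo -> <global prefix>/lib/node_modules/foo
npm link foo
```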
The trouble is, because npm relies on relative path importing, it has now become impossible for the foo module to require("foo-adapter"). Why is that?
My node_modules folder still has foo and foo-adapter side by side, just as before. But note that we used npm link to reference foo, so node_modules/foo is no longer a real directory - it’s a symbolic link pointing out to the development checkout of foo, somewhere else on my disk entirely.

When require("foo-adapter") tries to resolve the module, node uses the real path to itself. That is, it searches the node_modules folders starting from foo’s actual location - the development checkout - and walking up from there. There’s no longer any connection to where I was running the module from, so it can never find my module, despite me using npm in the way it was intended. In case you were wondering, this is the intended behaviour and will not be changed.