Module resolution with npm / nodejs
NodeJS’ require() method is special. npm is special. Some of that is good - its efforts to dissuade people from installing anything globally are commendable, for a start. But some of it is bad. It’s probably better to be aware of the bad parts than to learn them when they bite you.
Let’s run through a quick example of what happens when I install a package. Installing the bower package will:
- install bower’s code under node_modules/bower
- under node_modules/bower, install each of bower’s direct dependencies.
Of course, this is recursive - for each of bower’s direct dependencies, it also installs all of its dependencies. But it does so individually, so you end up with paths like (this is a real example):
node_modules/
  bower/
    node_modules/
      update-notifier/
        node_modules/
          configstore/
            node_modules/
              yamljs/
                node_modules/
                  argparse/
                    node_modules/
                      underscore
Unlike pretty much every package manager I’ve encountered, npm makes no attempt to get just one copy of a given library. After installing bower, npm has unpacked the graceful-fs package into 4 different locations under bower. I’ve also installed the karma test runner recently, which indirectly carries with it another 10 copies of graceful-fs. My filesystem must be exceedingly graceful by now.
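If you want to count the copies in your own project, a rough sketch like the following would do it. The helper name findCopies is made up for this illustration, and it assumes unscoped package names and a plain nested node_modules layout like the one shown above; symlinked packages are skipped.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of npm or node): collect every
// node_modules/<name> directory found under `root`.
// Assumption: package names are unscoped (no @scope/ prefix).
function findCopies(root, name) {
  const found = [];
  const visit = (dir) => {
    const modulesDir = path.join(dir, 'node_modules');
    if (!fs.existsSync(modulesDir)) return;
    for (const entry of fs.readdirSync(modulesDir, { withFileTypes: true })) {
      if (!entry.isDirectory()) continue; // skips plain files and symlinks
      const pkgDir = path.join(modulesDir, entry.name);
      if (entry.name === name) found.push(pkgDir);
      visit(pkgDir); // each package may carry its own node_modules
    }
  };
  visit(root);
  return found;
}
```

Calling findCopies(process.cwd(), 'graceful-fs') from a project root returns the list of directories, so its length is the number of copies.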
Relative ways
Having 14 different copies of graceful-fs after installing only two packages should make you wonder: how on earth does node figure out which one to load? The trick is that the require function is not really a global - it’s crafted specially for each module. For a given module executing at /path/to/module/index.js, require('foo') will try to find the foo module in the following locations:
/path/to/module/node_modules/foo
/path/to/node_modules/foo
/path/node_modules/foo
/node_modules/foo
(it later tries other locations, but we’ll get to those shortly)
The actual attempts are far more numerous than this - observing one instance, node ended up looking for 13 different filenames in each of the above folders (foo.js, foo.coffee, foo/index.js, foo/index.coffee, etc …). So that multiplies the number of files node tries to find by an order of magnitude, but it’s not really important to our understanding of the algorithm.
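The directory walk described above can be sketched in a few lines. This is only an approximation of what node does, not its actual implementation: the function name lookupCandidates is invented here, it uses POSIX-style paths, and it ignores the per-folder filename variants just mentioned.

```javascript
const path = require('path');

// Hypothetical helper: list the node_modules locations that would be
// tried for require(name) from the module at `fromFile`, nearest first.
function lookupCandidates(fromFile, name) {
  const candidates = [];
  let dir = path.posix.dirname(fromFile);
  for (;;) {
    // node does not look for node_modules/node_modules, so skip
    // directories that are themselves called node_modules.
    if (path.posix.basename(dir) !== 'node_modules') {
      candidates.push(path.posix.join(dir, 'node_modules', name));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return candidates;
}
```

For lookupCandidates('/path/to/module/index.js', 'foo') this yields the four locations listed earlier, in the same order. Inside a real module, the built-in module.paths array shows the directories node will actually use.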
So this is how node gives each module its own local version of graceful-fs - it always resolves to the “closest” version it can find inside a node_modules folder. Presumably, this means it could load 10 identical instances of the module in a single process. Or indeed 9 identical versions plus one older version with an unpatched security vulnerability. Or two versions which will corrupt data when code using one version of a library talks to code that’s unknowingly using another version. It’s not a likely scenario, but it’s worth noting that npm will allow it without so much as a warning.
“Global” folders
After trying what seems like every conceivable path based on the current file’s location up to and including the root of your filesystem, the second part of the search algorithm kicks in - node searches what it refers to as the “global folders”. A better word for this might be “absolute”, since they are influenced by local (individual process-level) environment variables - there is nothing necessarily global about them.
This type of module resolution is a more common idea, and is found in most dynamic loader systems. You can modify it by setting the $NODE_PATH environment variable, and it’s exactly what you would expect if you’re familiar with $LD_LIBRARY_PATH, $PYTHONPATH, $RUBYLIB, etc. For each path in this colon-separated list, node will look for the module foo inside the given path. Node (like others) has some additional builtin paths that are searched even though they don’t appear in $NODE_PATH.
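As a minimal sketch of this second stage, assuming a colon-separated list as described above: the helper name nodePathCandidates is made up for illustration, and the builtin extra paths are deliberately left out.

```javascript
const path = require('path');

// Hypothetical helper: expand a $NODE_PATH-style string into the
// locations where a module called `name` would be looked for.
// Assumption: entries are separated by ':' and empty entries are ignored.
function nodePathCandidates(name, nodePath, delimiter = ':') {
  return (nodePath || '')
    .split(delimiter)
    .filter(Boolean)
    .map((dir) => path.posix.join(dir, name));
}
```

Something like nodePathCandidates('foo', process.env.NODE_PATH) gives the list for the current process, which is why “absolute” describes these folders better than “global”: the result depends entirely on that one process’s environment.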
When npm breaks itself
One interesting problem I’ve recently encountered (on the first module I tried to create, what luck I have) is that npm does not exploit the “global” search-path based loading, but relies on the relative lookup behaviour instead. This can get extremely confusing in the case of plugin systems:
- I write a plugin, foo-adapter. It’s for the fabulous node tool foo.
- I tell foo about my plugin by setting plugins=['foo-adapter'] in foo.conf.js
The obvious way for foo to load my plugin is by calling require("foo-adapter"). This works. Sometimes. In the case where both modules have been installed by npm install, my directory structure will look like:
node_modules/foo/index.js
node_modules/foo-adapter/index.js
When foo/index.js does require("foo-adapter"), it will look in:
node_modules/foo/node_modules/foo-adapter
node_modules/foo-adapter
… and find it.
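To make the setup concrete, here is a sketch of what the loading code inside foo might look like. Both foo and its config format are the hypothetical examples from this post, so everything here (loadPlugins, the plugins key, the requireFn parameter) is an assumption rather than any real tool’s API.

```javascript
// Hypothetical plugin loader for the imaginary tool foo.
// `requireFn` stands in for the require function of foo/index.js, which
// is what decides where the lookup starts.
function loadPlugins(config, requireFn) {
  return (config.plugins || []).map((name) => {
    try {
      return { name, module: requireFn(name) };
    } catch (err) {
      // Surface which plugin failed; the underlying error only names the module.
      throw new Error(`Cannot load plugin "${name}": ${err.message}`);
    }
  });
}
```

The important detail is that the plugin name is resolved by foo’s own require, so the search starts from wherever foo’s code lives, not from the project that wrote foo.conf.js.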
However, there’s a catch. I’m currently working on adding a new feature to foo. I don’t want to install it from npmjs.org - it doesn’t have the changes I need. I could npm install it from its on-disk location, but that would copy it into my node_modules/ folder. And then each time I modified something (it’s still in development), I’d have to remember to install it again, or else I’d be using stale code and wondering why nothing had changed.
npm has a solution for this - npm link! Instead of copying the development version, it makes a symbolic link, ensuring that I’m always referencing the current version, not a copy. Now, npm link is a bit weird. Despite npm’s (admirable) distaste for installing anything globally, npm link requires you to install your development version globally before you can use it. I have found no rationale for this decision.
The trouble is, because npm relies on relative path importing, it has now become impossible for the foo module to require("foo-adapter"). Why is that?
My node_modules folder has:
node_modules/foo/index.js
node_modules/foo-adapter/index.js
But note that we used npm link to reference foo, so the path to foo is actually something like:
node_modules/foo -> /lib/nodejs/lib/node_modules/foo -> /home/tim/dev/node/foo
(the arrow indicates a symbolic link)
So when require("foo-adapter") tries to resolve the module, it starts from the real path of the file making the call (foo/index.js), with all symbolic links resolved. That is, it searches:
/home/tim/dev/node/foo/node_modules/foo-adapter
/home/tim/dev/node/node_modules/foo-adapter
/home/tim/dev/node_modules/foo-adapter
/home/tim/node_modules/foo-adapter
/home/node_modules/foo-adapter
/node_modules/foo-adapter
There’s no longer any connection to where I was running the module from, so it can never find my module, despite me using npm in the way it was intended. In case you were wondering, this is the intended behaviour and will not be changed.
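The effect is easy to reproduce without npm at all. The sketch below resolves symbolic links first and only then walks up the directory tree, which approximates the behaviour described above; lookupFromRealPath is a name invented for this example, and it is an illustration rather than node’s own code.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: the node_modules locations searched for `name`
// when the requiring file is reached through a symlink. The walk starts
// from the file's real location, not from the path used to reach it.
function lookupFromRealPath(fromFile, name) {
  const real = fs.realpathSync(fromFile); // follows every symlink in the path
  const dirs = [];
  let dir = path.dirname(real);
  for (;;) {
    if (path.basename(dir) !== 'node_modules') {
      dirs.push(path.join(dir, 'node_modules', name));
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return dirs;
}
```

Given a project whose node_modules/foo is a link to a development checkout, lookupFromRealPath('node_modules/foo/index.js', 'foo-adapter') lists only directories above the checkout. The project’s own node_modules/foo-adapter never appears unless the checkout happens to live inside the project.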