Package Manager

yarn

PnP

Note

The following content is an extension based on this article

Background

The most direct reason the Yarn team developed the PnP feature is that the existing node_modules-based dependency management is inefficient: it is slow both when resolving dependencies and when installing them.

Let's first discuss Node's logic when handling dependency references. This process has two scenarios:

If we pass a core module (such as "fs" or "path") or a local path (a relative path such as ./module-a.js, or an absolute path such as /my-li/module-b.js) to the require() call, Node uses the corresponding file directly. Otherwise, Node starts looking for a directory named node_modules:

During this lookup, Node first looks for node_modules in the current directory. If that directory does not exist, or exists but does not contain the requested module, Node moves to the parent directory and repeats the check, continuing up to the file system root. Once the module is found, Node checks whether its package.json specifies a main property. If it does, Node loads the file that main points to; otherwise it defaults to index.js. If there is no index.js, it looks for index.json, then index.node. If none of these are found, it throws an error.
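The lookup described above can be sketched as follows. This is a simplified illustration rather than Node's actual implementation: the helper names `candidateNodeModulesDirs` and `entryCandidates` are hypothetical, POSIX-style paths are assumed, and nothing is read from disk.

```javascript
const path = require('node:path');

// Hypothetical helper: list the node_modules directories that would be probed,
// starting at `fromDir` and walking up to the file system root.
function candidateNodeModulesDirs(fromDir) {
  const dirs = [];
  let current = path.posix.resolve(fromDir);
  while (true) {
    // A directory that is itself named node_modules is not probed for a nested one.
    if (path.posix.basename(current) !== 'node_modules') {
      dirs.push(path.posix.join(current, 'node_modules'));
    }
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the root
    current = parent;
  }
  return dirs;
}

// Hypothetical helper: the entry files to try, in order, once a package is found.
function entryCandidates(packageJson) {
  const candidates = [];
  if (packageJson && typeof packageJson.main === 'string') {
    candidates.push(packageJson.main);
  }
  candidates.push('index.js', 'index.json', 'index.node');
  return candidates;
}

console.log(candidateNodeModulesDirs('/home/user/project/src'));
console.log(entryCandidates({ main: 'lib/main.js' }));
```

Every directory in the first list may cost a file system check before the module is found, which is the overhead this section refers to.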

The require module search flow chart is as follows:

For the specific require execution process, you can refer to this article. The execution chain can be divided into the following stages:

require => Module._load => Module.prototype.load => Module._extensions[ext] => Module.prototype._compile => return module.exports.
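These stages correspond to functions on Node's CommonJS `Module` class. They are internal and not a stable API, so the following is only a minimal sketch that checks whether they exist in the running Node version, assuming the script runs as CommonJS.

```javascript
const Module = require('node:module');

// Each entry names one stage of the chain and the internal function behind it.
const stages = {
  'Module.prototype.require': Module.prototype.require,
  'Module._load': Module._load,
  'Module.prototype.load': Module.prototype.load,
  'Module.prototype._compile': Module.prototype._compile,
};

for (const [name, fn] of Object.entries(stages)) {
  console.log(name, typeof fn);
}

// Module._extensions maps a file extension to the loader used for that file type.
const extensions = Object.keys(Module._extensions);
console.log(extensions);
```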

It can be seen that Node needs to perform a lot of processing when resolving dependencies, which is not efficient.

Let's look at what happens during dependency installation. Currently, the yarn install operation performs the following 4 steps:

1. Resolve each dependency's version range to a specific version number
2. Download the tarball for that version to the local offline mirror
3. Extract the dependency from the offline mirror to the local cache
4. Copy the dependency from the local cache to the node_modules directory in the current directory

The 4th step also involves a lot of file I/O, resulting in inefficient dependency installation (especially in CI environments where all dependencies need to be installed each time).

Facebook's engineers had enough of these issues and decided to find a solution that could solve the problems completely while remaining compatible with the existing ecosystem. This led to the Plug'n'Play feature, abbreviated as PnP. It has been tested internally at Facebook for some time, and the Yarn team has now decided to share it with the community and optimize it together.

Implementation Method

Instead of copying dependencies from the local cache to node_modules, Yarn maintains a static mapping table that contains the following information:

Which versions of which dependency packages are included in the current dependency tree
How these dependency packages are related to each other
The specific locations of these dependency packages in the file system

This mapping table corresponds to the .pnp.js file in the project directory in Yarn's PnP implementation.
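To make the three kinds of information concrete, here is a minimal sketch of such a table. The shape, the paths, and the helper names are hypothetical and heavily simplified; the real .pnp.js file uses its own layout and stores more data.

```javascript
// Hypothetical, simplified mapping table keyed by "name@version".
const packageTable = new Map([
  ['my-app@workspace', {
    location: '/projects/my-app/',
    dependencies: { lodash: '4.17.21', 'left-pad': '1.3.0' },
  }],
  ['lodash@4.17.21', { location: '/yarn-cache/lodash-4.17.21/', dependencies: {} }],
  ['left-pad@1.3.0', { location: '/yarn-cache/left-pad-1.3.0/', dependencies: {} }],
]);

// 1. Which versions of which packages are in the dependency tree.
function listPackages(table) {
  return [...table.keys()];
}

// 2. How packages relate: the direct dependencies of one package.
function dependenciesOf(table, key) {
  const entry = table.get(key);
  if (!entry) throw new Error(`Unknown package: ${key}`);
  return Object.entries(entry.dependencies).map(([name, version]) => `${name}@${version}`);
}

// 3. Where a package lives in the file system.
function locationOf(table, key) {
  const entry = table.get(key);
  if (!entry) throw new Error(`Unknown package: ${key}`);
  return entry.location;
}

console.log(listPackages(packageTable));
console.log(dependenciesOf(packageTable, 'my-app@workspace'));
console.log(locationOf(packageTable, 'lodash@4.17.21'));
```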

How is this .pnp.js file generated, and how does Yarn use it?

During dependency installation, after step 3 is completed, Yarn doesn't copy the dependency to the node_modules directory. Instead, it records the specific location of the dependency in the cache in .pnp.js. This avoids a lot of I/O operations while also preventing the generation of a node_modules directory in the project directory.
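The difference in I/O can be illustrated by comparing what each approach has to do after the cache is populated. This is a sketch under stated assumptions: the cache contents, paths, and function names are hypothetical, and the code only plans operations instead of touching the disk.

```javascript
// Hypothetical cache contents: package key -> files already extracted to the cache.
const cache = {
  'lodash@4.17.21': { dir: '/yarn-cache/lodash-4.17.21', files: ['package.json', 'lodash.js', 'fp.js'] },
  'left-pad@1.3.0': { dir: '/yarn-cache/left-pad-1.3.0', files: ['package.json', 'index.js'] },
};

// Classic install: one copy operation per file into node_modules.
function planNodeModulesInstall(cacheEntries, projectDir) {
  const copies = [];
  for (const [key, { dir, files }] of Object.entries(cacheEntries)) {
    const name = key.slice(0, key.lastIndexOf('@'));
    for (const file of files) {
      copies.push({ from: `${dir}/${file}`, to: `${projectDir}/node_modules/${name}/${file}` });
    }
  }
  return copies;
}

// PnP install: one mapping entry per package, no file copies.
function planPnpInstall(cacheEntries) {
  const mapping = {};
  for (const [key, { dir }] of Object.entries(cacheEntries)) {
    mapping[key] = dir;
  }
  return mapping;
}

const copies = planNodeModulesInstall(cache, '/projects/my-app');
const mapping = planPnpInstall(cache);
console.log('file copies needed:', copies.length);
console.log('mapping entries written:', Object.keys(mapping).length);
```

The number of copies grows with the number of files in every package, while the mapping grows only with the number of packages.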

Additionally, .pnp.js contains a special resolver. Yarn uses this special resolver to handle require() requests (it intercepts at the Module level, changing the original node behavior). This resolver directly determines the specific location of the dependency in the file system based on the static mapping table contained in the .pnp.js file, thus avoiding the I/O operations in the current implementation when handling dependency references.
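The table-based lookup can be sketched as below. This is not the resolver shipped in .pnp.js: the table contents and the names `splitRequest`, `findIssuer`, and `resolveRequest` are hypothetical, and the sketch leaves out the hook into Node's Module system, showing only how a request is answered from static data without probing the file system.

```javascript
const path = require('node:path');

// Hypothetical table: each package's location and the locations of its dependencies.
const table = [
  {
    name: 'my-app',
    location: '/projects/my-app/',
    dependencies: new Map([['lodash', '/yarn-cache/lodash-4.17.21/']]),
  },
  { name: 'lodash', location: '/yarn-cache/lodash-4.17.21/', dependencies: new Map() },
];

// Split "lodash/fp" or "@scope/pkg/sub" into the package name and the subpath.
function splitRequest(request) {
  const parts = request.split('/');
  const nameLength = request.startsWith('@') ? 2 : 1;
  return {
    name: parts.slice(0, nameLength).join('/'),
    subpath: parts.slice(nameLength).join('/'),
  };
}

// Find the package that contains the requiring file (longest matching location).
function findIssuer(issuerFile) {
  return table
    .filter((pkg) => issuerFile.startsWith(pkg.location))
    .sort((a, b) => b.location.length - a.location.length)[0];
}

// Resolve a bare request to a path using only the table.
function resolveRequest(request, issuerFile) {
  const issuer = findIssuer(issuerFile);
  if (!issuer) throw new Error(`No package in the table owns ${issuerFile}`);
  const { name, subpath } = splitRequest(request);
  const location = issuer.dependencies.get(name);
  if (!location) throw new Error(`${issuer.name} has no table entry for ${name}`);
  return path.posix.join(location, subpath);
}

console.log(resolveRequest('lodash/fp', '/projects/my-app/src/index.js'));
```

Compared with the node_modules walk, the cost here is a couple of in-memory lookups regardless of how deep the requiring file sits in the directory tree.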

Advantages

From the PnP implementation, it can be seen that the same version of the same dependency referenced by different projects on the same system actually points to the same directory in the global cache. This brings several immediate benefits:

Dependencies install much faster, because the copy into node_modules is skipped
Multiple CI instances in a CI environment can share the same cache
Multiple projects on the same system no longer each store their own copy of the same dependency, which saves disk space
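The disk space benefit can be illustrated with a small calculation. The project names and package sizes below are made up for illustration only; the point is that per-project copies are summed once per project, while a shared cache stores each package version once.

```javascript
// Hypothetical projects and package sizes (in MB), for illustration only.
const projects = {
  'app-a': ['lodash@4.17.21', 'react@18.2.0'],
  'app-b': ['lodash@4.17.21', 'react@18.2.0'],
  'app-c': ['lodash@4.17.21'],
};
const sizeMb = { 'lodash@4.17.21': 5, 'react@18.2.0': 1 };

// node_modules: every project stores its own copy of every dependency.
function nodeModulesUsage(projectDeps, sizes) {
  let total = 0;
  for (const deps of Object.values(projectDeps)) {
    for (const dep of deps) total += sizes[dep];
  }
  return total;
}

// Shared cache: each distinct package version is stored once.
function sharedCacheUsage(projectDeps, sizes) {
  const unique = new Set(Object.values(projectDeps).flat());
  let total = 0;
  for (const dep of unique) total += sizes[dep];
  return total;
}

console.log('node_modules total (MB):', nodeModulesUsage(projects, sizeMb));
console.log('shared cache total (MB):', sharedCacheUsage(projects, sizeMb));
```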

Disadvantages

Script execution is restricted. All dependency references must go through the resolver in .pnp.js. Therefore, whether you run a script or execute a JS file directly, it has to be launched through Yarn, using yarn run or yarn node instead of plain node.
Debugging is inconvenient. A PnP project has no node_modules directory, and PnP patches Node's Module implementation to add a mapping step, so stepping through source code also passes through the PnP layer, whose internals most developers are not familiar with. Furthermore, because dependencies point to the global cache, they can no longer be located and edited in place as they could under node_modules. To debug a dependency, run yarn unplug packageName, which copies that dependency into the .pnp/unplugged directory in the project. The resolver in .pnp.js then loads the unplugged copy automatically. After debugging, run yarn unplug --clear packageName to remove the dependency from .pnp/unplugged.
Issues:
1. Developers need to set breakpoints at the dependency package entry (.pnp/unplugged/npm-[module name]-[version]-[hash]-integrity/node_modules/[module name]/[entry path]) to debug.
2. Unplugging works one package at a time. For example, suppose A depends on B, and B depends on an external module C. To debug B, you first need to run yarn unplug B. To also debug C as used by B, you need to run yarn unplug C as well. The same applies to every other dependency of A, which greatly increases debugging costs.

pnpm

pnpm has the following notable features:

Fast package installation. According to this article, in most scenarios pnpm installs packages 2-3 times faster than npm and yarn, including yarn in PnP installation mode.
Efficient disk space utilization

Contributors

XiSenao

SenaoXi
