Content Addressable Storage of pnpm
pnpm (performant npm) is a package manager for Node.js designed to improve installation speed and disk space efficiency, offering significant advantages over alternatives like npm and Yarn Classic. The core features of pnpm include:
- Fast installation through linking (hardlink or reflink) from a global store.
- Strict dependency access rules enforced through symbolic links and a structured directory design.
- Built-in support for monorepo management.
- Deterministic dependency installation through the pnpm-lock.yaml file.
The cornerstone of pnpm is its Content Addressable Storage mechanism. Unlike traditional package managers that copy dependency files into each project, pnpm stores files in a global store (typically one per disk or filesystem), keyed by their content hash. If multiple packages, or multiple versions of the same package, contain identical files, only one physical copy of each file is stored on disk. This is the key to pnpm's significant disk space savings.
Content Addressable Storage (CAS) Architecture
Content Addressing Principle
The core idea of Content Addressable Storage is that files are stored and retrieved using their content hash as a unique identifier. When pnpm needs to store a file, it calculates the file's content hash. If a file with the same hash already exists in the global store ([[pnpm store path]]/files/), pnpm knows that identical content is already present and does not store it again. Therefore, even if 100 projects depend on lodash, or on different versions of lodash, identical files have only one physical copy in the CAS. This is in stark contrast to early versions of npm and Yarn Classic, which created a copy of every dependency in each project's node_modules directory.
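The principle can be illustrated with a minimal sketch. This is not pnpm's implementation: `putFile` and the temporary store directory are hypothetical names, and the store here uses the full SHA-512 hex digest as the filename rather than pnpm's real directory layout.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: store `content` under its SHA-512 hex digest.
// If a file with that digest already exists, nothing is written.
function putFile(
  storeRoot: string,
  content: Buffer
): { filePath: string; alreadyStored: boolean } {
  const hex = createHash("sha512").update(content).digest("hex");
  const filePath = path.join(storeRoot, hex);
  const alreadyStored = fs.existsSync(filePath);
  if (!alreadyStored) {
    fs.mkdirSync(storeRoot, { recursive: true });
    fs.writeFileSync(filePath, content);
  }
  return { filePath, alreadyStored };
}

// Throwaway store for the demonstration.
const demoStore = fs.mkdtempSync(path.join(os.tmpdir(), "cas-demo-"));

// The same bytes arriving twice (e.g. from two versions of one package).
const first = putFile(demoStore, Buffer.from("module.exports = 1;\n"));
const second = putFile(demoStore, Buffer.from("module.exports = 1;\n"));
// Different bytes get a different address.
const third = putFile(demoStore, Buffer.from("module.exports = 2;\n"));

console.log(first.filePath === second.filePath); // identical content, identical address
console.log(fs.readdirSync(demoStore).length); // number of physical files in the store
```

Three writes were requested, but only two physical files exist afterwards, because the second write found its content already present.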
Advantages of CAS
- Disk Space Efficiency: CAS significantly reduces disk usage, especially with many projects or large dependencies, potentially saving several GB. More importantly, when updating dependencies, pnpm only needs to add the files that actually changed between versions to the store, rather than copying the entire new version of a package.
- Installation Speed: Since only content missing from the store needs to be downloaded, and files are primarily linked (rather than copied) into projects, network transfer and disk I/O are greatly reduced, which significantly improves installation speed. Additionally, pnpm can process package installations in parallel, further improving efficiency.
Store Location (storeDir)
The default location of the pnpm store depends on the operating system and environment variables. The store path can be customized by setting the storeDir option in the project root's .npmrc file, the workspace configuration file pnpm-workspace.yaml, or the global configuration file.
Default storeDir Locations
OS | Environment Variable Condition | Default Path |
---|---|---|
Linux | $PNPM_HOME is set | $PNPM_HOME/store |
Linux | $XDG_DATA_HOME is set | $XDG_DATA_HOME/pnpm/store |
Linux | Other cases | ~/.local/share/pnpm/store |
macOS | $PNPM_HOME is set | $PNPM_HOME/store |
macOS | Other cases | ~/Library/pnpm/store |
Windows | $PNPM_HOME is set | $PNPM_HOME/store |
Windows | Other cases | ~/AppData/Local/pnpm/store |
You can view the current pnpm global store path using the pnpm store path command.
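The table of defaults can be read as a small decision function. The sketch below only mirrors the table, assuming its rows are checked from top to bottom for each OS; `defaultStoreDir` is a hypothetical helper, not pnpm's code, and the authoritative answer on a real machine is still the output of pnpm store path.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper mirroring the "Default storeDir Locations" table.
// `home` stands for the user's home directory ("~" in the table).
function defaultStoreDir(
  platform: "linux" | "darwin" | "win32",
  env: Env,
  home: string
): string {
  // $PNPM_HOME takes effect on every OS listed in the table.
  if (env.PNPM_HOME) return `${env.PNPM_HOME}/store`;

  if (platform === "linux") {
    return env.XDG_DATA_HOME
      ? `${env.XDG_DATA_HOME}/pnpm/store`
      : `${home}/.local/share/pnpm/store`;
  }
  if (platform === "darwin") return `${home}/Library/pnpm/store`;
  return `${home}/AppData/Local/pnpm/store`; // win32
}

console.log(defaultStoreDir("linux", {}, "/home/alice"));
console.log(defaultStoreDir("linux", { XDG_DATA_HOME: "/data" }, "/home/alice"));
console.log(defaultStoreDir("darwin", { PNPM_HOME: "/opt/pnpm" }, "/Users/alice"));
```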
WARNING
A crucial prerequisite: for pnpm to use hard links or reflinks (copy-on-write links), the store must be on the same filesystem or drive as the project being installed. If storeDir points to a different drive, pnpm cannot create these efficient links and falls back to file copying. The installation still succeeds, but pnpm's core advantages in disk space and installation speed are lost. Filesystem limitations prevent hard links (and usually reflinks) from crossing partition or drive boundaries. pnpm detects this situation and prioritizes correctness (through copying) over efficiency when linking isn't possible.
Therefore, the physical disk layout directly affects pnpm's performance. If users don't explicitly set storeDir and run pnpm install on different drives, pnpm automatically creates an independent store on each drive (e.g., D:\.pnpm-store at the root of drive D:).
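To check in advance whether a project and a store share a filesystem, one rough heuristic is to compare the device IDs reported by stat. This is only a sketch: `onSameFilesystem` is a hypothetical helper, it is not how pnpm itself decides, and device IDs can differ in setups where linking would still work (or match where it would not), so treat the result as a hint.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";

// Hypothetical helper: two existing paths are assumed to be on the same
// filesystem when stat() reports the same device id for both.
function onSameFilesystem(a: string, b: string): boolean {
  return fs.statSync(a).dev === fs.statSync(b).dev;
}

// A directory is trivially on the same filesystem as itself.
console.log(onSameFilesystem(os.tmpdir(), os.tmpdir()));

// In practice, compare the project directory with the directory printed
// by `pnpm store path`, for example:
//   onSameFilesystem(process.cwd(), "/path/printed/by/pnpm-store-path")
```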
Role of pnpm-lock.yaml
The pnpm-lock.yaml file plays a crucial role. It precisely records the project's resolved dependency tree, including the exact version of each dependency and of its own dependencies, so that every installation produces an identical node_modules structure (deterministic installation). The file also stores the integrity hash (typically SHA-512) of each dependency package, as obtained from the package registry. This hash is a unique fingerprint of the content of that specific package version. It serves several purposes:
- Key for Cache Lookup: The integrity value is the main basis on which pnpm checks whether a package with exactly this content already exists in the global content addressable store. pnpm tries to locate the cached files identified by this integrity value.
- Verification Standard on a Cache Miss (Packages Already in the Lock File): If the package isn't found in the cache, pnpm has to download it. The integrity value in pnpm-lock.yaml is then the target hash that the download must match: pnpm calculates the integrity value of the downloaded package (the tarball file) and compares it with the value in pnpm-lock.yaml. If they match, the downloaded package is considered complete and untampered.
- Cache Miss (Packages Not Yet in the Lock File): If the package isn't found in the cache, pnpm has to download it. After downloading, pnpm calculates the integrity value of the downloaded package (the tarball file) and compares it with the integrity value recorded in the registry (default http://registry.npmjs.org/). If they match, the downloaded package is considered complete and untampered, and the value is recorded in pnpm-lock.yaml.
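The comparison step described above can be sketched as follows. This is an illustration under stated assumptions, not pnpm's code: `sha512Integrity` and `verifyTarball` are hypothetical names, and a short byte string stands in for a real tarball.

```typescript
import { createHash } from "node:crypto";

// Build an integrity string of the form "sha512-<base64 digest>".
function sha512Integrity(data: Buffer): string {
  return "sha512-" + createHash("sha512").update(data).digest("base64");
}

// Hypothetical check: throw if the downloaded bytes do not match the
// expected integrity value (from the lock file or registry metadata).
function verifyTarball(tarball: Buffer, expectedIntegrity: string): void {
  const actual = sha512Integrity(tarball);
  if (actual !== expectedIntegrity) {
    throw new Error(
      `integrity mismatch: expected ${expectedIntegrity}, got ${actual}`
    );
  }
}

const fakeTarball = Buffer.from("pretend these are tarball bytes");
const expected = sha512Integrity(fakeTarball); // stands in for the recorded value

verifyTarball(fakeTarball, expected); // matches, returns silently

try {
  verifyTarball(Buffer.from("tampered bytes"), expected);
} catch (err) {
  console.log((err as Error).message.startsWith("integrity mismatch"));
}
```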
PS
- Tarball Integrity: During or after download, pnpm calculates the hash of the downloaded tarball and compares it with the expected integrity hash recorded in pnpm-lock.yaml or in the registry metadata. A mismatch indicates that the downloaded file may be corrupted or tampered with, and pnpm aborts the installation with an error. Notably, for security reasons, the npm ecosystem has moved from the early, insecure SHA-1 hash to the cryptographically stronger SHA-512 hash.
- Store Integrity (verifyStoreIntegrity): pnpm provides a configuration option verifyStoreIntegrity, which defaults to true. When it is enabled and pnpm finds that a file in the CAS has been modified since it was last written, pnpm verifies the file's content before linking it into the project's node_modules. This adds a layer of protection against accidental store corruption at the cost of a slight performance overhead.
- Lock File Integrity (Potential Issue): Although pnpm-lock.yaml ensures deterministic installation, the file itself may be accidentally corrupted or incorrectly edited by hand during version control operations (such as resolving git merge conflicts). Currently, if the lock file is consistent with the dependency declarations in package.json, pnpm tends to trust the lock file's content in order to install faster. However, community discussions have proposed adding a lock file checksum (such as a lockfileChecksum field) to quickly detect whether the lock file has been tampered with, without performing a full dependency resolution.
This design reflects a trade-off between installation speed and robustness. pnpm defaults to prioritizing speed: it trusts an up-to-date lock file and skips time-consuming dependency resolution and verification steps. However, lock file merge conflicts or manual edits may lead to silent errors. The verifyStoreIntegrity setting and the proposed lock file checksum are both attempts to add safety nets without sacrificing too much performance, reflecting the inherent design challenges in optimizing package managers.
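The "verify only when modified" idea behind verifyStoreIntegrity can be sketched like this. It is an assumption-laden illustration, not pnpm's implementation: the sketch assumes that a file is rehashed only when its modification time is newer than a recorded check time (named checkedAt here, after the field that appears in pnpm's index files), and `storeFileIsIntact` is a hypothetical helper.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Subset of one per-file entry, as assumed for this sketch.
interface IndexEntry {
  checkedAt: number; // epoch milliseconds of the last check
  integrity: string; // "sha512-<base64 digest>"
}

function sha512Integrity(data: Buffer): string {
  return "sha512-" + createHash("sha512").update(data).digest("base64");
}

// Hypothetical check: skip hashing when the file has not been touched
// since checkedAt; otherwise rehash and compare with the recorded value.
function storeFileIsIntact(filePath: string, entry: IndexEntry): boolean {
  if (fs.statSync(filePath).mtimeMs <= entry.checkedAt) return true;
  return sha512Integrity(fs.readFileSync(filePath)) === entry.integrity;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "store-check-"));
const file = path.join(dir, "blob");
fs.writeFileSync(file, "original");

// checkedAt = 0 means "modified since last check", which forces a rehash.
const entry: IndexEntry = {
  checkedAt: 0,
  integrity: sha512Integrity(Buffer.from("original")),
};

const intactBefore = storeFileIsIntact(file, entry); // rehash, content matches
fs.writeFileSync(file, "corrupted");
const intactAfter = storeFileIsIntact(file, entry); // rehash, content differs
console.log(intactBefore, intactAfter);
```

The cheap timestamp comparison is what keeps the overhead small: unchanged files are trusted without being read.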
File Hashing, Indexing, and CAS Storage
Content Hash of Individual Files
When a package's tarball is successfully downloaded and verified, pnpm processes its contents. A key step is that pnpm calculates a content hash (e.g., SHA-512) for each individual file in the package.
Here we need to clearly distinguish between two types of hash values: the package integrity hash and the file content hash.
- Package Integrity Hash (recorded in pnpm-lock.yaml) is used to uniquely identify and verify a specific version of the entire package (i.e., the tarball file itself). Below is the relevant content of the pnpm-lock.yaml file for the compressible package:

lockfileVersion: '9.0'

settings:
  autoInstallPeers: true
  excludeLinksFromLockfile: false

importers:
  .:
    dependencies:
      compressible:
        specifier: 2.0.18
        version: 2.0.18

packages:
  compressible@2.0.18:
    resolution: { integrity: sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg== }
    engines: { node: '>= 0.6' }

  mime-db@1.54.0:
    resolution: { integrity: sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ== }
    engines: { node: '>= 0.6' }

snapshots:
  compressible@2.0.18:
    dependencies:
      mime-db: 1.54.0

  mime-db@1.54.0: {}

- File Content Hash is used to identify the specific content of each individual file in the package, serving as the foundation for CAS storage and file deduplication. pnpm stores each file of the package according to its content hash (under storeDir/v10/files/). Below is the index file for the compressible package in pnpm's global store. It contains the integrity values of all files in the package, calculated by pnpm when the package was downloaded:

{
  "name": "compressible",
  "version": "2.0.18",
  "requiresBuild": false,
  "files": {
    "LICENSE": { "checkedAt": 1745925782020, "integrity": "sha512-iwWNtxw0wslfJ9F2n8JXm5zpCbwR7Gu+rFrXsuO44MbiUu1TZov30H2eWdbt7+mvwcEp0nSR1CyUCYsPUbI+WQ==", "mode": 420, "size": 1233 },
    "index.js": { "checkedAt": 1745925782021, "integrity": "sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg==", "mode": 420, "size": 1038 },
    "package.json": { "checkedAt": 1745925782021, "integrity": "sha512-Wk1cB0QJbo7jJUs0U25uFc/2lqJNNbLaXrcPDQ+S9gxPpNcOZYDnHqLtJ0Ep2893chz2MdxN1mArJloAXbTERw==", "mode": 420, "size": 1311 },
    "HISTORY.md": { "checkedAt": 1745925782022, "integrity": "sha512-HjgVdL/NlSwDhkut3BCwnal4ahlNd2eLzPCGPARs0Wy+rjHtAJNL7/s426/dIwbg7NI5wnkxuhLImjx+ULgavg==", "mode": 420, "size": 1976 },
    "README.md": { "checkedAt": 1745925782022, "integrity": "sha512-GahpusTGRt2M/0s2vsEMBfDm0bBXMVlnhBVY+xoDcfBKn4rGbeW9PGQWZ7LM1adQWm8Q3WizIF1D1jwI97IS7w==", "mode": 420, "size": 1797 }
  }
}

Package File Storage in CAS (v10/files/)
The pnpm store has a specific directory structure for storing these content-hash-based package file collections. In pnpm v10, this structure is located at storeDir/v10/files/.
Taking compressible@2.0.18 as an example, the package's file collection includes:
compressible@2.0.18
├── HISTORY.md
├── index.js
├── LICENSE
├── package.json
└── README.md
Taking index.js as an example: to find where this file is stored, we first look up the index file for compressible@2.0.18. Its path in pnpm's global store (relative to storeDir/v10/index/) is:
00/5debecfe5d5b12fc331c884d132539140d68e036224005693af893b054ba68-compressible@2.0.18.json
Opening this index file, we can see:
{
"name": "compressible",
"version": "2.0.18",
"requiresBuild": false,
"files": {
"LICENSE": {
"checkedAt": 1745925782020,
"integrity": "sha512-iwWNtxw0wslfJ9F2n8JXm5zpCbwR7Gu+rFrXsuO44MbiUu1TZov30H2eWdbt7+mvwcEp0nSR1CyUCYsPUbI+WQ==",
"mode": 420,
"size": 1233
},
"index.js": {
"checkedAt": 1745925782021,
"integrity": "sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg==",
"mode": 420,
"size": 1038
},
"package.json": {
"checkedAt": 1745925782021,
"integrity": "sha512-Wk1cB0QJbo7jJUs0U25uFc/2lqJNNbLaXrcPDQ+S9gxPpNcOZYDnHqLtJ0Ep2893chz2MdxN1mArJloAXbTERw==",
"mode": 420,
"size": 1311
},
"HISTORY.md": {
"checkedAt": 1745925782022,
"integrity": "sha512-HjgVdL/NlSwDhkut3BCwnal4ahlNd2eLzPCGPARs0Wy+rjHtAJNL7/s426/dIwbg7NI5wnkxuhLImjx+ULgavg==",
"mode": 420,
"size": 1976
},
"README.md": {
"checkedAt": 1745925782022,
"integrity": "sha512-GahpusTGRt2M/0s2vsEMBfDm0bBXMVlnhBVY+xoDcfBKn4rGbeW9PGQWZ7LM1adQWm8Q3WizIF1D1jwI97IS7w==",
"mode": 420,
"size": 1797
}
}
}
The integrity hash value of the index.js file is:
sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg==
pnpm derives the storage path from this integrity value in the following way:
- The content hash value is converted to a hexadecimal string:

// Decode a base64 digest and re-encode it as hexadecimal.
function base64ToHexNode(base64: string): string {
  return Buffer.from(base64, 'base64').toString('hex');
}

// slice(7) drops the leading "sha512-" prefix of the integrity value.
const hex = base64ToHexNode(
  'sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg=='.slice(7)
);

console.assert(
  hex ===
    'd7f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276'
);

The obtained hex value is:

d7f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276

- The first two characters of the hexadecimal hash are used as the name of a subdirectory. Here they are d7, so the file is stored in the storeDir/v10/files/d7/ directory.
- The remaining part of the hash forms the filename within that subdirectory. In this example, the remaining part is:

f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276

By observing other stored file paths:
00/0a57539c3ea49736380ab214874f4528b455742d1f21c9d8027015cfae1372aa80382fefb88fd0abdd89c27c2d1332561e5186e4fa9ec8c348fc5768ca5b22
00/0cdacf5df3cc29401b02c11f6b4c473b193dfff269021e66d0749bdbe81b1c2336a5a72d1384a0d721b72064857a031a4c937653ed07c5e38731682c4c3eb8
1f/1a6454d1d08337c03084eb136fcd6f1ca1448a7f10cf150bd6dbb65aef9fa3be5b182f02d9127dc0ceadb3ad9a3a0911a98a7b63538afed05b8c5581e357db
We can see that each stored filename is 126 characters long: the 128-character hexadecimal hash minus the two characters used for the subdirectory name. The hash itself is not truncated.
Therefore, the compressible@2.0.18/index.js file will be stored at:
storeDir/v10/files/d7/f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276
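The derivation above can be collected into one small function. This is a sketch based only on the layout observed in this article; `fileStorePath` is a hypothetical helper, it assumes a "sha512-" prefixed integrity value, and it returns a path relative to storeDir.

```typescript
// Hypothetical helper: map a per-file integrity value to the path
// (relative to storeDir) observed for pnpm's v10 store layout.
function fileStorePath(integrity: string): string {
  // Drop the "sha512-" prefix, then convert base64 to hexadecimal.
  const base64 = integrity.slice("sha512-".length);
  const hex = Buffer.from(base64, "base64").toString("hex");
  // First two hex characters: subdirectory. Remaining 126: filename.
  return `v10/files/${hex.slice(0, 2)}/${hex.slice(2)}`;
}

const indexJsPath = fileStorePath(
  "sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg=="
);
console.log(indexJsPath);
```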
The v10 in the path identifies the version of pnpm's storage layout. pnpm's internal structure may be optimized and evolve across versions, although the core concept of CAS remains unchanged. This is important to understand, especially for users who try to interact directly with the store (which is generally not recommended), as the specific internal layout is an implementation detail that may change.
Package Metadata File Storage in CAS (v10/index/)
In addition to storing individual files, pnpm also needs to store package metadata, which includes the mapping between package name, version, filenames, and file content hashes. The storage scheme is similar to that of v10/files/, but metadata files live in a separate directory, namely v10/index/.
Taking the package metadata file of compressible@2.0.18 as an example:
{
"name": "compressible",
"version": "2.0.18",
"requiresBuild": false,
"files": {
"LICENSE": {
"checkedAt": 1745925782020,
"integrity": "sha512-iwWNtxw0wslfJ9F2n8JXm5zpCbwR7Gu+rFrXsuO44MbiUu1TZov30H2eWdbt7+mvwcEp0nSR1CyUCYsPUbI+WQ==",
"mode": 420,
"size": 1233
},
"index.js": {
"checkedAt": 1745925782021,
"integrity": "sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg==",
"mode": 420,
"size": 1038
},
"package.json": {
"checkedAt": 1745925782021,
"integrity": "sha512-Wk1cB0QJbo7jJUs0U25uFc/2lqJNNbLaXrcPDQ+S9gxPpNcOZYDnHqLtJ0Ep2893chz2MdxN1mArJloAXbTERw==",
"mode": 420,
"size": 1311
},
"HISTORY.md": {
"checkedAt": 1745925782022,
"integrity": "sha512-HjgVdL/NlSwDhkut3BCwnal4ahlNd2eLzPCGPARs0Wy+rjHtAJNL7/s426/dIwbg7NI5wnkxuhLImjx+ULgavg==",
"mode": 420,
"size": 1976
},
"README.md": {
"checkedAt": 1745925782022,
"integrity": "sha512-GahpusTGRt2M/0s2vsEMBfDm0bBXMVlnhBVY+xoDcfBKn4rGbeW9PGQWZ7LM1adQWm8Q3WizIF1D1jwI97IS7w==",
"mode": 420,
"size": 1797
}
}
}
- Content: Each metadata file serves as a manifest for a specific version of a package. It contains package metadata (such as name and version) and, most importantly, the mapping between all original filenames in the package and their content hashes. Through this mapping, pnpm knows how to find all files that make up the package in the v10/files/ directory and assemble them with the correct directory structure.
- Naming/Location: The naming rule for package metadata files is [[integrity_hash]]-[[package]]@[[version]].json. Taking compressible@2.0.18.json as an example, the integrity hash of the compressible package obtained from pnpm-lock.yaml is:

sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg==

Convert this integrity hash to a hexadecimal string using the base64ToHexNode function:

// slice(7) drops the leading "sha512-" prefix of the integrity value.
const hex = base64ToHexNode(
  'sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg=='.slice(7)
);

console.assert(
  hex ===
    '005debecfe5d5b12fc331c884d132539140d68e036224005693af893b054ba68cfb51a460d36699743dbd5708ee89783081769d76e8282cf6c331a928e063246'
);

As before, the first two characters (00) are used as the subdirectory name. Observing other stored package metadata filenames:

4809a191f68281dd19697243570e3258268c62c2aa4f3c20571b0289715aa9-@pnpm+patch-package@0.0.1.json
aa5a6251e7f2de1255b3870b2f9be7e28a82f478bebb03f2f6efadb890269b-base64-js@1.5.1.json
5d3279e22b928e9df782a9528d4cfb45f5b01d788a4a1aaa5206b931f57d6e-meow@11.0.0.json

We can see that the hash part of each metadata filename is 62 characters long: the hexadecimal hash is truncated to its first 64 characters, of which the first two name the subdirectory. Therefore, the final file path for compressible@2.0.18.json is:

v10/index/00/5debecfe5d5b12fc331c884d132539140d68e036224005693af893b054ba68-compressible@2.0.18.json
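The naming rule for index files can be sketched the same way. `indexFilePath` is a hypothetical helper that reproduces only what is observed in this article. In particular, replacing "/" with "+" in scoped package names is an assumption inferred from the @pnpm+patch-package filename in the listing, not a documented rule.

```typescript
// Hypothetical helper: build the index file path (relative to storeDir)
// from a package's integrity value, name, and version.
function indexFilePath(
  pkgIntegrity: string,
  name: string,
  version: string
): string {
  const hex = Buffer.from(pkgIntegrity.slice("sha512-".length), "base64")
    .toString("hex")
    .slice(0, 64); // only the first 64 hex characters are used
  // Assumption: "/" in scoped names is written as "+" in the filename.
  const safeName = name.replace(/\//g, "+");
  return `v10/index/${hex.slice(0, 2)}/${hex.slice(2)}-${safeName}@${version}.json`;
}

const compressibleIndex = indexFilePath(
  "sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg==",
  "compressible",
  "2.0.18"
);
console.log(compressibleIndex);
```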
CAS Linking Strategy
Having introduced the storage layout of the CAS, let's now discuss how pnpm links files from the CAS into projects.
Virtual Store (node_modules/.pnpm)
pnpm doesn't link files from the global CAS directly into the node_modules folder in the project root. Instead, it uses an intermediate layer: a hidden directory called .pnpm under each project's node_modules directory. This directory is known as the virtual store.
The .pnpm directory is where the actual links (hardlink or reflink) to files in the global CAS are stored. These links are organized by package name and version, for example .pnpm/<pkg-name>@<version>/node_modules/<pkg-name>/.
Package Import Method (packageImportMethod)
How pnpm imports files from the global CAS into the project's virtual store (.pnpm) is controlled by the packageImportMethod configuration option. Each option is explained below:
- auto (default): This is pnpm's preferred strategy. It first tries clone (i.e., reflink / copy-on-write). If the filesystem doesn't support clone, it tries hardlink. If hardlinking also fails (e.g., when linking across filesystems), it falls back to copy (regular file copying). The default aims to pick the best feasible option for the current environment.
- clone (reflink / copy-on-write): This is the fastest and safest method. It creates a reference to the original file data. If the file is later modified in the project's node_modules, the filesystem automatically creates a new copy of the changed data without affecting the original file in the CAS. This both saves space (no data is copied initially) and ensures isolation. However, it requires filesystem support (e.g., Btrfs, APFS, and XFS with reflink support). It offers the best balance between speed, space, and safety, but depends on a modern filesystem.
- hardlink: Creates a hard link, so the file entry in the project's node_modules and the file entry in the CAS point to exactly the same physical data on disk. This is very space-efficient because the link only adds a directory entry that refers to the existing inode. An important consequence is that modifying the hard-linked file in the project's node_modules also modifies the original file in the CAS, potentially corrupting the CAS and affecting other projects. Hard links require the source and the target to be on the same filesystem. In short: extremely space-efficient, but it tightly couples projects to the store and carries the risk of accidentally modifying store files.
- copy: Performs a standard file copy. This is the least efficient method in terms of disk space and installation speed, but it works everywhere, even across filesystems. It is a universal fallback that sacrifices pnpm's main advantages (disk space savings and installation speed).
- clone-or-copy: Tries clone first and falls back to copy if clone is not supported.
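The fallback order of the auto option can be sketched with Node.js file APIs. This is a simplified illustration, not pnpm's implementation: `importFile` is a hypothetical helper that handles a single file and treats any error as "method not available".

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type ImportMethod = "clone" | "hardlink" | "copy";

// Hypothetical sketch of the "auto" order: clone, then hardlink, then copy.
function importFile(src: string, dest: string): ImportMethod {
  try {
    // COPYFILE_FICLONE_FORCE fails (instead of silently copying) when the
    // filesystem cannot create a copy-on-write clone.
    fs.copyFileSync(src, dest, fs.constants.COPYFILE_FICLONE_FORCE);
    return "clone";
  } catch {
    // Reflinks unavailable here: fall through to hard linking.
  }
  try {
    fs.rmSync(dest, { force: true }); // clear any leftover from the failed clone
    fs.linkSync(src, dest);
    return "hardlink";
  } catch {
    // Hard linking failed (for example, across filesystems): fall through.
  }
  fs.copyFileSync(src, dest);
  return "copy";
}

const workDir = fs.mkdtempSync(path.join(os.tmpdir(), "import-demo-"));
const storeFile = path.join(workDir, "store-file");
const projectFile = path.join(workDir, "project-file");
fs.writeFileSync(storeFile, "module.exports = {};\n");

const methodUsed = importFile(storeFile, projectFile);
console.log(methodUsed); // depends on the filesystem backing the temp directory
console.log(fs.readFileSync(projectFile, "utf8") === "module.exports = {};\n");
```

Whichever method succeeds, the project ends up with the same file content. Only the cost in disk space and the isolation from the store differ.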
The type of filesystem (such as Btrfs, XFS, EXT4, APFS) directly determines whether clone or hardlink is available. Additionally, in Docker build environments, hard links and reflinks cannot be created between the host filesystem and the container filesystem at build time, so workarounds are needed, such as BuildKit cache mounts or the pnpm fetch command to pre-download dependencies.
packageImportMethod Option Comparison
Option | Mechanism | Disk Efficiency | Speed | Filesystem Dependency | node_modules Modifiability |
---|---|---|---|---|---|
auto (default) | Prefers clone, then hardlink, finally copy | Best feasible | Best feasible | Yes | Depends on actual method used |
clone | Reflink (COW) | High | Fastest | Yes (Btrfs, APFS, etc.) | Safe (creates copy when modified) |
hardlink | Hard link | Highest | Fast | Yes (same filesystem) | Dangerous (directly modifies store) |
copy | File copy | Low | Slow | No | Safe (modifies copy) |
clone-or-copy | Prefers clone, then copy | Depends on clone support | Depends on clone support | Yes (for clone) | Depends on actual method used |
Symbolic Links in Root node_modules
The final step in the linking process is creating symbolic links (symlinks) in the project root's node_modules folder. These symbolic links point to the corresponding package directories in the .pnpm virtual store.
A key feature is that only the project's direct dependencies (packages listed under dependencies, devDependencies, or optionalDependencies in the project's package.json) are symlinked into the root of node_modules. This enforces pnpm's strictness principle: application code cannot directly require or import transitive dependencies (dependencies of dependencies) that aren't declared in its package.json. When Node.js resolves modules, it follows these symbolic links to find the actual package code in the .pnpm directory.
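The two layers (links into the virtual store, then a symlink in the root node_modules) can be reproduced in miniature. This sketch builds a throwaway directory tree and is not pnpm's code: the store path is a stand-in, the symlink is absolute for simplicity, and the hard link assumes the temporary directory supports hard links.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const root = fs.mkdtempSync(path.join(os.tmpdir(), "pnpm-layout-"));

// Stand-in for a file in the global CAS.
const storeFile = path.join(root, "store", "d7", "f33f");
fs.mkdirSync(path.dirname(storeFile), { recursive: true });
fs.writeFileSync(storeFile, "module.exports = 'from the store';\n");

// Virtual store: .pnpm/<pkg-name>@<version>/node_modules/<pkg-name>/
const project = path.join(root, "project");
const pkgDir = path.join(
  project, "node_modules", ".pnpm",
  "compressible@2.0.18", "node_modules", "compressible"
);
fs.mkdirSync(pkgDir, { recursive: true });

// Layer 1: hard link the store file into the virtual store.
fs.linkSync(storeFile, path.join(pkgDir, "index.js"));

// Layer 2: symlink the direct dependency into the root node_modules.
// ("junction" only matters on Windows; it is ignored elsewhere.)
const rootLink = path.join(project, "node_modules", "compressible");
fs.symlinkSync(pkgDir, rootLink, "junction");

// Following the symlink leads to the package directory in .pnpm.
console.log(fs.realpathSync(rootLink) === fs.realpathSync(pkgDir));
console.log(fs.readFileSync(path.join(rootLink, "index.js"), "utf8"));
```

A transitive dependency would get its own directory under .pnpm but no symlink in the root node_modules, which is why application code cannot resolve it.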
Conclusion
pnpm's efficiency stems from its carefully designed Content Addressable Storage and its soft/hard linking mechanisms. It combines package integrity verification, content hashing of every file in a package, a global content addressable store (CAS), index files that record package structure, and a layered linking strategy (preferring reflinks or hard links to import all package files into the project's virtual store .pnpm, then exposing direct dependencies in the root node_modules through symbolic links). Together, these mechanisms achieve significant disk space savings and installation speed improvements.