
Content Addressable Storage of pnpm

pnpm (performant npm) is a package manager for Node.js designed to improve installation speed and disk space efficiency, offering significant advantages over alternatives such as npm and Yarn Classic. The core features of pnpm include:

  • Fast installation through linking (hard links or reflinks) from a global store.
  • Strict dependency access rules enforced through symbolic links and structured directory design.
  • Built-in support for monorepo management.
  • Deterministic dependency installation through the pnpm-lock.yaml file.

The cornerstone of pnpm is its Content Addressable Storage mechanism. Unlike traditional package managers that copy dependency files in each project, pnpm stores files in a global store (typically one per disk or filesystem) based on their content hash. This means that if multiple different packages or different versions of the same package contain identical files, only one physical copy of that file will be stored on disk. This is key to pnpm's significant disk space savings.

Content Addressable Storage (CAS) Architecture

Content Addressing Principle

The core idea of Content Addressable Storage is that files are stored and retrieved using their content hash as a unique identifier. When pnpm needs to store a file, it calculates the file's content hash. If a file with the same hash already exists in the global store (<pnpm store path>/files/), pnpm knows that identical content is already stored and does not store it again. Therefore, even if 100 projects depend on lodash, or on different versions of lodash, each distinct file has only one physical copy in the CAS. This is in stark contrast to early npm and Yarn Classic, which created a separate copy of every dependency in each project's node_modules directory.
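To make the principle concrete, here is a minimal in-memory sketch in TypeScript. It is not pnpm's code. The ContentStore class is hypothetical: it keys each file's bytes by their hex SHA-512 digest, so byte-identical files collapse into a single entry no matter which package they came from.

```ts
import { Buffer } from 'node:buffer';
import { createHash } from 'node:crypto';

// Hypothetical in-memory content-addressable store (illustration only).
class ContentStore {
  // hex SHA-512 digest -> file bytes
  private readonly blobs = new Map<string, Buffer>();

  // Stores `content` under its digest and returns the digest.
  // Identical content maps to the same key, so it is kept only once.
  put(content: Buffer): string {
    const hex = createHash('sha512').update(content).digest('hex');
    if (!this.blobs.has(hex)) {
      this.blobs.set(hex, content);
    }
    return hex;
  }

  get(hex: string): Buffer | undefined {
    return this.blobs.get(hex);
  }

  get size(): number {
    return this.blobs.size;
  }
}

// Two "packages" ship a byte-identical LICENSE; a third file differs.
const store = new ContentStore();
const a = store.put(Buffer.from('MIT License'));
const b = store.put(Buffer.from('MIT License'));
const c = store.put(Buffer.from('module.exports = 1;'));
console.log(a === b, a === c, store.size); // true false 2
```

Deduplication here comes entirely from using the digest as the key. No comparison of file names or package versions is involved.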

Advantages of CAS

  1. Disk Space Efficiency: CAS significantly reduces disk space usage, especially in scenarios with many projects or large dependencies, potentially saving several GB of space. More importantly, when updating dependencies, pnpm only needs to add files that have actually changed between versions to the store, rather than copying entire new versions of packages.
  2. Installation Speed: Only packages that are not already in the store need to be downloaded from the registry. Files are mostly linked (rather than copied) into projects, so network transfer and disk I/O are greatly reduced and installation is significantly faster. Additionally, pnpm processes package installations in parallel, which further improves efficiency.

Store Location (storeDir)

The default location of the pnpm store depends on the operating system and on environment variables. You can customize the store path in several places: the store-dir setting in the project root's .npmrc file or in a global configuration file, or the storeDir setting in the workspace configuration file pnpm-workspace.yaml.

Default storeDir Locations

| OS | Environment Variable Condition | Default Path |
|---|---|---|
| Linux | $PNPM_HOME is set | $PNPM_HOME/store |
| Linux | $XDG_DATA_HOME is set | $XDG_DATA_HOME/pnpm/store |
| Linux | Other cases | ~/.local/share/pnpm/store |
| macOS | $PNPM_HOME is set | $PNPM_HOME/store |
| macOS | Other cases | ~/Library/pnpm/store |
| Windows | $PNPM_HOME is set | $PNPM_HOME/store |
| Windows | Other cases | ~/AppData/Local/pnpm/store |
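The table can be read as a small lookup procedure. The sketch below encodes it as a hypothetical defaultStoreDir function. It follows the table only; pnpm's real resolution logic may handle edge cases the table does not list.

```ts
import * as path from 'node:path';

// Hypothetical helper mirroring the default-location table above.
function defaultStoreDir(
  platform: string,
  env: Record<string, string | undefined>,
  home: string,
): string {
  // Pick the path flavour explicitly so the result does not depend on the host OS.
  const p = platform === 'win32' ? path.win32 : path.posix;
  // $PNPM_HOME takes precedence on every OS.
  if (env.PNPM_HOME) return p.join(env.PNPM_HOME, 'store');
  switch (platform) {
    case 'darwin':
      return p.join(home, 'Library', 'pnpm', 'store');
    case 'win32':
      return p.join(home, 'AppData', 'Local', 'pnpm', 'store');
    default:
      // Linux: honour $XDG_DATA_HOME, otherwise use ~/.local/share.
      if (env.XDG_DATA_HOME) return p.join(env.XDG_DATA_HOME, 'pnpm', 'store');
      return p.join(home, '.local', 'share', 'pnpm', 'store');
  }
}

console.log(defaultStoreDir('linux', {}, '/home/alice'));
// /home/alice/.local/share/pnpm/store
```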

You can view the current pnpm global store path using the pnpm store path command.

WARNING

pnpm can use hard links or reflinks (copy-on-write links) only if the store is on the same filesystem or drive as the project being installed. If storeDir points to a different drive, pnpm cannot create these links and falls back to copying files. Installation still succeeds, but pnpm loses its core advantages in disk space and installation speed. Filesystem limitations prevent hard links (and, in general, reflinks) from crossing partition or drive boundaries. When linking is not possible, pnpm detects this and favors correctness (copying) over efficiency.

Physical disk layout therefore directly affects pnpm's performance. If users don't explicitly set storeDir and run pnpm install on different drives, pnpm automatically creates an independent store on each drive (e.g., D:\.pnpm-store at the root of drive D:).
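One way to reason about whether linking is possible is to compare the device IDs of the store and the project directory. The sketch below does that with fs.statSync(...).dev. onSameDevice is a hypothetical helper and only a heuristic: as described for packageImportMethod below, pnpm's auto mode attempts each import method in turn and falls back when one fails.

```ts
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Heuristic: hard links can only be created between paths on the same device,
// so matching device IDs is a quick (not exhaustive) pre-check.
function onSameDevice(a: string, b: string): boolean {
  return fs.statSync(a).dev === fs.statSync(b).dev;
}

// A directory and a file created inside it are always on the same device.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cas-'));
const file = path.join(dir, 'probe.txt');
fs.writeFileSync(file, 'x');
console.log(onSameDevice(dir, file)); // true
```

In practice, you would pass the store directory and the project directory, for example the output of pnpm store path and the project root.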

Role of pnpm-lock.yaml

The pnpm-lock.yaml file plays a crucial role. It precisely records the project's resolved dependency tree structure, including the exact version of each dependency and its own dependencies, ensuring that each installation generates an identical node_modules structure, achieving deterministic installation. This file also stores the integrity hash value (typically SHA-512) of each dependency package obtained from the package registry. This hash value is a unique fingerprint of the package's specific version content.

  • Key for Cache Lookup: The integrity hash is what pnpm uses to check whether a package with exactly this content already exists in the global content addressable store. pnpm looks for the cached entry identified by this integrity value.

  • Verification Target on a Cache Miss (Package Already in the Lockfile): If the package isn't found in the cache (cache miss), pnpm downloads it. The integrity value in pnpm-lock.yaml then becomes the hash the download must match. pnpm calculates the integrity of the downloaded package (the tarball file) and compares it with the value in pnpm-lock.yaml. If they match, the package is considered complete and untampered.

  • Cache Miss (Package Not Yet in the Lockfile): If the package is neither in the cache nor recorded in pnpm-lock.yaml, pnpm downloads it and calculates the integrity of the downloaded tarball. It then compares that value with the integrity value published in the registry metadata (default registry: https://registry.npmjs.org/). If they match, the package is considered complete and untampered, and its integrity is recorded in pnpm-lock.yaml.
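In both cache-miss cases, verification amounts to recomputing an SRI-style integrity string ("sha512-" followed by the base64 digest) and comparing it with the expected one. The following is a minimal sketch, assuming the tarball bytes are already in memory and only sha512 is involved. integrityOf and verifyTarball are hypothetical names, not pnpm APIs.

```ts
import { Buffer } from 'node:buffer';
import { createHash } from 'node:crypto';

// Integrity string in the format used by pnpm-lock.yaml and registry metadata.
function integrityOf(data: Buffer): string {
  return 'sha512-' + createHash('sha512').update(data).digest('base64');
}

// Recompute the hash of the downloaded bytes and compare it with the expected value.
// Real integrity strings may list several algorithms; this sketch handles only one.
function verifyTarball(data: Buffer, expected: string): boolean {
  return integrityOf(data) === expected;
}

const tarball = Buffer.from('pretend this is a .tgz');
const expected = integrityOf(tarball); // stands in for the lockfile/registry value
console.log(verifyTarball(tarball, expected)); // true
console.log(verifyTarball(Buffer.from('tampered'), expected)); // false
```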

Notes

  1. Tarball Integrity: During or after download, pnpm calculates the hash value of the downloaded tarball file and compares it with the expected integrity hash value recorded in pnpm-lock.yaml or registry metadata. If they don't match, it indicates the downloaded file may be corrupted or tampered with, and pnpm will error and abort the installation. Notably, for security reasons, the npm ecosystem has moved from the early insecure SHA-1 hash to the more cryptographically strong SHA-512 hash.
  2. Store Integrity (verifyStoreIntegrity): pnpm provides a configuration option verifyStoreIntegrity, which defaults to true. When this option is enabled, if pnpm finds that a file in the CAS has been modified since its last write, it will verify the file's content before linking it to the project's node_modules. This provides an additional layer of protection against accidental store corruption, but comes with a slight performance overhead.
  3. Lock File Integrity (Potential Issue): Although pnpm-lock.yaml ensures deterministic installation, the file itself may be accidentally corrupted or incorrectly edited during version control operations (such as resolving git merge conflicts). Currently, if the lock file is consistent with the dependency declarations in package.json, pnpm trusts its content so that installation is faster. A community discussion has proposed adding a lock file checksum (for example, a lockfileChecksum field) to quickly detect tampering without performing a full dependency resolution.

This design reflects the trade-off between installation speed and robustness. By default, pnpm prioritizes speed: it trusts the lock file in order to skip time-consuming dependency resolution and verification steps. However, lock file merge conflicts or manual edits may lead to silent errors. The verifyStoreIntegrity setting and the proposed lock file checksum are both attempts to add safety nets without sacrificing too much performance, reflecting the inherent design challenges in optimizing package managers.
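As an illustration of the kind of check verifyStoreIntegrity enables, the sketch below re-hashes a store file only when its modification time is newer than the checkedAt timestamp recorded in the package index files (shown in the next section). The helper isStoreFileIntact and its exact mtime rule are assumptions made for illustration, not pnpm's documented algorithm.

```ts
import { createHash } from 'node:crypto';
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// One entry of a package index file (see the compressible example).
interface FileEntry {
  checkedAt?: number; // ms timestamp of the last verification
  integrity: string; // "sha512-<base64 digest>"
  mode: number;
  size: number;
}

// Hypothetical check in the spirit of verifyStoreIntegrity.
function isStoreFileIntact(filePath: string, entry: FileEntry, verify = true): boolean {
  if (!verify) return true; // verifyStoreIntegrity=false: trust the store as-is
  const stat = fs.statSync(filePath);
  if (stat.size !== entry.size) return false; // cheap check first
  // Assumed rule: not modified since the last check, so skip re-hashing.
  if (entry.checkedAt !== undefined && stat.mtimeMs <= entry.checkedAt) return true;
  const actual = 'sha512-' + createHash('sha512').update(fs.readFileSync(filePath)).digest('base64');
  return actual === entry.integrity;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'store-'));
const file = path.join(dir, 'blob');
fs.writeFileSync(file, 'hello');
const entry: FileEntry = {
  integrity: 'sha512-' + createHash('sha512').update('hello').digest('base64'),
  mode: 0o644, // 420, as in the index file
  size: 5,
  // checkedAt is left undefined, so the file is always re-hashed
};
console.log(isStoreFileIntact(file, entry)); // true
fs.writeFileSync(file, 'HELLO'); // same size, different bytes
console.log(isStoreFileIntact(file, entry)); // false
```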

File Hashing, Indexing, and CAS Storage

Content Hash of Individual Files

When a package's tarball is successfully downloaded and verified, pnpm processes its contents. A key step is that pnpm calculates the content hash value (e.g., SHA-512) for each individual file in the package.

Here we need to clearly distinguish between two types of hash values: package integrity hash and file content hash.

  • Package Integrity Hash (recorded in pnpm-lock.yaml) is used to uniquely identify and verify a specific version of the entire package (i.e., the tarball file itself).

    Below is the relevant content from the pnpm-lock.yaml file for the compressible package:

    pnpm-lock.yaml
    yaml
    lockfileVersion: '9.0'
    
    settings:
      autoInstallPeers: true
      excludeLinksFromLockfile: false
    
    importers:
      .:
        dependencies:
          compressible:
            specifier: 2.0.18
            version: 2.0.18
    
    packages:
      compressible@2.0.18:
        resolution:
          {
            integrity: sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg==
          }
        engines: { node: '>= 0.6' }
    
      mime-db@1.54.0:
        resolution:
          {
            integrity: sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==
          }
        engines: { node: '>= 0.6' }
    
    snapshots:
      compressible@2.0.18:
        dependencies:
          mime-db: 1.54.0
    
      mime-db@1.54.0: {}
  • File Content Hash is used to identify the specific content of each independent file in the package, serving as the foundation for CAS storage and file deduplication. pnpm stores each file in the package according to its content hash value (storeDir/v10/files/).

    Below is the index file for the compressible package in pnpm's global store, which contains the integrity values of all files in the package, calculated by pnpm when downloading the package.

    compressible@2.0.18.json
    json
    {
      "name": "compressible",
      "version": "2.0.18",
      "requiresBuild": false,
      "files": {
        "LICENSE": {
          "checkedAt": 1745925782020,
          "integrity": "sha512-iwWNtxw0wslfJ9F2n8JXm5zpCbwR7Gu+rFrXsuO44MbiUu1TZov30H2eWdbt7+mvwcEp0nSR1CyUCYsPUbI+WQ==",
          "mode": 420,
          "size": 1233
        },
        "index.js": {
          "checkedAt": 1745925782021,
          "integrity": "sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg==",
          "mode": 420,
          "size": 1038
        },
        "package.json": {
          "checkedAt": 1745925782021,
          "integrity": "sha512-Wk1cB0QJbo7jJUs0U25uFc/2lqJNNbLaXrcPDQ+S9gxPpNcOZYDnHqLtJ0Ep2893chz2MdxN1mArJloAXbTERw==",
          "mode": 420,
          "size": 1311
        },
        "HISTORY.md": {
          "checkedAt": 1745925782022,
          "integrity": "sha512-HjgVdL/NlSwDhkut3BCwnal4ahlNd2eLzPCGPARs0Wy+rjHtAJNL7/s426/dIwbg7NI5wnkxuhLImjx+ULgavg==",
          "mode": 420,
          "size": 1976
        },
        "README.md": {
          "checkedAt": 1745925782022,
          "integrity": "sha512-GahpusTGRt2M/0s2vsEMBfDm0bBXMVlnhBVY+xoDcfBKn4rGbeW9PGQWZ7LM1adQWm8Q3WizIF1D1jwI97IS7w==",
          "mode": 420,
          "size": 1797
        }
      }
    }
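The difference between the two kinds of hashes can be sketched in a few lines. The lockfile integrity covers the whole tarball, while an index like the one above holds one hash per file. The hypothetical indexPackageFiles function below builds such a per-file map (integrity, mode, size) from already-unpacked file contents. It omits checkedAt and any other fields pnpm may record.

```ts
import { Buffer } from 'node:buffer';
import { createHash } from 'node:crypto';

// Per-file record modelled on the index file above.
interface IndexedFile {
  integrity: string;
  mode: number;
  size: number;
}

// Hypothetical helper: hash every file of an unpacked package individually.
function indexPackageFiles(
  files: Record<string, { content: Buffer; mode: number }>,
): Record<string, IndexedFile> {
  const index: Record<string, IndexedFile> = {};
  for (const [name, { content, mode }] of Object.entries(files)) {
    index[name] = {
      integrity: 'sha512-' + createHash('sha512').update(content).digest('base64'),
      mode,
      size: content.length,
    };
  }
  return index;
}

const idx = indexPackageFiles({
  'index.js': { content: Buffer.from("module.exports = 'hi';\n"), mode: 0o644 },
  LICENSE: { content: Buffer.from('MIT\n'), mode: 0o644 },
});
console.log(idx['LICENSE'].size, idx['LICENSE'].mode); // 4 420
```

Because each entry depends only on that file's bytes, two packages that share a file produce the same per-file integrity for it. That is what lets the CAS store the file once.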

Package File Storage in CAS (v10/files/)

The pnpm store has a specific directory structure for these content-addressed package files. In pnpm v10, this structure is located at storeDir/v10/files/.

Taking compressible@2.0.18 as an example, the package's file collection includes:

bash
compressible@2.0.18
├── HISTORY.md
├── index.js
├── LICENSE
├── package.json
└── README.md

To locate index.js, pnpm first needs the index file for compressible@2.0.18. Relative to storeDir/v10/index/, that index file is stored at:

bash
00/5debecfe5d5b12fc331c884d132539140d68e036224005693af893b054ba68-compressible@2.0.18.json

This index file contains the same compressible@2.0.18.json content listed above, including an entry for index.js.

The integrity hash value of the index.js file is:

bash
sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg==

pnpm stores the file according to this integrity value in the following way:

  1. The base64 digest in the integrity value (everything after the sha512- prefix) is decoded and re-encoded as a hexadecimal string.

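    ts
    import { Buffer } from 'node:buffer';

    // Decode a base64 digest and re-encode the same bytes as a hex string.
    function base64ToHexNode(base64: string): string {
      return Buffer.from(base64, 'base64').toString('hex');
    }

    // slice(7) drops the "sha512-" prefix, leaving only the base64 digest.
    const hex = base64ToHexNode(
      'sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg=='.slice(
        7
      )
    );
    // A SHA-512 digest is 64 bytes, i.e. 128 hex characters.
    console.assert(
      hex ===
        'd7f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276'
    );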
  2. The first two characters of the obtained hexadecimal hash value are used as the name of a subdirectory.

    The obtained hex value is:

    bash
    d7f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276

    The first two characters are d7, so the file will be stored in the storeDir/v10/files/d7/ directory.

  3. The remaining part of the hash value forms the filename in the subdirectory.

    In this example, the remaining part is:

    bash
    f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276

By observing other stored file paths:

bash
00/0a57539c3ea49736380ab214874f4528b455742d1f21c9d8027015cfae1372aa80382fefb88fd0abdd89c27c2d1332561e5186e4fa9ec8c348fc5768ca5b22
00/0cdacf5df3cc29401b02c11f6b4c473b193dfff269021e66d0749bdbe81b1c2336a5a72d1384a0d721b72064857a031a4c937653ed07c5e38731682c4c3eb8
1f/1a6454d1d08337c03084eb136fcd6f1ca1448a7f10cf150bd6dbb65aef9fa3be5b182f02d9127dc0ceadb3ad9a3a0911a98a7b63538afed05b8c5581e357db

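Each filename is 126 hexadecimal characters long. Together with the 2-character subdirectory name, that makes up the full 128-character hex form of the SHA-512 digest, so the hash is not truncated: the filename is simply the rest of the hash.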

Therefore, the compressible@2.0.18/index.js file will be stored at:

bash
storeDir/v10/files/d7/f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276
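The three steps above fit into one small function. This is a sketch of the layout described here, not pnpm's implementation. contentPathFromIntegrity and its storeVersionDir parameter (standing for storeDir/v10) are hypothetical, only sha512 integrity values are accepted, and it does not cover every detail pnpm may apply when naming files. It uses POSIX path joining so the example output is the same on every OS.

```ts
import { Buffer } from 'node:buffer';
import * as path from 'node:path';

// Hypothetical helper: map a file's integrity value to its location in the CAS.
function contentPathFromIntegrity(storeVersionDir: string, integrity: string): string {
  const prefix = 'sha512-';
  if (!integrity.startsWith(prefix)) {
    throw new Error(`unsupported integrity value: ${integrity}`);
  }
  // Step 1: base64 digest -> 128-character hex string.
  const hex = Buffer.from(integrity.slice(prefix.length), 'base64').toString('hex');
  // Steps 2 and 3: the first two hex characters name the subdirectory and the
  // remaining 126 characters form the filename.
  return path.posix.join(storeVersionDir, 'files', hex.slice(0, 2), hex.slice(2));
}

const indexJsPath = contentPathFromIntegrity(
  'storeDir/v10',
  'sha512-1/M/lsoE88CWoC2i0B8H/0/1OqcNqutPKkOZ6r4VRFXLD8XkfgiBjWTaySq6DxOPG3ta5xiVfdPZmn2/2wXidg==',
);
console.log(indexJsPath); // compare with the storeDir/v10/files/d7/... path above
```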

The v10 in the path is the version of pnpm's store layout. For pnpm 10 it matches the pnpm major version; it is not a count of layout revisions (earlier pnpm releases, such as pnpm 9, used a v3 directory). The layout can therefore change between pnpm releases, although the core concept of CAS remains unchanged. This is important to understand, especially for users who interact directly with the store (which is generally not recommended), because the specific internal layout is an implementation detail that may change.

Package Metadata (Index) File Storage in CAS (v10/index/)

In addition to storing individual files, pnpm also needs to store package metadata: the package name and version, plus the mapping from each filename in the package to its file content hash. These metadata files use a hash-based layout similar to v10/files/. However, they are kept in a separate directory, v10/index/, and are keyed by the package's integrity hash rather than by a file content hash.

The package metadata file for compressible@2.0.18 is the compressible@2.0.18.json index file shown earlier in this article.
  • Content: Each metadata file serves as a manifest for a specific version of the package, containing package metadata (such as name and version), and most importantly, the mapping between all original filenames in the package and their corresponding content hash values. Through this mapping, pnpm knows how to find all files that make up the package from the v10/files/ directory and organize them together with the correct directory structure.

  • Naming/Location:

    The naming rule for package metadata files is <hex-encoded integrity hash>-<package>@<version>.json, with the hash shortened and split into a subdirectory as described below.

    Taking compressible@2.0.18.json as an example, the integrity hash of the compressible package obtained from pnpm-lock.yaml is:

    bash
    sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg==

    Convert this integrity hash to a hexadecimal string using the base64ToHexNode function.

    ts
    const hex = base64ToHexNode(
      'sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg=='.slice(
        7
      )
    );
    console.assert(
      hex ===
        '005debecfe5d5b12fc331c884d132539140d68e036224005693af893b054ba68cfb51a460d36699743dbd5708ee89783081769d76e8282cf6c331a928e063246'
    );

    As with content files, the first two characters (00) become the subdirectory name. However, other stored package metadata filenames reveal a difference:

    bash
    4809a191f68281dd19697243570e3258268c62c2aa4f3c20571b0289715aa9-@pnpm+patch-package@0.0.1.json
    aa5a6251e7f2de1255b3870b2f9be7e28a82f478bebb03f2f6efadb890269b-base64-js@1.5.1.json
    5d3279e22b928e9df782a9528d4cfb45f5b01d788a4a1aaa5206b931f57d6e-meow@11.0.0.json

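    Every metadata filename keeps exactly 62 hexadecimal characters of the hash. In other words, only the first 64 hex characters (the first 32 bytes of the SHA-512 digest) are used: 2 for the subdirectory and 62 for the filename. Therefore, the final file path for compressible@2.0.18.json is: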

    bash
    v10/index/00/5debecfe5d5b12fc331c884d132539140d68e036224005693af893b054ba68-compressible@2.0.18.json
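The index-file naming can be sketched the same way. In this hypothetical indexPathFor function, two details are inferred from the filenames listed above rather than taken from pnpm's source: the 64-character cut-off, and the replacement of / with + in scoped package names (as in @pnpm+patch-package@0.0.1.json).

```ts
import { Buffer } from 'node:buffer';
import * as path from 'node:path';

// Hypothetical helper reproducing the observed index-file naming scheme.
function indexPathFor(
  storeVersionDir: string,
  integrity: string,
  name: string,
  version: string,
): string {
  const prefix = 'sha512-';
  if (!integrity.startsWith(prefix)) {
    throw new Error(`unsupported integrity value: ${integrity}`);
  }
  // Keep only the first 64 hex characters (the first 32 bytes of the digest).
  const hex = Buffer.from(integrity.slice(prefix.length), 'base64').toString('hex').slice(0, 64);
  // Scoped names such as @pnpm/patch-package appear as @pnpm+patch-package.
  const fileSafeName = name.replace(/\//g, '+');
  return path.posix.join(
    storeVersionDir,
    'index',
    hex.slice(0, 2),
    `${hex.slice(2)}-${fileSafeName}@${version}.json`,
  );
}

const compressibleIntegrity =
  'sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg==';
console.log(indexPathFor('v10', compressibleIntegrity, 'compressible', '2.0.18'));
// compare with the v10/index/00/... path shown above
```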

CAS Linking Strategy

Having introduced the storage layout of CAS, let's now discuss how pnpm links files from CAS to projects.

Virtual Store (node_modules/.pnpm)

pnpm doesn't directly link files from the global CAS to the node_modules folder in the project root. Instead, it uses an intermediate layer structure, creating a hidden directory called .pnpm under each project's node_modules directory. This directory is known as the virtual store.

This .pnpm directory is where the actual links (hardlink or reflink) to files in the global CAS are stored. These links are organized according to the package name and version structure, for example .pnpm/<pkg-name>@<version>/node_modules/<pkg-name>/.
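By combining an index file with this layout, you can plan where every file of a package goes. The sketch below is hypothetical (planVirtualStoreLinks is not a pnpm API). It takes a package's files as a map from relative path to hex content hash and returns source and destination pairs. It assumes that "/" in scoped names is encoded as "+" in .pnpm directory names, and it ignores any extra suffixes pnpm may add to these directory names (for example, for peer dependencies).

```ts
import * as path from 'node:path';

// One planned link: a content file in the CAS and where it appears in the project.
interface PlannedLink {
  from: string; // <storeDir>/v10/files/xx/...
  to: string; // <project>/node_modules/.pnpm/<name>@<version>/node_modules/<name>/...
}

// Hypothetical helper; `files` maps relative paths to hex content hashes.
function planVirtualStoreLinks(
  projectDir: string,
  storeVersionDir: string,
  name: string,
  version: string,
  files: Record<string, string>,
): PlannedLink[] {
  const pkgDir = path.posix.join(
    projectDir,
    'node_modules',
    '.pnpm',
    `${name.replace(/\//g, '+')}@${version}`, // assumed scoped-name encoding
    'node_modules',
    name,
  );
  return Object.entries(files).map(([relPath, hex]) => ({
    from: path.posix.join(storeVersionDir, 'files', hex.slice(0, 2), hex.slice(2)),
    to: path.posix.join(pkgDir, relPath),
  }));
}

const plan = planVirtualStoreLinks('/app', 'storeDir/v10', 'compressible', '2.0.18', {
  'index.js':
    'd7f33f96ca04f3c096a02da2d01f07ff4ff53aa70daaeb4f2a4399eabe154455cb0fc5e47e08818d64dac92aba0f138f1b7b5ae718957dd3d99a7dbfdb05e276',
});
console.log(plan[0].to);
// /app/node_modules/.pnpm/compressible@2.0.18/node_modules/compressible/index.js
```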

Package Import Method (packageImportMethod)

How pnpm imports files from the global CAS to the project's virtual store (.pnpm) is controlled by the packageImportMethod configuration option. Here's a detailed explanation of each option:

  • auto (default): This is pnpm's preferred strategy. It first tries to use clone (i.e., reflink / copy-on-write). If the filesystem doesn't support clone, it tries to use hardlink. If hardlinking also fails (e.g., when trying to link across filesystems), it finally falls back to copy (regular file copying). The default setting aims to intelligently choose the optimal feasible option for the current environment.
  • clone (reflink/copy-on-write): This is the fastest and safest method. It creates a reference to the original file data. If this file is later modified in the project's node_modules, the filesystem automatically creates a new copy without affecting the original file in CAS. This approach both saves space (no data copying initially) and ensures isolation. However, it requires underlying filesystem support (e.g., Btrfs, APFS, and XFS with reflink support). It provides the best balance between speed, space, and safety (isolation), but depends on modern filesystems.
  • hardlink: Creates a hard link. The file entry in the project's node_modules and the file entry in CAS refer to the same inode, and therefore to exactly the same physical data blocks on disk. This method is very space-efficient: a hard link only adds a new directory entry, and no file data is duplicated. However, if a hard-linked file is modified in place in the project's node_modules, the original file in CAS changes too. This can unintentionally corrupt the CAS and affect other projects. Hard links also require the source file and link target to be on the same filesystem. In short, this method is extremely space-efficient but tightly couples projects to the store and risks accidental modification of store files.
  • copy: Performs standard file copying. This is the least efficient method in terms of disk space and installation speed, but it has universal applicability, working even across filesystems. It's a universal fallback solution but sacrifices pnpm's main advantages (disk space savings and installation speed).
  • clone-or-copy: Tries clone first, falls back to copy if not supported.
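The auto fallback chain can be sketched with Node's fs module. importFile is a hypothetical helper, not pnpm's implementation. It first tries a forced reflink via fs.copyFileSync with COPYFILE_FICLONE_FORCE, then fs.linkSync, then a plain copy, and reports which method succeeded.

```ts
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

type ImportMethod = 'clone' | 'hardlink' | 'copy';

// Hypothetical sketch of the "auto" strategy: reflink, then hard link, then copy.
function importFile(src: string, dest: string): ImportMethod {
  try {
    // Fails unless the filesystem supports copy-on-write clones (reflinks).
    fs.copyFileSync(src, dest, fs.constants.COPYFILE_FICLONE_FORCE);
    return 'clone';
  } catch {
    // Reflink not available here; try the next method.
  }
  try {
    // Remove any leftover destination, then hard-link (fails across filesystems).
    fs.rmSync(dest, { force: true });
    fs.linkSync(src, dest);
    return 'hardlink';
  } catch {
    // Hard link not possible; fall back to a regular copy.
  }
  fs.copyFileSync(src, dest);
  return 'copy';
}

// "Import" one stand-in store file into a temporary project directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'import-'));
const src = path.join(dir, 'cas-file');
const dest = path.join(dir, 'index.js');
fs.writeFileSync(src, 'module.exports = 1;');
const method = importFile(src, dest);
console.log(method); // 'clone', 'hardlink' or 'copy', depending on the filesystem
```

Which branch wins depends entirely on the filesystem. That is why the same command can behave differently on APFS, Btrfs, ext4, or across drives.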

The filesystem type (such as Btrfs, XFS, ext4, or APFS) directly determines whether clone or hardlink is available. In Docker builds, hard links and reflinks cannot be created between the host filesystem and the container filesystem at build time. Workarounds are therefore needed, such as BuildKit cache mounts or the pnpm fetch command to pre-download dependencies.

packageImportMethod Option Comparison

| Option | Mechanism | Disk Efficiency | Speed | Filesystem Dependency | node_modules Modifiability |
|---|---|---|---|---|---|
| auto (default) | Prefers clone, then hardlink, finally copy | Best feasible | Best feasible | Yes | Depends on actual method used |
| clone | Reflink (COW) | High | Fastest | Yes (Btrfs, APFS, etc.) | Safe (creates copy when modified) |
| hardlink | Hard link | Highest | Fast | Yes (same filesystem) | Dangerous (directly modifies store) |
| copy | File copy | Low | Slow | No | Safe (modifies copy) |
| clone-or-copy | Prefers clone, then copy | Depends on clone support | Depends on clone support | Yes (for clone) | Depends on actual method used |
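The "Dangerous" entry for hardlink is easy to reproduce. This standalone script (the file names are illustrative) hard-links and copies a stand-in store file, then edits each one in place. The copy stays isolated, while the edit through the hard link changes the store file too.

```ts
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const storeFile = path.join(dir, 'store-file'); // stands in for a file in the CAS
const linked = path.join(dir, 'linked-file'); // imported with "hardlink"
const copied = path.join(dir, 'copied-file'); // imported with "copy"
fs.writeFileSync(storeFile, 'original');

fs.linkSync(storeFile, linked);
fs.copyFileSync(storeFile, copied);

// Editing the copy leaves the store file untouched.
fs.writeFileSync(copied, 'edited copy');
console.log(fs.readFileSync(storeFile, 'utf8')); // original

// Editing the hard link in place writes to the shared inode, so the store changes.
fs.writeFileSync(linked, 'edited link');
console.log(fs.readFileSync(storeFile, 'utf8')); // edited link
```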

The final step in the linking process is creating symbolic links (symlinks) in the project root's node_modules folder. These symbolic links point to the corresponding package directories in the .pnpm virtual store.

A key feature is that only the project's direct dependencies (packages listed in the dependencies, devDependencies, and optionalDependencies fields of its package.json) are symlinked into the root of node_modules. This enforces pnpm's strictness principle: application code cannot directly require or import transitive dependencies (dependencies of dependencies) that are not declared in its package.json. When Node.js resolves a module, it follows these symbolic links to the actual package code in the .pnpm directory.
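The final symlink step can be sketched as follows. linkDirectDependency is a hypothetical helper. It creates node_modules/<name> as a relative symlink into .pnpm/<name>@<version>/node_modules/<name>, assuming pnpm's encoding of "/" in scoped names as "+" in .pnpm directory names. Only direct dependencies would be passed to it, so transitive packages stay unreachable from the project root.

```ts
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: expose one direct dependency at node_modules/<name>.
function linkDirectDependency(projectDir: string, name: string, version: string): string {
  const target = path.join(
    projectDir, 'node_modules', '.pnpm', `${name.replace(/\//g, '+')}@${version}`, 'node_modules', name,
  );
  const linkPath = path.join(projectDir, 'node_modules', name);
  fs.mkdirSync(path.dirname(linkPath), { recursive: true }); // needed for @scope/ dirs
  // Relative target keeps the project relocatable. The 'junction' type is only
  // honoured on Windows (Node resolves the target to an absolute path there);
  // other platforms ignore it and create a normal symlink.
  fs.symlinkSync(path.relative(path.dirname(linkPath), target), linkPath, 'junction');
  return linkPath;
}

// Set up a fake virtual-store entry, then link it as a direct dependency.
const project = fs.mkdtempSync(path.join(os.tmpdir(), 'proj-'));
const pkgDir = path.join(project, 'node_modules', '.pnpm', 'compressible@2.0.18', 'node_modules', 'compressible');
fs.mkdirSync(pkgDir, { recursive: true });
fs.writeFileSync(path.join(pkgDir, 'index.js'), 'module.exports = 42;');
const link = linkDirectDependency(project, 'compressible', '2.0.18');
console.log(fs.readFileSync(path.join(link, 'index.js'), 'utf8')); // module.exports = 42;
```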

Conclusion

pnpm's efficiency stems from its carefully designed Content Addressable Storage and its symbolic and hard linking mechanisms. It combines several pieces:

  • package integrity verification;
  • content hashing for each file in a package;
  • a global content addressable store (CAS);
  • index files that record package structure;
  • a layered linking strategy that prefers reflinks or hard links to import all package files into the project's virtual store (.pnpm), then exposes direct dependencies in the root node_modules through symbolic links.

Together, these mechanisms achieve significant disk space savings and installation speed improvements.


Released under the CC BY-SA 4.0 License. (abd9c64)