Search code examples
github-actions

What is the logic in using the restore-keys field in the GitHub cache action?


I am trying to understand how to use the cache action in GitHub to cache dependencies.

In particular, I am struggling with the concept of the "fallback" restore-keys. In the example given in the documentation, we have:

      - name: Cache node modules
        id: cache-npm
        uses: actions/cache@v3
        env:
          cache-name: cache-node-modules
        with:
          # npm cache files are stored in `~/.npm` on Linux/macOS
          path: ~/.npm
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-build-${{ env.cache-name }}-
            ${{ runner.os }}-build-
            ${{ runner.os }}-

With key, I understand that a specific match on the hash of certain files is being attempted. If there is a miss on this, the restore-keys are examined in turn, and these attempt increasingly less specific matches.

What I don't get is why these would be assumed to yield acceptable alternatives. If the key relies on matching certain files, why would a weaker match suffice, and if it would suffice, why go to the trouble of caching for specific hashes?

I am trying to understand this in the context of a task that performs a code generation activity, and the generated code is cached. The key in this case is a hash of the input files on which the generated code depends. This makes sense. But there is also a weaker restore-key that ignores the input file hash. As in my question above, I can't understand why a hit on the restore-key would be suitable because presumably this implies that the cached generated code is not a match for the input files.


Solution

  • My understanding of that example is simply to drive home the feature of using multiple restore-keys strings with decreasing specificity. As shown it implies that there may be other workflows generating caches that might be reusable.

    To expand on this an important concept is that actions/cache@v3 is caching the local filesystem used to store npm's local package cache from a library called cacache

    cacache is a Node.js library for managing local key and content address caches. It's really fast, really good at concurrency, and it will never give you corrupted data, even if cache files get corrupted or manipulated. It was written to be used as npm's local cache

    The npm cache is strictly a cache: it should not be relied upon as a persistent and reliable data store for package data. npm makes no guarantee that a previously-cached piece of data will be available later, and will automatically delete corrupted contents. The primary guarantee that the cache makes is that, if it does return data, that data will be exactly the data that was inserted.

    The take away from these snippets is that older or unrelated cache contents may still be functional, and will be automatically self correcting if not.

    If generating caches with the following name, the following looks for the identical match and restores the filesystem.

    ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
    

    No identical match found, because the lock file changed, use the newest or latest cache available.

    ${{ runner.os }}-build-${{ env.cache-name }}-
    

    Say you have another workflow in the monorepo that builds a separate node project and it maintains its own cache. It uses a different cache-name. Since this is the first time this workflow has ran it will cache miss until the following and reuse the cache store from a different workflow with a different cache-name

    ${{ runner.os }}-build
    

    This could presumably pull cache files from a workflow that established a cache using a test build e.g. ${{ runner.os }}-test

    ${{ runner.os }}-