When is hashFiles() actually needed in the cache GitHub Action?


I'm using the cache action in GitHub Actions workflows, and I see hashFiles() used a lot to define cache keys. I've used it myself for years, copy-pasting from the Maven example:

    - name: Cache Maven packages
      uses: actions/cache@v3
      with:
        path: ~/.m2
        key: ${{ runner.os }}-m2-v1-${{ hashFiles('**/pom.xml') }}
        restore-keys: ${{ runner.os }}-m2-v1-

However, in the GitHub interface, I've noticed that I have several caches that perhaps didn't need to be created.

I run the build with mvn --update-snapshots ..., so even if it always restores the same cache, regardless of whether the dependencies have changed, it should still work: Maven (quickly) checks what has changed, re-downloads updated snapshots, and downloads any dependencies it doesn't find in the cached repository. So it should be fine to have just one cache and one key (per OS).

Am I right? Would something like key: ${{ runner.os }}-m2-v1 still work in the situation above? Or am I missing some potential problems?

The same question is also on GitHub discussions.


Solution

  • That's where restore-keys comes in!

        - name: Cache Maven packages
          uses: actions/cache@v3
          with:
            path: ~/.m2
            key: ${{ runner.os }}-m2-v1-${{ hashFiles('**/pom.xml') }}
            restore-keys: ${{ runner.os }}-m2-v1-
    

    If there is no exact match for the key ${{ runner.os }}-m2-v1-${{ hashFiles('**/pom.xml') }}, the action falls back to the most recently created cache whose key starts with the prefix ${{ runner.os }}-m2-v1-.
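
    To see which of the two cases happened in a given run, the action exposes a cache-hit output that is set to 'true' only on an exact match for key. A minimal sketch, assuming a step id of maven-cache and a follow-up reporting step, both of which are made up for illustration:

        - name: Cache Maven packages
          id: maven-cache   # hypothetical id, only needed to read the output below
          uses: actions/cache@v3
          with:
            path: ~/.m2
            key: ${{ runner.os }}-m2-v1-${{ hashFiles('**/pom.xml') }}
            restore-keys: ${{ runner.os }}-m2-v1-

        - name: Report cache status
          # Prints "true" on an exact key match; any other value means
          # a fallback restore via restore-keys or no cache at all.
          run: echo "Exact key match: ${{ steps.maven-cache.outputs.cache-hit }}"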

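    If you used a single fixed key instead, the first run would create the cache and every later run would get an exact hit. On an exact hit the action does not save again: an existing cache entry is never overwritten, so the cache would keep its original contents until it is evicted or deleted. Technically the build would still work, because Maven downloads whatever is missing, but dependencies added after the cache was created would be downloaded again on every run. How much that matters depends mostly on how different and how big the dependency sets are, for example across branches.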
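
    If the goal is a single rolling cache that is refreshed on every run, one commonly used workaround is to make the key unique per run and let restore-keys pick up the most recent cache. A minimal sketch, assuming the same path and prefix as above; using github.run_id as the unique suffix is an illustrative choice, not something from the original workflow:

        - name: Cache Maven packages
          uses: actions/cache@v3
          with:
            path: ~/.m2
            # Unique per workflow run, so a new run never gets an exact hit
            # and a fresh cache is saved when the job finishes successfully.
            key: ${{ runner.os }}-m2-v1-${{ github.run_id }}
            # Restores the most recently created cache with this prefix.
            restore-keys: ${{ runner.os }}-m2-v1-

    The trade-off is that every run uploads a new cache, so older entries pile up until GitHub evicts them. The hashFiles() key avoids that by saving a new cache only when a pom.xml changes.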