gitlab, gitlab-ci, gitlab-ci-runner

What is the point of `cache:key` in .gitlab-ci.yml?


According to the docs:

Since the cache is shared between jobs, if you’re using different paths for different jobs, you should also set a different cache:key otherwise cache content can be overwritten.

This sounds weird to me.

So if I'm "using different paths for different jobs" like this:

job_a:
  cache:
    paths:
      - binaries/

job_b:
  cache:
    paths:
      - node_modules/

How could the cache be overwritten?

Does it mean node_modules will overwrite binaries because the cache key is the same?

Does anyone know the details of how the cache is implemented in GitLab?

Does it work like this?


$job_cache_key = $job_cache_key || 'default';

if ($cache[$job_cache_key]){
    return $cache[$job_cache_key];
}

$cache[$job_cache_key] = $job_cache;

return $job_cache;

Solution

  • Cache keys in GitLab mimic Rails caching, although, as app/models/concerns/faster_cache_keys.rb mentions:

      # Rails' default "cache_key" method uses all kind of complex logic to figure
      # out the cache key. In many cases this complexity and overhead may not be
      # needed.
      #
      # This method does not do any timestamp parsing as this process is quite
      # expensive and not needed when generating cache keys. This method also relies
      # on the table name instead of the cache namespace name as the latter uses
      # complex logic to generate the exact same value (as when using the table
      # name) in 99% of the cases.
    

    The pipeline itself starts by initializing its local cache: lib/gitlab/ci/pipeline/seed/build/cache.rb

    You can see a cache example in spec/lib/gitlab/ci/pipeline/seed/build/cache_spec.rb

    Does it mean node_modules will overwrite binaries because the cache key is the same?

    No: each job uses its own set of paths, which overrides any paths defined in a global cache.

    gitlab-org/gitlab-runner issue 2838 asks about per-job caches and gives this example:
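
    The docs' advice quoted in the question can be sketched as follows. This is a minimal illustration rather than an excerpt from the GitLab docs: the job names come from the question, while the key names (`binaries-cache`, `node-modules-cache`) and the `script` commands are hypothetical. Each job archives only its own paths, but jobs that set no `cache:key` share the key `default`, so per the docs passage quoted in the question the cache saved by one job can replace the one saved by the other. Giving each job a distinct key keeps the two stored caches separate.

    ```yaml
    job_a:
      script:
        - make build              # hypothetical command
      cache:
        key: binaries-cache       # hypothetical key name
        paths:
          - binaries/

    job_b:
      script:
        - npm install             # hypothetical command
      cache:
        key: node-modules-cache   # hypothetical key name
        paths:
          - node_modules/
    ```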

    stages:
    - build
    - build-image
    
    # the following line is the global cache configuration but also defines an anchor with the name of "cache"
    # you can refer to the anchor and reuse this cache configuration in your jobs.
    # you can also add and replace properties
    # In the job definitions you will find examples.
    # for more information regarding reuse in YAML files, see https://blog.daemonl.com/2016/02/yaml.html
    cache: &cache
        paths:
        - api/node_modules/
        - global/node_modules/
        - frontend/node_modules/
    
    # first job, it does not have an explicit cache definition:
    # therefore it uses the global cache definition!
    build-app:
        stage: build
        image: node:8
        before_script:
        - yarn
        - cd frontend
        script:
        - npm run build
    
    # a job in a later stage, have a look at the cache block!
    # it "inherits" from the global cache block and adds the "policy: pull" key / value
    build-image-api:
        stage: build-image
        image: docker
        dependencies: []
        cache:
            <<: *cache
            policy: pull
        before_script:
        # .... and so on
    

    That inheritance mechanism is also documented in the "Inherit global config, but override specific settings per job" section of the caching documentation:

    You can override cache settings without overwriting the global cache by using anchors.
    For example, if you want to override the policy for one job:

    cache: &global_cache
        key: ${CI_COMMIT_REF_SLUG}
        paths:
          - node_modules/
          - public/
          - vendor/
        policy: pull-push
    
    job:
      cache:
        # inherit all global cache settings
        <<: *global_cache
        # override the policy
        policy: pull
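
    The same merge-key pattern can also give one job its own `cache:key` while it still inherits the rest of the global settings. The following is a minimal sketch, not taken from the docs: the job name `docs-job`, the `docs-` key prefix, and the `script` command are hypothetical. Note that the YAML `<<` merge key replaces values key by key, so overriding `paths` in a job would replace the inherited list entirely rather than extend it.

    ```yaml
    cache: &global_cache
      key: ${CI_COMMIT_REF_SLUG}
      paths:
        - node_modules/
      policy: pull-push

    docs-job:                                # hypothetical job name
      script:
        - echo "building docs"               # hypothetical command
      cache:
        # inherit paths and policy from the global cache
        <<: *global_cache
        # store this job's cache under its own key (hypothetical prefix)
        key: "docs-${CI_COMMIT_REF_SLUG}"
    ```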
    

    1+ year later (Q2 2021):

    See GitLab 13.11 (April 2021)

    Use multiple caches in the same job

    GitLab CI/CD provides a caching mechanism that saves precious development time when your jobs are running. Previously, it was impossible to configure multiple cache keys in the same job. This limitation may have caused you to use artifacts for caching, or use duplicate jobs with different cache paths. In this release, we provide the ability to configure multiple cache keys in a single job which will help you increase your pipeline performance.

    https://about.gitlab.com/images/13_11/cache.png -- Use multiple caches in the same job

    See Documentation and Issue.
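
    A minimal sketch of what the multiple-cache feature could look like in `.gitlab-ci.yml`, assuming GitLab 13.11 or later: `cache` takes a list of entries, each with its own key and paths. The job name, key names, paths, and commands here are hypothetical and only illustrate the shape of the configuration.

    ```yaml
    build:                                   # hypothetical job name
      script:
        - bundle config set --local path vendor/ruby
        - bundle install
        - npm ci --cache .npm --prefer-offline
      cache:
        # first cache: Ruby gems (hypothetical key name)
        - key: ruby-gems
          paths:
            - vendor/ruby/
        # second cache: npm download cache (hypothetical key name)
        - key: npm-packages
          paths:
            - .npm/
    ```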