I wonder if a GitLab runner that uses the shell executor and a local cache comes with a built-in mechanism to clean up that cache. I cache dependencies per branch. At some point a branch gets merged and its cache could actually be deleted, but that does not seem to happen by default, which means the machine the runner runs on will eventually fill up. I currently use a scheduled pipeline that runs once a week to delete cached data that has not been touched for at least 14 days.
```yaml
build:foo:
  cache:
    key: $CI_COMMIT_REF_NAME
    paths:
      - dependency_path/
  ...
  rules:
    - if: $CI_PIPELINE_SOURCE == "push"
```
---
```yaml
clean_runner_name:
  stage: clean
  tags:
    - runner_name
  script:
    - cd path_to_cache
    - find ./ -type f -mtime +14 -delete  # delete old files
    - find . -type d -empty -delete       # delete empty directories
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
```
This works well, but it seems like an abuse of scheduled pipelines to me, especially since the runner processes pipelines from several repositories.
So my question is: is there a better way to do this, and what would be the cleanest way? Maybe call a clean-up script defined in the runner's config.toml (e.g. via `cleanup_exec`) that does the job? Or run a service, independent of the runner, on the machine that regularly scans the cache for obsolete entries and deletes them? Or does GitLab have a built-in mechanism for this that I have overlooked?
Thank you very much for your suggestions and ideas!
You've pretty much assessed the situation correctly. GitLab never deletes your cache, even when you "clear" it through the UI. If you want the files gone, you must delete them from the cache storage yourself.
Each time you clear the cache manually, [...]. The old cache is not deleted. You can manually delete these files from the runner storage.
If you're using the runner's default local cache storage, you could replace the scheduled pipeline with a cronjob (or Scheduled Task on Windows) or a service on the runner host, for example.
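As a minimal sketch of that cron approach (the default path below is an assumption; point it at the `cache_dir` configured for your runner), the same `find` logic from your pipeline job can live in a small host-side script:

```shell
#!/bin/sh
# clean_cache: prune stale GitLab runner cache entries under $1.
# The fallback path is an assumption; adjust it to your runner's cache_dir.
clean_cache() {
  dir="${1:-/srv/gitlab-runner/cache}"
  # Delete cache files not modified for more than 14 days...
  find "$dir" -type f -mtime +14 -delete
  # ...then remove directories left empty, keeping the cache root itself.
  find "$dir" -mindepth 1 -type d -empty -delete
}
```

Installed via `crontab -e` with an entry like `0 3 * * 0 /usr/local/bin/clean-runner-cache.sh`, it runs every Sunday at 03:00, independent of any pipeline and of which repositories the runner serves.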
If you use distributed/remote cache storage such as AWS S3, you can use an object-expiration lifecycle policy to keep storage usage low.
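For example, a lifecycle rule that expires cache objects 14 days after creation might look like the following (the rule ID and the `runner/` prefix are assumptions; match the prefix to the path your runner actually writes under in the bucket):

```json
{
  "Rules": [
    {
      "ID": "expire-runner-cache",
      "Status": "Enabled",
      "Filter": { "Prefix": "runner/" },
      "Expiration": { "Days": 14 }
    }
  ]
}
```

You can apply it with `aws s3api put-bucket-lifecycle-configuration --bucket <your-cache-bucket> --lifecycle-configuration file://lifecycle.json`, and S3 then handles the clean-up for you with no cron or pipeline involved. Note that S3 expiration is based on object age, not last access, so a still-active branch's cache simply gets rebuilt on the next pipeline run after it expires.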