
Delete objects from a GitLab repository within a certain date range


I have to remove a number of Git LFS-tracked files from a Git repo on gitlab.com. A nightly CI build updated them automatically over a number of weeks, and I quickly ran out of storage space.

I followed the original GitLab documentation on cleaning up a repository, but it did not lower the used storage space. I probably did something wrong while trying to delete the file only within a certain date range rather than across the entire history.


Solution

  • The following steps, using the BFG Repo-Cleaner instead of git-filter-repo, ended up working.

    (Note: git rev-list is your friend! It was the main thing that helped me filter out exactly which objects to delete from history, and its options are worth a look.)

    1. Follow the GitLab docs up to step 6:
       • Generate a fresh export from the project and download it.
       • Decompress the backup using tar: tar xzf project-backup.tar.gz
       • Clone a fresh copy of the repository from the bundle using the --bare and --mirror options: git clone --bare --mirror /path/to/project.bundle
       • Go to the project.git directory.
       • Because cloning from a bundle file sets the origin remote to the local bundle file, change it to the URL of your repository: git remote set-url origin https://gitlab.example.com/<namespace>/<project_name>.git
    2. Download the BFG Repo-Cleaner .jar file from the BFG project site.
    3. Find the object IDs of the file to delete in the relevant part of the history. The easiest way for me was to filter by date (replace each <YYYY-MM-DD> with the start and end dates between which to scan the repository, and <file path in repo> with the path of the file):
      git rev-list --all --objects --since <YYYY-MM-DD> --before <YYYY-MM-DD> | grep <file path in repo> > object_ids_to_delete.txt
      
    4. The resulting file contains lines of the form <Object ID> <file path>. Remove the <file path> part from every line so that only the object IDs remain, one per line.
    5. From the directory that contains project.git, run BFG on the repo with that list of object IDs (move object_ids_to_delete.txt there first if you created it inside the repo; -bi is short for --strip-blobs-with-ids): java -jar bfg.jar -bi ./object_ids_to_delete.txt ./project.git
    6. Go to the repo folder: cd ./project.git
    7. Expire the reflog and garbage-collect the objects that are no longer referenced: git reflog expire --expire=now --all && git gc --prune=now --aggressive (see the BFG docs)
    8. Continue with the remaining steps from the GitLab docs:
       • To be able to force push the changes, unset the mirror flag: git config --unset remote.origin.mirror
       • Force push the rewritten branches: git push origin --force 'refs/heads/*'
       • (The remaining push steps weren't necessary for me.)
    9. Wait at least 30 minutes.
    10. Locate the commit map file that BFG generates. It should be at ./project.git.bfg-report/<date>/<timestamp>/object-id-map.old-new.txt
    11. Upload it in the "Repository Cleanup" part of the project settings, as described in the next step of the GitLab docs.
    12. Wait some more...
    13. Success!
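
The manual step of stripping the file path from each line of the rev-list output can probably be scripted. Below is a minimal sketch, not something taken from the GitLab or BFG docs: ids_for_path is a hypothetical helper name, and it assumes the input has the "<Object ID> <file path>" shape that git rev-list --objects prints. Unlike a plain grep, it only keeps lines whose path matches exactly, so similarly named files are not swept up by accident.

```shell
# Hypothetical helper (name and behaviour are this sketch's assumptions).
# Reads `git rev-list --objects` output on stdin, i.e. lines shaped like
# "<object id> <path>", and prints only the object IDs whose path is
# exactly equal to the first argument.
ids_for_path() {
  awk -v path="$1" '{
    id = $1
    # Everything after the ID and the single separating space is the path;
    # this keeps paths that contain spaces intact.
    rest = substr($0, length(id) + 2)
    if (rest == path) print id
  }'
}

# Intended usage, with the same placeholders as in the steps above:
#   git rev-list --all --objects --since <YYYY-MM-DD> --before <YYYY-MM-DD> \
#     | ids_for_path '<file path in repo>' > object_ids_to_delete.txt
```

The resulting file should then already be in the one-ID-per-line form that the -bi option expects, so no hand editing is needed before running BFG.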