Permanently removing binary files from GitLab repos


We have a GitLab-hosted repo at work that contains some large binary files that we'd like to remove. I know of tools such as BFG Repo-Cleaner which will remove a file from a Git repository.

We often refer to specific commit IDs in GitLab. Would running BFG Repo-Cleaner mess these up?

If so, is there a better way to clean a repo that wouldn't mess these up?


Solution

  • We often refer to specific commit IDs in GitLab.

    Although Git history can't be modified without changing the id of every rewritten commit and of all the commits that come after it, the BFG does two things that help with the change:

    1. As it cleans your repo, the BFG also updates any object ids it finds in commit messages with their new ids. If you are deleting private data, it's a straight substitution. If you're just deleting big files (i.e. the commit ids themselves don't imply sensitive information), the text in your commit message becomes "$newId [formerly $oldId]", and a Former-commit-id: footer is added to the bottom of every modified commit message.
    2. The BFG also creates an object-id-map.old-new.txt file under the repo-name.bfg-report directory every time it runs. In principle, I believe this file could be used to fix other references to commit ids in GitLab too.

    Full disclosure: I'm the author of the BFG Repo-Cleaner.
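
As a rough illustration of how the id map could be used, here is a minimal Python sketch that rewrites commit ids found in arbitrary text (an exported issue comment, say). It is not part of the BFG. It assumes each line of object-id-map.old-new.txt holds an old id and a new id separated by whitespace, and the function names load_id_map and rewrite_ids are hypothetical. Abbreviated ids are replaced only when they match exactly one old id.

```python
import re

# Matches lowercase hex strings that look like full or abbreviated commit ids.
SHA_RE = re.compile(r"\b[0-9a-f]{7,40}\b")


def load_id_map(map_text):
    """Parse the map file contents into {old_id: new_id}.

    Assumption: one "old new" pair per line, separated by whitespace.
    Lines that don't fit that shape are skipped.
    """
    mapping = {}
    for line in map_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            old_id, new_id = parts
            mapping[old_id] = new_id
    return mapping


def rewrite_ids(text, mapping):
    """Replace old commit ids in text with their new ids."""

    def swap(match):
        token = match.group(0)
        if token in mapping:
            return mapping[token]
        # Abbreviated id: replace only if it is a prefix of exactly one old id,
        # and keep the abbreviation the same length as the original.
        hits = [old for old in mapping if old.startswith(token)]
        if len(hits) == 1:
            return mapping[hits[0]][: len(token)]
        return token  # unknown or ambiguous: leave untouched

    return SHA_RE.sub(swap, text)
```

To use it, read the map file that the BFG wrote into its report directory, pass the contents to load_id_map, and run rewrite_ids over each piece of text that mentions commit ids. Writing the result back to GitLab is a separate step that this sketch does not attempt.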