
How to delete old commits without affecting history


I need to delete commits made 1 year ago because they contain sensitive data that must be removed.

I have used BFG Repo-Cleaner and have been able to delete almost everything, but there are some very old commits that are not being removed.

Here is an example. The Git history looks like this:

  • C -> secret files do not exist
  • B -> secret files are removed
  • A -> secret files were added

(A being the oldest and C the newest commit)

And this is what I would need (B does not exist anymore, but later commits are not affected):

  • C -> secret files do not exist
  • A -> secret files were added

I'm working in a big team so, unless there is no other option, I would like to avoid using git push -f.

What is the best way to achieve this?

Thank you very much.

(edit)

The reason for this is that we have a regular scan on our repo that detected commit A as a vulnerability.

We made commit B, where we deleted all credential and secret files, and the problem is that the scan also detects commit B as a 'security issue'.

We are asked to remove commit B to pass the scan.


Solution

  • TL;DR

    • you must rewrite commit A to not contain the sensitive file in the first place
    • you must use git push -f
    • you're not done yet: you must still clean the history on the server

    Rewrite commit A and the whole history

    This should be what bfg did for you. I assume you ran something like bfg --delete-files <sensitive-file>. This should have created a whole new history in which <sensitive-file> never existed: commits that added or modified it along with other files should be rewritten without that file. Commits that touched only that file should disappear, since they would now be empty commits.

    So now you have commit A', a copy of A without <sensitive-file>. The rest of the history is rewritten as its successors: C', etc.

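    To confirm that this happened correctly, run this command in both an old clone (sandbox) and the new one updated by bfg. The -- separator tells Git that what follows is a path, which matters once the file no longer exists in the working tree:

    git log --all -- <sensitive-file>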
    

    You should see the commits touching the sensitive file in the original repo but no output in the new one. This is how you can be confident the file is really removed from the history.
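The check above can be wrapped in a small helper so it is easy to repeat in each clone. This is only a sketch: the function name file_absent_from_history and its two arguments (repository directory, path) are made up for illustration and are not part of Git or bfg.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 only if no commit reachable from any ref
# touches the given path in the given repository.
file_absent_from_history() {
    repo=$1
    path=$2
    # "--" separates revisions from paths, so the lookup also works when
    # the path is no longer present in the working tree.
    hits=$(git -C "$repo" log --all --oneline -- "$path")
    # Empty output means no reachable commit ever touched the path.
    [ -z "$hits" ]
}
```

Usage would look like file_absent_from_history /path/to/clone path/to/file && echo "clean". Note that it only inspects commits reachable from refs in that clone; it says nothing about unreachable objects or about what the server still stores.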

    You must use git push -f

    The sha1 of a Git commit is a cryptographic hash of the commit: all its metadata (author, committer, dates, message), all its contents, and, through the hashes of its parents, all its history.

    If you change any one aspect of the commit (the date, the message, the contents) or any one aspect of any of its ancestors, the hash changes, by definition.

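    The rewritten commits are therefore different commits from the ones already on the server, and a normal push would be rejected as a non-fast-forward. So the only way forward is a git push -f.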
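To see the hash behaviour for yourself, here is a minimal sketch that runs in a throwaway repository created with mktemp, so it touches no real project. The file name and commit messages are arbitrary placeholders. It rewords a commit without changing any file and compares the ids before and after.

```shell
#!/bin/sh
# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

echo "hello" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "original message"
before=$(git -C "$repo" rev-parse HEAD)
tree_before=$(git -C "$repo" rev-parse 'HEAD^{tree}')

# Change only the commit message; the files stay exactly the same.
git -C "$repo" commit -q --amend -m "reworded message"
after=$(git -C "$repo" rev-parse HEAD)
tree_after=$(git -C "$repo" rev-parse 'HEAD^{tree}')

echo "commit before: $before"
echo "commit after:  $after"
echo "tree before:   $tree_before"
echo "tree after:    $tree_after"

rm -rf "$repo"
```

The two tree ids match because the contents are identical, while the two commit ids differ because the message is part of what gets hashed. Any commit built on top of the reworded one would get a new id as well, since its parent id has changed.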

    You're probably not done

    But wait: after doing git push -f, the server will still have copies of the old history. For GitHub, see the linked post: if you pushed to GitHub, it is too late even if you force-push it away one second later. Apparently, the only truly safe way to eradicate the sensitive file from a GitHub repo is to delete the repository and recreate it with only the clean history you want to keep. There are other solutions, but your mileage may vary; details are in the linked post.

    If you're using a different or private Git server, make sure to force garbage collection and follow the further recommendations in "Remove sensitive files and their commits from Git history".
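For a repository whose files you can reach directly (for example a bare repository on a self-hosted server), forcing garbage collection could look roughly like the sketch below. The helper name purge_unreachable is made up for illustration, and hosted services generally do not let you run these commands on their copy, so treat this as an assumption about your setup rather than a universal recipe.

```shell
#!/bin/sh
# Hypothetical helper: make rewritten-away commits actually disappear from
# a repository you have direct file access to.
purge_unreachable() {
    repo=$1
    # Reflog entries keep old commits reachable, so expire them first.
    git -C "$repo" reflog expire --expire=now --all || return 1
    # Then delete every object that no ref points to, regardless of age.
    git -C "$repo" gc --quiet --prune=now
}
```

Run it only after the rewritten history has been pushed and verified, since the pruned objects cannot be recovered from that repository afterwards. Other clones, forks, and backups keep their own copies of the old objects until they are cleaned or re-cloned.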