git github bfg-repo-cleaner

Why can I still see files in GitHub history after cleaning them with the BFG?


I am working on a group project and I want to remove a file from the repository's entire history: the content, the file name, everything. I don't want any trace of it left in the Git repo. I have been trying to do this with the BFG, but I can still find the file on the GitHub page using its "browse the repository at this point in history" feature.

The Git repo is the directory .../electricity_profiles, and the file I want to remove was in electricity_profiles/data (I've tried bfg --delete-files .~lock.smart_meter_data_overlap.csv#). I have since removed it from the current commit, but it is still present a few commits back, in commit 5c50c67d1be4e869bc75fb7d3916b9fc814b8106.

How can I remove all evidence that this file ever existed, even on GitHub, so that other people who pull the repo won't see it?

I have looked at:

but haven't figured it out yet.

Work done so far (seems to work):

git clone --mirror https://github.com/oliversheridanmethven/electricity_profiles.git
bfg --delete-files .~lock.smart_meter_data_overlap.csv# electricity_profiles.git

Console output:

Using repo : /home/user/Documents/InFoMM/case_studies/trial/electricity_profiles.git

Found 20 objects to protect
Found 2 commit-pointing refs : HEAD, refs/heads/master

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 1b1eef47 (protected by 'HEAD')

Cleaning
--------

Found 22 commits
Cleaning commits:       100% (22/22)
Cleaning commits completed in 141 ms.

Updating 1 Ref
--------------

    Ref                 Before     After   
    ---------------------------------------
    refs/heads/master | 1b1eef47 | 9701a5b7

Updating references:    100% (1/1)
...Ref update completed in 26 ms.

Commit Tree-Dirt History
------------------------

    Earliest        Latest
    |                    |
    ......D..D..m.m.mmmmmm

    D = dirty commits (file tree fixed)
    m = modified commits (commit message or parents changed)
    . = clean commits (no changes to file tree)

                            Before     After   
    -------------------------------------------
    First modified commit | 5c50c67d | ff47bcdf
    Last dirty commit     | 9671f6ad | f6d36763

Deleted files
-------------

    Filename                               Git id         
    ------------------------------------------------------
    .~lock.smart_meter_data_overlap.csv# | 7cf2b24f (92 B)


In total, 14 object ids were changed. Full details are logged here:

    /home/user/Documents/InFoMM/case_studies/trial/electricity_profiles.git.bfg-report/2017-01-18/11-48-37

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

Finishing off the process:

cd electricity_profiles.git
git push --mirror https://github.com/oliversheridanmethven/electricity_profiles.git

Looking at the GitHub repo, it seems to have worked.
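
The BFG's closing message above suggests running git reflog expire --expire=now --all && git gc --prune=now --aggressive, and that step does not appear in the commands listed here. As a rough sketch of what it changes locally, the script below rewrites history in a throwaway repository and checks whether the old commit can still be looked up by its hash before and after the cleanup. The repository, the file names and the use of git commit --amend in place of the BFG are assumptions made purely for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name below is hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a data file together with a stand-in for the unwanted lock file.
echo "data" > keep.txt
echo "lock" > lockfile.tmp
git add keep.txt lockfile.tmp
git commit -q -m "add data"
old=$(git rev-parse HEAD)

# Rewrite the commit without the lock file (stand-in for a BFG run).
git rm -q --cached lockfile.tmp
git commit -q --amend -m "add data"

# The old commit is no longer on any branch, but the reflog still keeps it alive.
if git cat-file -e "$old" 2>/dev/null; then before=present; else before=missing; fi

# The cleanup the BFG recommends: drop reflog entries, then prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

if git cat-file -e "$old" 2>/dev/null; then after=present; else after=missing; fi

echo "old commit before cleanup: $before"
echo "old commit after cleanup:  $after"
```

The old commit stays resolvable by hash until the reflog is expired and unreachable objects are pruned. This only describes the local repository; it says nothing about when a hosting service prunes its own copy.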


Solution

  • I'm the author of the BFG - I re-titled your question to "Why can I still see files in GitHub history after cleaning them with the BFG?" because it likely better represents your issue.

    Your question description does not make this entirely clear, but I am guessing that in the report from the BFG run, the BFG did report it had deleted files (if the BFG had found no targets for deletion, it would have reported that as an error, and you don't mention seeing that, so my guess is that the BFG did find your files and deleted them from history).

    First off, you need to make sure you were following all the steps at https://rtyley.github.io/bfg-repo-cleaner/#usage, particularly:

    • you were cleaning a mirror repo
    • you pushed this cleaned mirror repo back to GitHub.
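
One way to check the cleaned mirror before pushing is to ask Git whether any ref's history still touches the path. The following is a minimal sketch in a throwaway repository; the file names are hypothetical stand-ins, and the same git log command could be run inside the mirror clone with the real path.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the mirror; all names are hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir data
echo "readings" > data/readings.csv
echo "lock" > 'data/.~lock.readings.csv#'
git add data
git commit -q -m "add data"

# Deleting the file in a later commit leaves it in the earlier one.
git rm -q 'data/.~lock.readings.csv#'
git commit -q -m "remove lock file"

# Count commits on any ref that touch the path.
# A count of 0 would mean no ref's history mentions the file.
hits=$(git log --all --oneline -- 'data/.~lock.readings.csv#' | wc -l | tr -d ' ')
echo "commits touching the lock file: $hits"
```

In this uncleaned demo the count is 2 (the commit that added the file and the one that deleted it), which is why removing a file in a later commit is not enough. After a successful history rewrite, the same query would be expected to report 0.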

    If you followed all those steps correctly, why could you still see files in GitHub history after cleaning them with the BFG? A possible explanation is that GitHub has not done garbage collection on that repo yet. GitHub only does GC periodically, so old commits are still visible for some time afterwards: