
Finding large files in the history of a git repository fails


My repository is very big because of some large files in its history. To find those files and remove them, I am running:

$ git verify-pack -v .git/objects/pack/pack-..e8a.idx | sort -k 3 -n | tail -3 

and the result is something like:

12eb660ea206e1b7bd42cb8b525aabe9e86a5064 blob   56413247 15833578 5889838
89b377ace5639c0914bb49d28d0c8e97b0f19a16 blob   56414112 15833631 81736530
4ea83fb57b49f7afdbe99e4f043509d184338f5b blob   56426618 15837504 48628334

To find the path of the largest file, I run:

$ git rev-list --objects --all | grep 4ea83fb57b49f

and the result is:

4ea83fb57b49f7afdbe99e4f043509d184338f5b path/to/my/large_file
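The verify-pack and rev-list steps above can also be combined into one pipeline. The sketch below is an alternative, not the method used in the question. It assumes a Git version whose cat-file --batch-check accepts the %(objectsize) and %(rest) format atoms, and largest_blobs is a hypothetical helper name. Because it starts from git rev-list --objects --all, it also sees loose objects. The size it prints is the uncompressed size, i.e. the same figure as the third column of the verify-pack output. Note that rev-list reports only one path per blob, even if the same content was stored under several names.

```shell
# Hypothetical helper: print "<blob-sha> <size-in-bytes> <path>" for the N
# largest blobs reachable from any ref (default N = 3), smallest first.
largest_blobs() {
    n="${1:-3}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        # keep blobs only; strip the type word but keep paths containing spaces
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        # sort numerically on the size column
        sort -k 2,2n |
        tail -n "$n"
}

# Usage (inside the repository):
#   largest_blobs 10
```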

But when I run git log on this file:

$ git log --oneline --branches -- path/to/my/large_file

No commits are listed. Likewise, whenever I try to remove the binary file from history with:

$ git filter-branch --index-filter  \
'git rm --ignore-unmatch --cached path/to/my/large_file'

I receive:

WARNING: Ref 'refs/heads/master' is unchanged

Any ideas?


Solution

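  • Perhaps the large objects are reachable only from other branches (or other refs such as tags). git rev-list --all walks every ref, whereas git log --branches looks only at local branches, and git filter-branch with no revision arguments rewrites only HEAD. That is why only refs/heads/master is reported as unchanged. Pass --all to the filter-branch command (as a rev-list option after a -- separator, i.e. "... -- --all") to remove the large files from all branches.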
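A minimal sketch of how to check this and then rewrite every ref, not just HEAD. The function names are hypothetical. refs_touching_path assumes Git 2.7 or later, which is needed for git for-each-ref --contains. (For a quick check, git log --all --oneline -- path/to/my/large_file alone already shows whether any ref has the file.) purge_path_everywhere assumes the path contains no single quotes. Setting FILTER_BRANCH_SQUELCH_WARNING=1 only silences the deprecation notice that newer Git versions print. This rewrites history, so try it on a fresh clone first.

```shell
# Hypothetical helper: list every ref (branch, tag, remote-tracking branch,
# stash, ...) containing at least one commit that touches the given path.
refs_touching_path() {
    git log --all --format=%H -- "$1" |
        while read -r commit; do
            git for-each-ref --contains "$commit" --format='%(refname)'
        done |
        sort -u
}

# Hypothetical helper: drop the path from the index of every commit on every
# ref. "-- --all" hands --all to rev-list; --tag-name-filter cat also rewrites tags.
purge_path_everywhere() {
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
        --index-filter "git rm -q --cached --ignore-unmatch -- '$1'" \
        --tag-name-filter cat \
        -- --all
}

# Hypothetical helper: delete filter-branch's refs/original/ backups, expire
# reflogs and prune, so that the old blobs can actually be garbage-collected.
reclaim_space() {
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    git reflog expire --expire=now --all
    git gc --prune=now
}
```

Running refs_touching_path path/to/my/large_file first shows which refs still hold the file. If the output lists refs other than master, that supports the explanation above.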