I'm really new to git and made the mistake of also pushing my data (one big .RData file) to my online repository on GitLab. Now the maximum size limit is reached and I can't push anymore, so I would like to remove the data file. I found git's filter-branch command. The problem is this: in the very early commits the file was called datafile_early.RData; after a few commits that file was deleted and replaced by datafile_later.RData (I'm also working with others on that repository).
So how do I purge datafile_early.RData from the history? I tried:
git filter-branch -f --tree-filter 'rm datafile_early.RData'
It started removing the file from the first commits, but then failed because the file no longer exists in the later commits:
Rewrite a9c05c45dd0c2dacb7ba79cf829fb76a3fb70da3 (4/22) (22 seconds passed, remaining 99 predicted) rm: datafile_early.RData: No such file or directory
tree filter failed: rm datafile_early.RData
What other options do I have?
If using git filter-branch, two points apply. First, --tree-filter is very slow; use --index-filter if at all possible. Second, the filter command must succeed in every commit, including the commits where the file is absent.
The second point is the one Lasse V. Karlsen mentioned in a comment: you'd probably want your tree filter command to read rm -f datafile_early.RData datafile_later.RData, which removes whichever of these files exist and still succeeds even if it removed nothing.
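The exit-status difference between rm and rm -f is what makes the filter fail or pass, and it can be checked outside of git. The following is only a small sketch; the temporary directory and the variable names plain_status and forced_status are made up for illustration.

```shell
# Work in a scratch directory that does not contain the data file.
tmp=$(mktemp -d)

# Plain rm reports an error and exits non-zero when the file is missing;
# inside filter-branch this is what aborts the rewrite.
rm "$tmp/datafile_early.RData" 2>/dev/null
plain_status=$?

# rm -f treats a missing file as success, so the filter can continue.
rm -f "$tmp/datafile_early.RData"
forced_status=$?

echo "rm exit status: $plain_status, rm -f exit status: $forced_status"
```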
To address the first point, note that a tree filter consisting of rm commands can be replaced with an index filter consisting of git rm --cached commands, which work directly on the index instead of checking out every commit's files. In this case the appropriate matching command would be:
git rm --cached --ignore-unmatch datafile_early.RData datafile_later.RData
The entire git filter-branch command is therefore probably:
git filter-branch \
--index-filter \
'git rm --cached --ignore-unmatch datafile_early.RData datafile_later.RData' \
--tag-name-filter cat -- --all
(Optionally, remove the backslash-newline sequences to put this all on one line.) This should run in considerably less time than the --tree-filter variant.
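Before rewriting the real repository, it may be worth trying the command on a throwaway one. The sketch below builds a tiny repository that mimics the situation in the question (an early data file later replaced by another one), runs the index filter, and counts the commits on the current branch that still touch either file. The file analysis.R, the commit messages, and the identity settings are invented for the demo; FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer Git versions print.

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git warns before running filter-branch

# Throwaway repository that mimics the history described in the question.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"

echo "big data" > datafile_early.RData
echo "x <- 1" > analysis.R                 # hypothetical file that must survive
git add .
git commit -q -m "add early data"

git rm -q datafile_early.RData
echo "more data" > datafile_later.RData
git add .
git commit -q -m "replace data file"

# Same index filter as above, written on one line.
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch datafile_early.RData datafile_later.RData' --tag-name-filter cat -- --all

# Count commits reachable from HEAD that still touch either data file.
remaining=$(git rev-list --count HEAD -- datafile_early.RData datafile_later.RData)
tracked=$(git ls-files)
echo "commits still touching the data files: $remaining"
echo "tracked files: $tracked"
```

The check uses HEAD rather than --all because filter-branch keeps the old commits under refs/original/ as a backup, and --all would include them. On the real repository the rewritten branches would still need a force push, and collaborators would need to re-clone or rebase onto the new history.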