In my feature branch I have about 50 commits. In the first commit I created a file that gets modified a lot in subsequent commits. I now realise it would be better to store that file in a different directory, so I'd like to go back to the first commit and just create it in the right place to begin with, to keep the history clean. I can do this with an interactive rebase by editing the first commit and moving the file, but then all of the subsequent commits that touch that file will produce conflicts that I have to manually resolve. Is there a way to tell every commit that the file has been moved, so they just automatically apply their changes in the right place?
Use git filter-branch. You can use --index-filter for speed, but it is harder to use; with just 50-ish commits, use --tree-filter, which is much slower but much easier to use:
git filter-branch --tree-filter <fill this in> --tag-name-filter cat -- --all
You should generally do this on a copy (clone) of the original repository since it's easy to goof up the filter-branch and the easy way to recover from that is to remove the copy and start over.
Once it works, remove all the refs/original/ names as described in the git filter-branch documentation. The repository will eventually de-bloat (the filter-branch will roughly double it in size temporarily).
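As a rough sketch of that cleanup step, the script below builds a throwaway repository, rewrites it, and then deletes the backup names with the one-liner given in the git filter-branch documentation. The paths old.txt and dir/new.txt and the branch name feature are made up for the demo, and FILTER_BRANCH_SQUELCH_WARNING only matters on newer Git versions that print a deprecation notice.

```shell
#!/bin/sh
# Throwaway demo repository; old.txt and dir/new.txt are hypothetical paths.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git: skip the deprecation notice and its delay

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/feature
git config user.name "Demo"
git config user.email "demo@example.com"
echo data > old.txt
git add old.txt
git commit -q -m "add file"

# Rewrite history so the file lives at its new path.
git filter-branch --tree-filter 'mkdir -p dir && mv old.txt dir/new.txt' -- --all >/dev/null

# List the backup names that filter-branch left behind.
git for-each-ref --format='%(refname)' refs/original/

# Delete every backup name.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d

# Optional: discard the now-unreferenced old commits immediately
# instead of waiting for Git to expire them on its own.
git reflog expire --expire=now --all
git gc --quiet --prune=now
```

The last two commands are only needed if you want the space back right away; they also throw away reflog entries, so run them only once you are happy with the result.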
History, in Git, is (are?) the commits. To change history, you need to copy the old commits (which provide the old history) to new, different commits (which provide the new history). So your goal is to replace all 50-ish commits with new ones that are the same except that the file is relocated to some other path.
As you mentioned, you can do this with interactive rebase, but it's a pain: rebase works by converting each commit-to-copy to a changeset (by comparing that commit to its parent, to see what changed) and then applying the same changes to some existing commit.
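There is a rather heavier-duty command, git filter-branch, whose purpose is to copy commits while applying some sort of commit-modifier(s). It has a lot of options because it's inherently very slow; but fundamentally, it lists the commits to be copied and keeps a map from each old commit hash to its new one.
Then, starting from the root-most (oldest/most-ancestral) commit, for each commit:
1. Extract the commit.
2. Apply the filter(s).
3. Make a new commit from the result, and record its hash in the map.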
Finally, after doing the above for every commit to be filtered, loop over all the references that you tell it to change (mostly branch names, but tag names too if you use a --tag-name-filter):
- Copy refs/whatever to refs/original/refs/whatever.
- Update refs/whatever using the new hash found in the map.
At the end of this process, you have all the original commits (with refs/original to refer to them) plus all the new commits (using the branch names).
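To see both sets of commits side by side, a small experiment along these lines may help. It is only a sketch: the repository, the branch name feature, and the paths old.txt and dir/new.txt are all invented for the demo.

```shell
#!/bin/sh
# Throwaway demo repository; all names and paths here are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git: skip the deprecation notice and its delay

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/feature
git config user.name "Demo"
git config user.email "demo@example.com"
echo data > old.txt
git add old.txt
git commit -q -m "add file"

# Remember where the branch pointed before the rewrite.
old=$(git rev-parse refs/heads/feature)

git filter-branch --tree-filter 'mkdir -p dir && mv old.txt dir/new.txt' -- --all >/dev/null

# The branch name now finds the new commit; refs/original/ still finds the old one.
new=$(git rev-parse refs/heads/feature)
backup=$(git rev-parse refs/original/refs/heads/feature)

echo "original commit $backup contains: $(git ls-tree -r --name-only "$backup")"
echo "new commit      $new contains: $(git ls-tree -r --name-only "$new")"
```

The two hashes differ because the new commit records a different tree, which is exactly what "changing history by copying commits" means.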
If you only have one branch name (and no tags), the only name you need to supply is this one branch name, probably master, but --all will tell Git to look at all references, and --tag-name-filter cat will tell Git that the change it should make to the tag names as it updates them is to make no change after all.
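Here is a minimal sketch of what --tag-name-filter cat does in practice, using a throwaway repository with one lightweight tag. The tag name v1, the branch name feature, and the file paths are assumptions made up for the illustration.

```shell
#!/bin/sh
# Throwaway demo repository; the tag v1 and all paths are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git: skip the deprecation notice and its delay

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/feature
git config user.name "Demo"
git config user.email "demo@example.com"
echo data > old.txt
git add old.txt
git commit -q -m "add file"
git tag v1

# "cat" echoes each tag name back unchanged, so the tag keeps its name
# but is moved to the rewritten commit.
git filter-branch --tree-filter 'mkdir -p dir && mv old.txt dir/new.txt' \
    --tag-name-filter cat -- --all >/dev/null

# Both the branch and the tag should now name the same rewritten commit.
git rev-parse refs/heads/feature refs/tags/v1
git ls-tree -r --name-only refs/tags/v1
```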
The --tree-filter directs git filter-branch that, for step 1 (extract commit), it should do the full and complete extraction, to a temporary directory that git filter-branch will construct on its own. (Other filter options attempt to get away with a much faster extract-only-to-temporary-index trick.) The command or commands you supply to --tree-filter are run in this temporary directory, so if all you need to do is rename a file, the command:
mv old-relative-path new-relative-path
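suffices (assuming a Unix/Linux-ish system), provided the file exists in every commit being filtered and the destination directory already exists there; if the command fails, filter-branch stops.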
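A slightly more defensive filter, sketched under the assumption that some commits being filtered do not contain the file yet and that the destination directory may not exist: it only moves the file when it is present and creates the directory first. The paths src/notes.txt and docs/notes.txt and the three-commit history are invented for the demo.

```shell
#!/bin/sh
# Throwaway demo repository; src/notes.txt and docs/notes.txt are hypothetical paths.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git: skip the deprecation notice and its delay

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/feature
git config user.name "Demo"
git config user.email "demo@example.com"

# A commit that does not contain the file at all.
echo readme > README
git add README
git commit -q -m "initial"

# The commit that creates the file in the "wrong" place.
mkdir src
echo one > src/notes.txt
git add src/notes.txt
git commit -q -m "add notes"

# A later commit that modifies it.
echo two >> src/notes.txt
git commit -q -am "edit notes"

# Move the file only in commits where it exists; create the target directory first.
git filter-branch --tree-filter '
    if [ -f src/notes.txt ]; then
        mkdir -p docs &&
        mv src/notes.txt docs/notes.txt
    fi
' -- --all >/dev/null

# Inspect the rewritten history: the file should appear under docs/ from the start.
git log --oneline --name-status
```

The guard matters mostly with --all, since that can pull in commits from before the file was created; a bare mv would fail on those and abort the run.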