
Moving a file in git history


In my feature branch I have about 50 commits. In the first commit I created a file that gets modified a lot in subsequent commits. I now realise it would be better to store that file in a different directory, so I'd like to go back to the first commit and just create it in the right place to begin with, to keep the history clean. I can do this with an interactive rebase by editing the first commit and moving the file, but then all of the subsequent commits that touch that file will produce conflicts that I have to manually resolve. Is there a way to tell every commit that the file has been moved, so they just automatically apply their changes in the right place?


Solution

    TL;DR

    Use git filter-branch. Its --index-filter is faster but harder to use; with just 50-ish commits, use --tree-filter, which is much slower but much easier to use:

    git filter-branch --tree-filter <fill this in> --tag-name-filter cat -- --all
    

    You should generally do this on a copy (clone) of the original repository since it's easy to goof up the filter-branch and the easy way to recover from that is to remove the copy and start over.

    Once it works, remove all the refs/original/ names as described in the git filter-branch documentation. The repository will eventually de-bloat once the old commits are garbage-collected (until then it holds both the old and the new commits, so filter-branch can roughly double the commit count temporarily).
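As a rough sketch of that cleanup step, the helper below deletes every backup ref under refs/original/ in one repository. The function name remove_original_refs is made up for this example; the Git commands it calls (for-each-ref, update-ref -d) are standard. Run it only after you have checked that the rewritten history is what you want, since the backup refs are the easy way back.

```shell
#!/bin/sh
# Sketch: delete every backup ref under refs/original/ in the given repository.
# remove_original_refs is a hypothetical helper name, not a Git command.
remove_original_refs() {
    repo=${1:-.}
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git -C "$repo" update-ref -d "$ref"
    done
}

# Optional, to shrink the repository right away instead of waiting for gc:
#   git reflog expire --expire=now --all
#   git gc --prune=now
```

Call it as `remove_original_refs /path/to/clone`, or with no argument from inside the repository.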

    Long

    History, in Git, is (are?) the commits. To change history, you need to copy the old commits (which provide the old history) to new, different commits (which provide the new history). So your goal is to replace all 50-ish commits with new ones that are the same except that the file is relocated to some other path.

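    As you mentioned, you can do this with interactive rebase, but it's a pain: rebase works by converting each commit-to-copy to a changeset (by comparing that commit to its parent, to see what changed) and then applying the same changes to some existing commit. Once you have moved the file in the first commit, each later changeset still refers to the old path, which is where your conflicts come from.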

    There is a rather heavier-duty command, git filter-branch, whose purpose is to copy commits while applying some sort of commit-modifier(s). It has a lot of filter options, largely because it's inherently very slow and the different filters trade ease of use for speed; but fundamentally, it consists of:

    • List out every commit (by hash ID) to be operated on. In your case, that's simply "every commit". Also, create an empty map of old hash ID → new hash ID.
    • Then, starting from the root-most (oldest/most-ancestral) commit:

      1. Extract the commit into a temporary working area.
      2. Apply each of the various filters.
      3. Build a new commit from the result. Use the hash map to map the parent ID(s) so that the new commit points back to an earlier-copied new commit. This gives the command the new commit's hash ID.
      4. Add an entry to the map from old commit hash → new commit hash.
    • Finally, after doing the above for every commit to be filtered, loop over all the references that you tell it to change (mostly branch names, but tag names too if you use a --tag-name-filter):

      1. Rename the original reference from refs/whatever to refs/original/refs/whatever.
      2. Create a new refs/whatever using the new hash found in the map.

    At the end of this process, you have all the original commits (with refs/original to refer to them) plus all the new commits (using the branch names).
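The loop above can be imitated with Git plumbing commands. What follows is only a toy sketch, not how git filter-branch is actually implemented: it builds a throwaway two-commit repository, uses a stand-in "filter" that moves the whole tree under a hypothetical moved/ directory, keeps the old → new map as files in a temporary directory, and rewrites a single branch. It also drops the original author and date information, which the real command preserves.

```shell
#!/bin/sh
set -e

# Throwaway repository with two commits touching a.txt (hypothetical names).
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
echo one > a.txt
git add a.txt
git commit -q -m "first"
echo two >> a.txt
git commit -q -am "second"

ref=$(git symbolic-ref HEAD)      # the one reference to rewrite
old_tip=$(git rev-parse "$ref")
map=$(mktemp -d)                  # map: file named <old hash> holds <new hash>
idx="$map/index"                  # scratch index used by the stand-in filter

# Walk the commits oldest-first so parents are always copied before children.
for old in $(git rev-list --reverse --topo-order "$ref"); do
    # Steps 1-2: load the old tree into a scratch index under moved/.
    rm -f "$idx"
    GIT_INDEX_FILE="$idx" git read-tree --prefix=moved/ "$old"
    tree=$(GIT_INDEX_FILE="$idx" git write-tree)

    # Step 3: translate each parent through the map, then build the new commit.
    parents=""
    for p in $(git log -1 --format=%P "$old"); do
        parents="$parents -p $(cat "$map/$p")"
    done
    new=$(git log -1 --format=%B "$old" | git commit-tree "$tree" $parents)

    # Step 4: record old hash -> new hash.
    echo "$new" > "$map/$old"
done

# Final loop (one ref here): back up the old tip, then move the branch.
git update-ref "refs/original/$ref" "$old_tip"
git update-ref "$ref" "$(cat "$map/$old_tip")"
git reset -q --hard               # sync index and work tree with the new tip

git log --format='%h %s' --name-only
```

The old commits stay reachable through refs/original/, which is why both histories coexist until those refs are removed.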

    If you have only one branch name (and no tags), that branch name (here, your feature branch) is the only one you need to supply. Using --all instead tells Git to look at all references, and --tag-name-filter cat tells Git that the change it should make to the tag names as it updates them is to make no change after all.

    The --tree-filter tells git filter-branch that, for step 1 (extract commit), it should do the full and complete extraction, to a temporary directory that git filter-branch will construct on its own. (Other filter options attempt to get away with a much faster extract-only-to-temporary-index trick.) The command or commands you supply to --tree-filter are run in this temporary directory, so if all you need to do is rename a file, the command:

    mv old-relative-path new-relative-path
    

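    suffices (assuming a Unix/Linux-ish system), as long as the file exists and the destination directory is present in every commit being filtered. If any filtered commit lacks the file (likely with --all, since earlier commits predate it), the mv fails, and a failing filter makes git filter-branch abort; in that case, guard the command with an existence test and create the destination directory first.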
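Here is a minimal sketch of a guarded tree filter, run against a throwaway repository so it can be tried safely. The paths old/dir/file.txt and new/dir/file.txt are placeholders, not taken from the question; substitute your own. The filter skips commits that do not contain the file and creates the destination directory before moving. FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses the warning pause that newer Git versions print before running filter-branch.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git

# Throwaway repository: one commit creates the file, one modifies it.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir -p old/dir
echo one > old/dir/file.txt
git add .
git commit -q -m "create file"
echo two >> old/dir/file.txt
git commit -q -am "modify file"

# The filter runs once per commit, inside the temporary checkout.
# An "if" whose test is false exits 0, so commits without the file pass through.
git filter-branch --tree-filter '
    if [ -e old/dir/file.txt ]; then
        mkdir -p new/dir &&
        mv old/dir/file.txt new/dir/file.txt
    fi
' --tag-name-filter cat -- --all

# Every commit should now list the file only at its new path.
git log --format='%h %s' --name-only
```

In a real repository only the git filter-branch invocation is needed, run from the top level of a clone with a clean work tree.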