Tags: git, merge, rename, git-merge, git-merge-conflict

What workflow can mitigate git merge conflicts when a file is renamed and a new file is created with the old name?


Problem

  1. I have root branch master and feature branched from master.
  2. master has file A
  3. feature has moved file A to B and created a new file A
  4. Work in master against file A needs to be merged into feature branch file B
  5. Merging master into feature results in a merge conflict: Git tries to merge master's file A into feature's file A, even though the two are no longer related. How do I tell Git to merge master's A into feature's B instead?

Steps to Reproduce

In a console:

mkdir git_rename_demo
cd git_rename_demo

git init

printf 'LineB1\nLineB2\nLineB3\nLineB4\n' > A.txt
git add A.txt
git commit -m "Add A"

git checkout -b rename_A_to_B
git mv A.txt B.txt
printf 'LineA1\nLineA2\nLineA3\n' > A.txt
git add A.txt
git commit -m "Moved Old A to B and Added New A"

git checkout master
printf 'LineB5\nLineB5\n' >> A.txt
git add A.txt
git commit -m "Added More LineBs to A"

git checkout rename_A_to_B
git merge master

Scenario

I have a master branch and a feature branch. On master, a file A contains code that does "A" related logic.

On the feature branch, it was discovered that file A didn't make sense as a file name because the code is more related to "B" logic. Simultaneously, new code was written that does relate to "A" logic in feature branch. To fix this, file A was renamed to file B, effectively meaning all original "B" related logic in file A has been moved to new file B. All new "A" related logic was added to a new file called A, effectively replacing the old A file.

Work has continued on the master branch, adding more "B" logic to the A file that still exists there. This is the same file that has been renamed to B on the feature branch.

The time comes where work from master needs to be merged into feature, as feature will continue to be developed separately from master until a later date. The merge of master into feature results in the conflict outlined above. We need to continue to allow developers working on master to do "B" related work against A and be able to merge that A file work into feature branch B file without having to manually resolve the conflict each time.


Solution

  • TL;DR

    There is no really good way to handle this, but you can do it somewhat manually. See below.

    Long

    Don't worry too much about branch names. Do worry about commits; merge is based on commits, not branch names. Branch names just help you, and Git, find particular commits. A branch name always contains the raw commit hash ID of the last commit in the branch, by definition. Adding a new commit consists of:

    1. Check out the branch by name, so that Git knows which branch name to update in step 3. This makes the tip commit of the branch be the current commit. The special name HEAD is now attached to the branch name, and therefore, HEAD selects the current commit, which is the last (or tip) commit in the branch.

    2. Make a new commit, in the usual way. Git will create the commit, setting its parent to the current commit. This new commit will get its own unique hash ID, different from that of every prior commit, and different from every future commit too.

    3. At this point, having made the new commit, Git writes the new commit's hash ID into a branch name: the one to which HEAD is attached. So now the branch name points to the last (tip) commit in the branch. That new commit points back to what used to be the tip; the branch is now one commit longer.
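
    As a rough illustration of those three steps, the sketch below makes two commits in a throwaway repository and checks that the branch name moved to the new commit, whose parent is the old tip. The temporary directory, the file name, and the demo identity are placeholders, not part of the question.

```shell
#!/bin/sh
set -e

# Throwaway repository; directory, identity, and file name are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the question
git config user.email "demo@example.com"
git config user.name "Demo"

echo one > file.txt
git add file.txt
git commit -q -m "first"
old_tip=$(git rev-parse master)           # hash currently stored in the branch name

# HEAD is attached to master, so the next commit updates master.
echo two >> file.txt
git commit -q -a -m "second"
new_tip=$(git rev-parse master)
parent=$(git rev-parse HEAD^)

echo "attached to: $(git symbolic-ref --short HEAD)"
echo "old tip:     $old_tip"
echo "new tip:     $new_tip"
echo "parent:      $parent"
```

    The parent printed on the last line should match the old tip: the branch is one commit longer.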

    Whenever you do a real merge (there are some fake kinds of merges), there are three commits involved:

    • the merge base, which is the best common ancestor to the other two commits;
    • your current commit (or HEAD), usually the tip of a branch;[1] and
    • the commit you select in your command line: often, the tip of another branch.

    Git finds the merge base commit on its own. You just name the other commit. If you want to see, in advance, which commit is the merge base, you can run:

    git merge-base --all HEAD other
    

    where other is whatever you plan to supply to the git merge command, from which Git picks the third commit.
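
    To see this on the question's reproduction, the sketch below rebuilds that history in a temporary directory and asks for the merge base of the feature branch and master. It passes HEAD explicitly as the first commit. The temporary directory and demo identity are assumptions; the appended lines are named LineB5 and LineB6 here only to keep them distinct.

```shell
#!/bin/sh
set -e

# Rebuild the question's history in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'LineB1\nLineB2\nLineB3\nLineB4\n' > A.txt
git add A.txt
git commit -q -m "Add A"
first_commit=$(git rev-parse HEAD)

git checkout -q -b rename_A_to_B
git mv A.txt B.txt
printf 'LineA1\nLineA2\nLineA3\n' > A.txt
git add A.txt
git commit -q -m "Moved Old A to B and Added New A"

git checkout -q master
printf 'LineB5\nLineB6\n' >> A.txt
git commit -q -a -m "Added More LineBs to A"

# On the feature branch, ask which commit a merge of master would use as base.
git checkout -q rename_A_to_B
merge_base=$(git merge-base --all HEAD master)
echo "merge base:   $merge_base"
echo "first commit: $first_commit"
```

    Here the merge base is the "Add A" commit, the last commit that is on both branches.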


    [1] The other option is that you may be in detached HEAD state, in which the special name HEAD just contains the raw hash ID of the commit itself. Either way, HEAD names the current commit. It just does so using a branch name when you're on a branch. But normally you're on a branch when you run merge.


    How merge works in a nutshell, which is also where your problem comes from

    Imagine a simplified commit timeline, with later commits towards the right, in which we replace the commit hashes with single uppercase letters to make it easier to talk about them. We might have:

              I--J   <-- branch1 (HEAD)
             /
    ...--G--H
             \
              K--L   <-- branch2
    

    Here the name branch1 finds, or points to, our current commit, whose hash ID is J. The name branch2 points to commit L. We'd like to merge branch branch2 into branch1, which really means merge commit L into commit J to produce new merge commit M. Git automatically finds the best shared commit, which is commit H: the last commit that's on both branches.

    In order to merge our changes with their changes, Git has to figure out what we changed, and what they changed. Since each commit holds a full snapshot of all files, that's relatively easy: Git will diff the snapshot in commit H against the snapshot in our HEAD commit J, to see what we changed. And, Git will also diff the snapshot in H vs that in L, to see what they changed.

    You can run these two git diff commands on your own:

    git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed
    git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed
    

    This is where your issue comes in: --find-renames in the first git diff would find that file A in the merge base, which presumably is also called A in their commit, is now called B in your HEAD commit. So the diff would pair up "A in base, B in ours" and compare the contents. That's what we changed: the contents changed, and the file's name changed.

    The second git diff would pair up "A in base, A in theirs" to see what they changed. When merge goes to combine the changes, that would all work. The combined changes would be:

    • whatever we changed, if anything, plus renaming the file;
    • whatever they changed, if anything.

    Git would apply these combined changes to the snapshot in commit H, which would rename the file.

    But for Git to find the rename in the first place, Git must not pair up file A in commit H with the file named A in the --ours commit (J). If a file named A exists in both commits, Git automatically assumes that it's "the same" file.

    When you run git diff manually like this, you can add an extra option, --break-rewrites (or -B), to tell Git not to assume that. When git merge runs git diff on its own, it never provides this option.
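
    A hedged sketch of that difference follows, assuming the extra option meant here is git diff's --break-rewrites (-B), which the documentation describes as letting a totally rewritten file act as a rename source. As far as I know, Git does not break up very small files, so this sketch uses longer files than the question's reproduction; the gen helper, the line text, and the temporary directory are made up for illustration.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

# Hypothetical helper: print $2 numbered lines prefixed with $1.
gen() {
    i=1
    while [ "$i" -le "$2" ]; do
        echo "$1 logic, line $i"
        i=$((i + 1))
    done
}

gen B 60 > A.txt
git add A.txt
git commit -q -m "Add A"
base=$(git rev-parse HEAD)

git checkout -q -b rename_A_to_B
git mv A.txt B.txt
gen A 60 > A.txt
git add A.txt
git commit -q -m "Moved Old A to B and Added New A"

# Default pairing: same name means "same file", so B.txt just looks new.
plain=$(git diff --find-renames --name-status "$base" HEAD)
# With -B, the rewritten A.txt is split up, so base A.txt can pair with B.txt.
broken=$(git diff --break-rewrites --find-renames --name-status "$base" HEAD)

printf 'without -B:\n%s\n' "$plain"
printf 'with -B:\n%s\n' "$broken"
```

    Without -B the output lists A.txt as modified and B.txt as added. With -B there should be a line that names both A.txt and B.txt, which is the pairing the merge would have needed.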

    Note that for all other files, everything works OK. Suppose that in file F1, which has the same name in all three commits, you changed some early lines, and they changed some later lines. Git applies both changes to the copy of F1 from H, and moves on to the next file.

    At some point, though, Git might hit some file—maybe F2—where you and they changed the same lines. Git will now stop and make you clean up the mess.

    Git leaves you four files, not just one, with the mess in them:

    • There's a copy of file F2 from the merge base in Git's index.
    • There's a copy of file F2 from your commit in Git's index.
    • There's a copy of file F2 from their commit in Git's index.
    • Last, there's Git's best effort at combining these changed files, in the work-tree.

    The work-tree files are the ones you are used to using. They are right there, to be seen and edited and compiled or whatever it is you do with files.

    The copies that are in Git's index are hard even to see, but you can get all three of them out with git show or with git checkout. The git mergetool command does this, extracting a file named F to F.BASE, F.LOCAL, and F.REMOTE ... and then trying to run a merge tool on these three files to produce the merged file F, and then removing these three files. So mergetool is almost OK, but it does too much.

    You can fall back on the git show method:

    git show :1:F > F.BASE     # :1: means the merge base version
    git show :2:F > F.OURS     # :2: means our version, like `git checkout --ours`
    git show :3:F > F.THEIRS   # :3: means their version, like `git checkout --theirs`
    

    But in this case, Git is merging the wrong files, so this doesn't help directly. Still, it's important to know.
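
    For reference, here is a small sketch of the three index slots on an ordinary conflict, using a made-up file F with one line per side. The repository location, the branch name other, and the demo identity are placeholders.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > F
git add F
git commit -q -m "base"

git checkout -q -b other
echo theirs > F
git commit -q -a -m "their change"

git checkout -q master
echo ours > F
git commit -q -a -m "our change"

# The merge is expected to stop with a conflict, so its failure is tolerated.
git merge other >/dev/null 2>&1 || true

git ls-files -u              # one index entry per slot (1, 2, 3) for F
git show :1:F > F.BASE       # merge base version
git show :2:F > F.OURS       # our version
git show :3:F > F.THEIRS     # their version
```

    After this runs, F.BASE, F.OURS, and F.THEIRS hold the three inputs, while F in the work-tree holds Git's attempt with conflict markers.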

    The above is about what goes wrong. Here is what you can do

    You ran:

    git merge other
    

    Git found the three commits: base, HEAD, and other. It ran two git diffs, from base to HEAD to see what you did, and base to other to see what they did.

    One of these two diffs should have found a rename, but did not. It then tried to merge A from the base with A from the local/--ours commit and A from the other/--theirs/remote commit.

    You could extract the three versions of A and then merge them "by hand", kind of like git mergetool does. But the real trick here is that you don't want A.LOCAL at all; you want B.LOCAL, or B.OURS, depending on what you like to call it.

    You have B.LOCAL already. It's just called B. So what you want is to:

    • extract the merge-base version of file A: use git show :1:A > A.base
    • extract their version of file A: use git show :3:A > A.other
    • merge these three files, writing the merge result to file B.

    OK, the first two bullet points are straightforward. But now we still have to merge three files. Well, if you have a merge tool you like, you can use that directly! If not, we can have Git do it, the same way Git does it automatically, but we control the input files. We just use the command:

    git merge-file B A.base A.other
    

    which merges our existing B with A.base and A.other, writing the result back into B. There may be merge conflicts, as usual; they are to be fixed in the usual way now.
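
    To make the argument order concrete, here is a minimal sketch of git merge-file on three plain files, outside any merge. The file contents are invented: our B adds a line at the top, their A.other adds a line at the end, and A.base is the common ancestor.

```shell
#!/bin/sh
set -e

dir=$(mktemp -d)
cd "$dir"

# Common ancestor.
printf 'LineB1\nLineB2\nLineB3\nLineB4\n' > A.base
# Ours: one line added at the top.
printf 'LineB0\nLineB1\nLineB2\nLineB3\nLineB4\n' > B
# Theirs: one line added at the end.
printf 'LineB1\nLineB2\nLineB3\nLineB4\nLineB5\n' > A.other

# Argument order is <current> <base> <other>; the result overwrites <current>.
git merge-file B A.base A.other
cat B
```

    Because the two changes touch different parts of the file, B ends up with both the new first line and the new last line, and the command exits with status 0. On overlapping changes it would leave conflict markers in B instead.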

    Of course Git made a mess of our file A, but we can fix that too:

    git checkout --ours A
    

    which extracts :2:A from the index into the work-tree, followed by:

    git add A
    

    which removes the three :1:, :2:, and :3: slot entries and puts the checked-out index-version-2 of A in as the to-be-committed file A.

    If you have to do this often, you can script part of it (all except for resolving any actual conflicts).
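
    As one possible shape for such a script, the sketch below rebuilds the question's reproduction in a temporary directory and then applies the steps above to the conflicted merge. It is a sketch under assumptions, not a general tool: the file names A.txt and B.txt come from the reproduction, the appended lines are named LineB5 and LineB6 to keep them distinct, the demo identity and commit message are placeholders, and it assumes git merge-file finishes without conflicts (otherwise B.txt would still need manual editing before the add).

```shell
#!/bin/sh
set -e

# Rebuild the question's history in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'LineB1\nLineB2\nLineB3\nLineB4\n' > A.txt
git add A.txt
git commit -q -m "Add A"

git checkout -q -b rename_A_to_B
git mv A.txt B.txt
printf 'LineA1\nLineA2\nLineA3\n' > A.txt
git add A.txt
git commit -q -m "Moved Old A to B and Added New A"

git checkout -q master
printf 'LineB5\nLineB6\n' >> A.txt
git commit -q -a -m "Added More LineBs to A"

git checkout -q rename_A_to_B
# Expected to stop with a conflict in A.txt.
git merge master >/dev/null 2>&1 || true

# Pull the base and "theirs" versions of A.txt out of the index
# before anything resolves the conflict and clears the slots.
git show :1:A.txt > A.base
git show :3:A.txt > A.other

# Merge their work on A.txt into our B.txt.
git merge-file B.txt A.base A.other
rm A.base A.other

# Keep our new A.txt untouched and mark both files as resolved.
git checkout -q --ours -- A.txt
git add A.txt B.txt
git commit -q -m "Merge master into rename_A_to_B"

cat B.txt
```

    In this run B.txt picks up the two lines added on master while A.txt keeps the feature branch's content. The same sequence would have to be repeated on every later merge from master that touches A.txt.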