
Avoiding a merge conflict in interactive rebase while reordering history


I have a file myfile.txt

Line one
Line two
Line four

where each line has been added in a separate commit.

I edit the file to add a "missing" line, so the file is now

Line one
Line two
Line three
Line four

This bash script sets up the repository:

#!/bin/bash

mkdir -p ~/testrepo
cd ~/testrepo || exit
git init

echo 'Line one' >> myfile.txt
git add myfile.txt
git commit -m 'First commit' 

echo 'Line two' >> myfile.txt
git commit -m 'Second Commit' myfile.txt

echo 'Line four' >> myfile.txt
git commit -m 'Third commit' myfile.txt

sed -i '/Line two/a Line three' myfile.txt
git commit --fixup=HEAD^ myfile.txt

The history looks like this

$ git --no-pager log  --oneline 
90e29ee (HEAD -> master) fixup! Second Commit
6a20f1a Third commit
ac1564b Second Commit
d8a038d First commit

I run an interactive rebase to combine the fixup commit into "Second Commit", but it reports a merge conflict:

$ git rebase -i --autosquash HEAD^^^
Auto-merging myfile.txt
CONFLICT (content): Merge conflict in myfile.txt
error: could not apply 90e29ee... fixup! Second Commit
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 90e29ee... fixup! Second Commit

$ git --no-pager diff
diff --cc myfile.txt
index b8b933b,43d9d5b..0000000
--- a/myfile.txt
+++ b/myfile.txt
@@@ -1,2 -1,4 +1,7 @@@
  Line one
  Line two
++<<<<<<< HEAD
++=======
+ Line three
+ Line four
++>>>>>>> 90e29ee... fixup! Second Commit

  • Why does moving the fixup commit from the HEAD of the branch to a position between "Second Commit" and "Third Commit" generate a merge conflict?
  • Is there a way that I can execute the rebase and avoid the conflict, or have it resolved automatically?

The desired history would be

xxxxxxx (HEAD -> master) Third commit
xxxxxxx Second Commit
d8a038d First commit

where "Second Commit" looks like this:

diff --git a/myfile.txt b/myfile.txt
index e251870..802f69c 100644
--- a/myfile.txt
+++ b/myfile.txt
@@ -1 +1,3 @@
 Line one
+Line two
+Line three

Solution

  • TL;DR

    What you've run into here is basically an edge case in merges. You just have to fix these manually. You might wonder why I'm talking about merges, when you're not running git merge. For that, see the long answer below.

    Long

    What git rebase does is to copy (some) commits. When using the interactive rebase, git rebase -i, you can fiddle with the copying process. When using --autosquash, Git itself fiddles with the copying process. This fiddling can lead to the problem you encountered. Even without any fiddling, you can still get conflicts. Let's explore this.

    About commits

    We need to start with a brief overview of commits. Each commit:

    • has a unique number: a hash ID, formed by running a cryptographic checksum over the full content of the commit;
    • contains both a snapshot of all files (as an internal tree object holding the meat of the commit) and some metadata, or information about the commit itself: your name and email address, for instance, and the hash ID of the commit's parent or parents.

    The parent commit hash ID(s) inside each commit form commits into backwards-looking chains. For instance, if we represent a simple linear chain of commits using single uppercase letters to stand in for hash IDs, we get a drawing like this:

    ... <-F <-G <-H
    

    where H stands in for the hash ID of the last commit in the chain. That commit contains both a snapshot and the hash ID of earlier commit G. We say that H points to G. G in turn points to F, which points back still earlier.
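
    To see the parent link for yourself, here is a minimal sketch. It assumes a throwaway repository in a temporary directory; the identity values and the file name are placeholders of mine, not anything from the question.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "G"
echo two >> file.txt && git commit -q -am "H"

# Raw content of commit H: a tree (the snapshot), a parent, and metadata.
git cat-file -p HEAD

# The "parent" line inside H is the hash ID of G.
parent_in_h=$(git cat-file -p HEAD | sed -n 's/^parent //p')
hash_of_g=$(git rev-parse 'HEAD^')
echo "H points to: $parent_in_h"
echo "G is:        $hash_of_g"
```

    The two hash IDs printed at the end should be the same, which is all that "H points to G" means.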

    Because commits hold snapshots, not changes, we need to have Git compare two snapshots in order to find changes. This is like playing a game of spot the difference. To do this, we can run git diff and give it two raw commit hash IDs, or we can run git show on a single commit, which compares the commit with its (single) parent. (The effect on merge commits, which are commits with two or more parents, is trickier.)

    Because commits are found by their hash IDs, and the hash IDs are cryptographic checksums, we cannot change anything about any existing commit. If some commit is deficient in some way, the best we can do is extract it, fix it up, and put into Git a new commit: the different content will result in a new unique hash ID for the new commit. The existing commit will remain unchanged.

    Because a commit contains the hash ID of its parent, if we "change" (i.e., copy) any commit, we're forced to "change" (copy) all subsequent commits as well. So any re-ordering of a commit, or any fixing of any broken-ness about any commit (including just fixing up its log message), has a ripple effect. That's not really a big deal: most commits are pretty cheap. The fact that Git re-uses (de-duplicates) files in snapshots, and even de-duplicates entire snapshots, means that changing part of a commit (such as its log message) without changing its snapshot hardly needs any disk space at all.¹ So we need not worry about disk space, normally.
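
    A small sketch of the "new commit, same snapshot" idea, again in a throwaway repository with placeholder names: amending only the log message should produce a new commit hash ID while the tree (snapshot) hash ID stays the same, and the original commit stays in the repository.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo hello > file.txt && git add file.txt && git commit -q -m "Mesage with a typo"
old_commit=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'HEAD^{tree}')

# "Change" the commit: really, make a corrected copy of it.
git commit -q --amend -m "Message without a typo"
new_commit=$(git rev-parse HEAD)
new_tree=$(git rev-parse 'HEAD^{tree}')

echo "commit: $old_commit -> $new_commit"
echo "tree:   $old_tree -> $new_tree"

# The original commit object still exists, unchanged.
old_type=$(git cat-file -t "$old_commit")
echo "old object is still a: $old_type"
```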

    We do need to worry about other things when rebasing: in particular, we must worry about the copies of those commits that other Git repositories have. But if no other Git repository has those commits, that worry just falls aside as well. Overall, rebasing is really quite safe when we're just working with a private repository, or when we haven't sent our commits out to anyone else. Even if the whole process goes wrong, our original commits are still in Git. (It can, however, become a real chore to find the originals. When you have 47 people that look alike, and all of them claim to be Bruce, which Bruce is the original Bruce? So be sure to keep careful track, if you do this sort of thing.)


    ¹ Any commits that are completely abandoned in this process tend to linger for at least 30 days, but are then cleaned away automatically.


    A brief look at branches

    A branch name mainly just holds the hash ID of the last commit in some chain. That is, when we have:

    ...--G--H   <-- branch1
    

    what the name branch1 is doing for us is remembering hash ID H. That way, we don't have to memorize it, or write it on a whiteboard, or whatever. If we now create a second branch name branch2, that name also points to commit H:

    ...--G--H   <-- branch1, branch2
    

    We attach the special name HEAD to one (and only one) branch name, to denote which name, and thus which commit, we're using:

    ...--G--H   <-- branch1 (HEAD), branch2
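
    As a rough illustration (throwaway repository, placeholder identity), both names resolve to the same hash ID, and git symbolic-ref shows which name HEAD is attached to:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "H"
git branch -M branch1      # rename the initial branch, whatever it was called
git branch branch2         # a second name for the same commit

git rev-parse branch1 branch2     # the same hash ID, twice
git symbolic-ref HEAD             # the name HEAD is attached to

b1=$(git rev-parse branch1)
b2=$(git rev-parse branch2)
head_ref=$(git symbolic-ref HEAD)
```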
    

    Now we make some new commits. The first new commit, which we'll call I, will point back to the currently-last commit H, and Git will write I's hash ID into the name to which HEAD is attached:

              I   <-- branch1 (HEAD)
             /
    ...--G--H   <-- branch2
    

    If we make a second commit on branch1, then git checkout branch2 or git switch branch2 to attach HEAD to branch2 and make H the current commit, we get:

              I--J   <-- branch1
             /
    ...--G--H   <-- branch2 (HEAD)
    

    Making two more commits on the now-current-branch2 gives us:

              I--J   <-- branch1
             /
    ...--G--H
             \
              K--L   <-- branch2 (HEAD)
    

    Merging

    We can now use git merge. If we first git checkout branch1, J will be the current commit and we'll git merge branch2 to combine work with commit L. If we just git merge branch1, L will be the current commit and we'll combine work with commit J. The merge effect is mostly symmetric here, but the final merge commit will extend whichever branch we are actually on, so let's git checkout branch1 first:

    git checkout branch1 && git merge branch2
    

    Git will now find the best shared commit—the best of the commits that are on both branches—to act as the merge base for this merge operation. In this case, the best shared commit is obvious: it's commit H. Commit G and all earlier commits are on both branches, but H is "better" because it's closer to the end.
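
    You can ask Git which commit it would pick. The sketch below rebuilds the drawing above in a throwaway repository (the commit helper and one-file-per-commit layout are my own shorthand) and runs git merge-base:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit G; commit H
git branch -M branch1
git branch branch2
h=$(git rev-parse HEAD)

commit I; commit J              # on branch1
git checkout -q branch2
commit K; commit L              # on branch2

base=$(git merge-base branch1 branch2)
echo "merge base: $base"
echo "commit H:   $h"
```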

    To combine work, Git will now use git diff, just like we would, to find changes. Commit H has a snapshot, and commit J has one, and whatever is different between those two commits, well, that's what we did on branch1:

    git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed
    

    Repeating the diff but with commit L, the other commit, this time, shows what they (well, we) changed via commits K and L:

    git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed
    

    The merge process—what I like to call merge as a verb—now combines these two sets of changes. The combined changes not only do whatever we did, but also whatever they did. If we touched a file and they didn't, we get our stuff. If they touched a file and we didn't, we get their stuff. If we both touched some file, Git will attempt to combine those changes too.

    Git will apply these combined changes to whatever is in the merge base commit. That is, suppose file F has, say, 100 lines in it, and we change something on line 42 and add a line at line 50, so that file F now has 101 lines. Suppose they change something on line 99. Git can:

    • keep our change to line 42;
    • add our line; and
    • keep their change to line 99, which is now line 100

    and everything is just fine. Git will consider this the correct result of the merge.²
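
    The same idea can be tried on plain files with git merge-file, which runs a three-way merge without any commits involved. This is only a scaled-down sketch: the file has 10 lines instead of 100, and the file names and edits are made up for the example.

```shell
set -eu
dir=$(mktemp -d) && cd "$dir"

seq 1 10 > base.txt                                  # the merge base version
# Ours: change line 2 and add a new line after line 5.
awk 'NR==2{$0="2 (ours)"} {print} NR==5{print "new line (ours)"}' base.txt > ours.txt
# Theirs: change line 9.
sed 's/^9$/9 (theirs)/' base.txt > theirs.txt

# -p prints the merged result instead of overwriting ours.txt.
merged=$(git merge-file -p ours.txt base.txt theirs.txt)
echo "$merged"
```

    Their change to line 9 should show up on line 10 of the result, because our added line pushed it down by one.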

    This process of combining changes and applying the combined changes to the merge base is, again, what I call merge as a verb. This produces a set of merged files. If there are no conflicts, these merged files are ready to commit.

    The merging work actually takes place in Git's index aka staging area, though we won't go into any detail here. If there is a merge conflict, Git leaves all three input files in its index, and writes its best effort at merging into the working tree copy of the file. This working tree copy has merge conflict markers. This causes the merge-as-a-verb process to fail.

    For git merge, if the merge-as-a-verb step succeeds, Git goes on to make a merge commit. A merge commit is almost exactly the same as a regular commit: it has a snapshot, like any commit, and it has a parent, like almost any commit. But it also has a second parent. That's what makes it a merge commit. This uses the word "merge" as an adjective, and Git often refers to these commits as a merge. So this is what I call merge as a noun.

    Assuming all went well, we'd get:

              I--J
             /    \
    ...--G--H      M   <-- branch1 (HEAD)
             \    /
              K--L   <-- branch2
    

    The first parent of merge M would be commit J, because that's where branch1, our HEAD, was a moment ago. The second parent of merge M would be commit L.
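
    Here is a sketch that builds this graph in a throwaway repository and checks the two parents of M. The commit helper is my own shorthand, and every commit touches a different file so that the merge cannot conflict.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit H
git branch -M branch1
git branch branch2
commit I; commit J
j=$(git rev-parse HEAD)
git checkout -q branch2
commit K; commit L
l=$(git rev-parse HEAD)

git checkout -q branch1
git merge -q --no-edit branch2            # makes merge commit M on branch1

first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
echo "M^1 = $first_parent (J = $j)"
echo "M^2 = $second_parent (L = $l)"
```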

    If the merge-as-a-verb process fails, git merge stops in the middle, and leaves the mess for you to clean up. It also exits with a nonzero status, for those running git merge from a script or program.


    ² Whether this actually is correct is a separate question, and not one that Git really cares about. Git is just following these simple textual substitution rules.


    Copying a commit with git cherry-pick

    Now that we know how branching, branch names, and git merge work, we can look at git cherry-pick. Its function is to copy a commit, by figuring out what the commit does and "doing that again" as it were.

    That is, suppose we have a situation like this:

           I--J--K   <-- feature1
          /
    ...--H
          \
           L--M--N   <-- feature2 (HEAD)
    

    We're working on feature2 right now, and suddenly we notice: Hey, if we had commit J here after commit N, we'd be ready to finish. Ideally, we'd get someone to apply commit J to commit H—maybe on a new branch—and/or to merge commit J into something, so that we could use it more directly. But for whatever various reasons, we'd just like to get the change, from I to J, into feature2.

    We could run:

    git diff <hash-of-I> <hash-of-J>
    

    to see what the change was, and then make the same changes ourselves, to whatever we have in our commit N, and make a new commit O. But why should we knock ourselves out doing this copying, when we have a computer that can do it? We run:

    git cherry-pick <hash-of-J>
    

    and Git does the copying. If all goes well, it even copies J's commit message for us, and makes a new commit. This new commit is a lot like J—diffing N vs this new commit will show the same changes as diffing I vs J—so instead of calling the new commit O, let's call it J':

           I--J--K   <-- feature1
          /
    ...--H
          \
           L--M--N--J'  <-- feature2 (HEAD)
    

    That's all very nice, but here's what we need to know: The way git cherry-pick actually works is that it runs the Git merge machinery. It sets commit I, the parent of J, as the merge base, and then it runs those two git diff commands:

    • git diff <hash-of-I> <hash-of-J> finds what they changed; and
    • git diff <hash-of-I> <hash-of-N> finds what we changed.

    Git now combines these two sets of changes, keeping our changes to keep up with commit N, but adding their changes to get the effect of commit J too. The fact that commit I isn't even on our branch is irrelevant. Git uses the merge machinery to make this copy—and usually all works splendidly.

    Having run the merge-as-a-verb process, Git goes on to make a regular ordinary single-parent commit. That's our J'. The commit's author, author-date, and log message get copied from commit J; we become the committer, and the commit date on the new commit is "now".
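
    To watch this happen, the following sketch recreates the feature1/feature2 picture in a throwaway repository (helper function and file names are mine) and cherry-picks J. The copy should keep J's author, author-date, and message, but get a new hash ID and N as its parent.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit H
git checkout -q -b feature1
commit I; commit J
j=$(git rev-parse HEAD)
commit K
git checkout -q -b feature2 'HEAD~3'      # start feature2 at H
commit L; commit M; commit N
n=$(git rev-parse HEAD)

git cherry-pick "$j" >/dev/null           # copy J onto N
j_prime=$(git rev-parse HEAD)

# Original and copy, side by side.
git show -s --format='%h  author-date=%ad  subject=%s' "$j" "$j_prime"
```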

    But: the merge-as-a-verb process can fail. It can have a merge conflict. This is what you are seeing with your --autosquash rebase.

    Rebasing without fixups or other tricks

    We're almost ready to put the pieces together. We just need to know one more thing: git rebase works by copying commits, as if by using git cherry-pick. For some versions of git rebase, Git literally runs git cherry-pick. The most modern version of Git as of today has the cherry-picking built in to the rebase code, so that it doesn't have to run it separately, but the effect is the same. We can think of it as cherry-picking. Even the fixup and squash cases do this: they just alter the final make-a-new-commit step.

    To accomplish a rebase, Git first lists out the commit hash IDs of all the commits that are to be copied. This listing-out process is considerably more complicated than it looks at first, but we get to ignore all the complications here, as none of them actually apply. In your case, you have four commits to worry about, and three of them will be copied, so let's draw that. We'll name the first one A. It's a root commit: a slightly special case, a commit with no parent. So, here is what you have:

    A--B--C--D   <-- master (HEAD)
    

    To do the git rebase -i—whether or not there's any autosquash going on—Git first lists out each of the commits to copy. Using HEAD^^^, you tell Git that the commits not to copy start at A and work backwards. The commits it should copy are those starting from HEAD (i.e., master) and working backwards: D, C, B, and A. From that list, we throw out A-and-back, leaving D, C, and B.
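
    The same selection can be previewed with the two-dot range syntax. This sketch uses a throwaway repository whose four commits are simply named A through D to match the drawing:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for name in A B C D; do
    echo "$name" >> file.txt
    git add file.txt
    git commit -q -m "$name"
done

# Commits reachable from HEAD but not from HEAD^^^, oldest first.
git log --reverse --format=%s 'HEAD^^^..HEAD'
to_copy=$(git log --reverse --format=%s 'HEAD^^^..HEAD' | tr '\n' ' ')
```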

    Normally Git would copy these three in the B-C-D order. That would work. Git would copy B to a new and improved commit B', then copy C using B' as C's parent, then copy D using C', to produce:

      B'-C'-D'  <-- master (HEAD)
     /
    A--B--C--D   [abandoned]
    

    Each of these copy steps works as if by using git cherry-pick, using Git's detached HEAD mode. Git first checks out commit A using --detach:

    A   <-- HEAD
     \
      B--C--D   <-- master
    

    and now runs git cherry-pick with the hash of commit B.³ This copies B to B' using the merge engine, setting the "merge base" to commit A. Git compares commit A, the merge base, to itself because HEAD says to use A. This says not to change anything. Then Git compares commit A (merge base again) to commit B. This says to make the changes that result in commit B's snapshot. Git makes the changes that result in commit B's snapshot, and commits these as a regular (non-merge) commit B', re-using most of B's metadata:

    A--B'  <-- HEAD
     \
      B--C--D   <-- master
    

    Now Git cherry-picks commit C. Commit B is the parent of C, so it is the forced merge base. It exactly matches our HEAD commit B', so we have no changes to merge in; we pick up their changes and commit, resulting in an exact copy of C as C':

    A--B'-C'  <-- HEAD
     \
      B--C--D   <-- master
    

    We repeat with D to get D', and then rebase does its last step, which is to yank the name master off commit D and paste it onto the last commit just made, and re-attach HEAD:

    A--B'-C'-D'  <-- master (HEAD)
     \
      B--C--D   [abandoned]
    

    which is the same picture we drew before, just drawn a little differently.
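
    The steps can be imitated by hand. This is only a sketch of the idea in a throwaway repository, not what git rebase literally executes: detach HEAD at A, cherry-pick B, C, and D in order, then move the branch name.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for name in A B C D; do
    echo "$name" >> file.txt
    git add file.txt
    git commit -q -m "$name"
done
branch=$(git symbolic-ref --short HEAD)
a=$(git rev-parse 'HEAD~3'); b=$(git rev-parse 'HEAD~2')
c=$(git rev-parse 'HEAD~1'); d=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'HEAD^{tree}')

git checkout -q --detach "$a"                  # HEAD now points directly at A
git cherry-pick "$b" "$c" "$d" >/dev/null      # make B', C', D'
git branch -f "$branch" HEAD                   # yank the name over to D'
git checkout -q "$branch"                      # re-attach HEAD

new_tree=$(git rev-parse 'HEAD^{tree}')
git log --format='%h %s'
```

    The final snapshot is the same as before; only the commits holding it were re-made.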


    ³ The rebase command is actually clever here: it realizes that copying B here, at commit A, produces a new commit that's really an exact copy of B except for the date-and-time-stamps. So instead of copying it, it just re-uses it in place. To defeat the cleverness (which is sometimes useful, on the rare occasion when you need new hash IDs) you can force git rebase to make copies anyway. For illustration purposes, we'll just pretend that git rebase is dumber, or that you have defeated the cleverness, but if you delve deep into rebase, know that it does do this.


    Squash or fixup

    We can, if we choose, tell git rebase -i to squash a commit into a previous commit, during this copying process. We just replace the word pick with the word squash in the instruction sheet that git rebase -i gives us to edit. Suppose we did that with commit C, for instance. Then after copying B to B', so that we have:

    A--B'  <-- HEAD
     \
      B--C--D   <-- master
    

    Git will do the git cherry-pick in mostly the same way as before, resulting in a situation in which the next commit would be C' as we showed before. But instead of just committing as normal, this commit step takes two special actions:

    1. It writes the commit message from B (or B'—they're the same) into a temporary file, and adds the commit message from C. It adds a little text about this being a squash of two commits, as well. That's what you see in your editor when Git fires up your editor before actually writing out the new commit.

    2. Instead of committing as usual, so that C' has B' as its parent, Git instructs the commit process to make the next commit have A as its parent.

    The result at this point is:

      B'   [abandoned]
     /
    A--BC   <-- HEAD
     \
      B--C--D   <-- master
    

    where BC has a snapshot that matches commit C, but carries the commit message you provide when you edit the combined message.

    Rebase can then go on to cherry-pick D as usual, and move the branch name as usual. If you can't see the abandoned commits (including the abandoned B'), then they might as well not exist,⁴ and you just have:

    A--BC--D'  <-- master (HEAD)
    

    and we don't really need to draw in the other abandoned commits either.

    Note that if you use fixup instead of squash in your command-sheet, Git still does this squashing process. It just doesn't bother having you edit a new commit message. Instead of gathering together the commit messages from each of the various to-be-squashed commits/copies, it just drops the fixup's message entirely, keeping the previous commit's message. (You can combine fixups and squashes: if you have $S squashes and $F fixups, the combined message you edit will hold all $S messages, and none of the $F messages.)
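
    For a non-interactive try-out, the instruction sheet can be edited by a command instead of by hand, through the GIT_SEQUENCE_EDITOR environment variable. The sketch below assumes GNU sed (for sed -i) and a throwaway repository with commits named A through D. It turns the second instruction, the one for C, from pick into fixup.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for name in A B C D; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done
tree_of_c=$(git rev-parse 'HEAD~1^{tree}')

# The "editor" rewrites line 2 of the sheet: pick C becomes fixup C.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/fixup/'" git rebase -q -i 'HEAD~3'

git log --format='%h %s'
subjects=$(git log --format=%s | tr '\n' ' ')
tree_of_bc=$(git rev-parse 'HEAD~1^{tree}')
```

    Afterwards there should be three commits: A, a combined commit that keeps B's message but has C's snapshot, and the copy of D.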


    ⁴ Due to rebase cleverness, B' might actually not exist. This process works even when rebase just re-uses commit B directly.


    But why do we get a conflict?

    You added --autosquash. That makes git rebase automatically move the copying commands around (and then replace some with squash or fixup as well). Commit B remains in place, but commit D, which is its fixup, moves to just after B. Commit C is left at the end. Git is now going to:

    • copy B normally; then
    • copy D, as a squash-with-fixup, i.e., discard D's message when we make BD as a new commit; then
    • copy C normally.
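
    You can look at the re-ordered instruction sheet without actually running the rebase. The sketch below rebuilds the repository from the question and uses a small helper script of my own (save-todo.sh, a hypothetical name) as the sequence editor: it saves the instructions to a file and then empties the sheet, which makes git rebase give up with "nothing to do".

```shell
set -eu
work=$(mktemp -d)
mkdir "$work/repo" && cd "$work/repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'Line one\n' > myfile.txt
git add myfile.txt && git commit -q -m 'First commit'
printf 'Line one\nLine two\n' > myfile.txt
git commit -q -am 'Second Commit'
printf 'Line one\nLine two\nLine four\n' > myfile.txt
git commit -q -am 'Third commit'
printf 'Line one\nLine two\nLine three\nLine four\n' > myfile.txt
git commit -q -a --fixup='HEAD^'

# Hypothetical helper: keep the instructions, drop comments, then empty the sheet.
cat > "$work/save-todo.sh" <<'EOF'
#!/bin/sh
grep -v -e '^#' -e '^$' "$1" > "$(dirname "$0")/todo.txt"
: > "$1"
EOF
chmod +x "$work/save-todo.sh"

GIT_SEQUENCE_EDITOR="$work/save-todo.sh" git rebase -i --autosquash 'HEAD^^^' 2>/dev/null || true
cat "$work/todo.txt"
verbs=$(cut -d' ' -f1 "$work/todo.txt" | tr '\n' ' ')
```

    The saved sheet should list a pick for "Second Commit", then a fixup for the fixup commit, then a pick for "Third commit".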

    So let's look at what we get when we're copying D. We have:

    A--B'  <-- HEAD
     \
      B--C--D   <-- master
    

    just as we did before. Now we run git cherry-pick on commit D. This uses commit C as the merge base. We get, as our changes, a diff from C to B'.

    The diff from C to B' says to delete the line Line four, the third line of the merge base copy of the file. Meanwhile, the diff from C to D says to insert the line Line three into the merge base copy, directly in front of that same Line four. Both changes land at the same spot, immediately after Line two.

    Git only combines changes automatically when they touch clearly separate parts of a file. Here one side removes the very line that the other side is inserting next to, so Git cannot tell whether the result should keep Line three alone, keep both lines, or keep neither. Git does the best it can with this file, writing both candidates into the working tree copy between conflict markers. It then fails the merge-as-a-verb process, stopping the rebase process in its tracks, and tells you to fix the mess.
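
    The conflicting step can be reproduced in isolation. In this sketch (throwaway repository, file contents written with printf rather than the sed command from the question) commit B itself stands in for B', and a plain git cherry-pick of D plays the part of the rebase step:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'Line one\n' > myfile.txt
git add myfile.txt && git commit -q -m 'First commit'
printf 'Line one\nLine two\n' > myfile.txt
git commit -q -am 'Second Commit'
printf 'Line one\nLine two\nLine four\n' > myfile.txt
git commit -q -am 'Third commit'
printf 'Line one\nLine two\nLine three\nLine four\n' > myfile.txt
git commit -q -a --fixup='HEAD^'

b=$(git rev-parse 'HEAD~2'); c=$(git rev-parse 'HEAD~1'); d=$(git rev-parse HEAD)
git checkout -q --detach "$b"       # B stands in for B'

git diff "$c" HEAD                  # "ours":   C -> B' deletes Line four
git diff "$c" "$d"                  # "theirs": C -> D inserts Line three

if git cherry-pick "$d" >/dev/null 2>&1; then status=clean; else status=conflict; fi
echo "cherry-pick result: $status"
cat myfile.txt                      # working tree copy, with conflict markers
```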

    If you set merge.conflictStyle to diff3,⁵ your working tree copy of the file will contain not just the two conflicting changes that Git couldn't combine for some reason, but also the merge base version of the lines. In this case, that will only help slightly, but that might be enough. I like to have diff3 set.
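
    To see what diff3 adds in this particular case, here is the same isolated cherry-pick with the setting applied for one command only via git -c. Repository, identity, and the use of B as a stand-in for B' are again assumptions of the sketch.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'Line one\n' > myfile.txt
git add myfile.txt && git commit -q -m 'First commit'
printf 'Line one\nLine two\n' > myfile.txt
git commit -q -am 'Second Commit'
printf 'Line one\nLine two\nLine four\n' > myfile.txt
git commit -q -am 'Third commit'
printf 'Line one\nLine two\nLine three\nLine four\n' > myfile.txt
git commit -q -a --fixup='HEAD^'

d=$(git rev-parse HEAD)
git checkout -q --detach 'HEAD~2'   # B stands in for B'

# Same conflicting step, but with the merge base lines included.
git -c merge.conflictStyle=diff3 cherry-pick "$d" >/dev/null 2>&1 || true
cat myfile.txt

# The section after the ||||||| marker is the merge base (commit C) version.
base_line=$(sed -n '/^|||||||/{n;p;}' myfile.txt)
```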

    Once you fix the conflict—however you choose to fix it—Git takes your result as "the right answer" and makes the new BD combined commit, using whatever you told Git was the right way that the file should read. So now you have:

      B'   [abandoned]
     /
    A--BD   <-- HEAD
     \
      B--C--D   <-- master
    

    Git is now supposed to cherry-pick commit C. This runs a merge with the merge base set to commit B. Our commit is BD, so "what we changed" is a diff of the B copy of the file with whatever you've done. Their commit is C, so "what they changed" is the diff from B to C, which says to add the line "Line four" at line 3, between a line that says "Line two" (at line 2) and the end of the file.

    Unless you make the file end after two lines, with the second line reading "Line two", Git is likely to have a problem combining "their" change with yours. For instance, if you resolved the first conflict the way you wanted, so that the file now ends with "Line three", both sides add a different line right after "Line two". So you will see a merge conflict. If you do make the file end there like that, Git will decide that the merge requires nothing at all, which will leave git rebase a bit puzzled: it will tell you that it seems as though there's no reason to cherry-pick commit C any more, and force you to choose whether to use git rebase --skip to skip it.


    ⁵ Use git config. To set it for all your repositories that don't already have it set, use git config --global. I use git config --global merge.conflictStyle diff3 to set it globally.