Merge conflicts from git revert - Should I accept current change or incoming and why?


I have commits like so - A <- B <- C <- D <- E <- Head

I'm using git revert --no-commit [git hash] to undo specific commits in between commits I want to keep. Say I want to revert D and B.

Based on this post, the right way to revert is to start with the most recent commit you want to revert, e.g.:

git revert --no-commit D
git revert --no-commit B
git commit

I'm getting a merge conflict and I'm not sure whether I should accept the current change or incoming change since this is essentially going backwards.


Solution

    TL;DR

    In general, you're going to have to think about the result. You don't want to blindly accept "ours" as that will keep the commit you're trying to undo. You don't want to blindly take "theirs" as that almost certainly will eradicate one of, or part of, the other commits you wanted to keep. Overall, you might generally favor "theirs"—but thinking will be required. To see why, read on.

    Long

    This is a small point, not directly relevant to your question and its answer, but worth mentioning: Git, internally, works backwards (because it must).[1] Hence commits link backwards rather than forwards. The actual link, from a later commit to an earlier one, is part of the later commit. So your drawing would be more accurate like this:

    A <-B <-C <-D <-E   <-- main (HEAD)
    

    (assuming you're on branch main, so that the name main selects commit E). But I usually get lazy about this and draw connecting lines, because it's easier and because the arrow fonts with diagonal arrows don't come out very well, while \ and / for slanting connecting lines work fine.
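To see these backward links for yourself, here is a minimal sketch. It assumes a throwaway repository in a temporary directory and a made-up identity (Demo, demo@example.com); the commit messages A and B are only labels.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo

git commit -q --allow-empty -m A
git commit -q --allow-empty -m B

# The later commit (B) stores the hash of the earlier one (A) in a "parent" line.
parent_line=$(git cat-file -p HEAD | grep '^parent ')
first_commit=$(git rev-parse HEAD~1)
echo "$parent_line"

# The earlier commit (A) has no field that points forward to B.
git cat-file -p "$first_commit"
```

The second `git cat-file` prints only tree, author, committer, and message lines: nothing in A knows that B exists.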

    In any case, the reason to do the revert "backwards" is that if we want to undo the effect of commit E, and run git revert E to make commit Ǝ:

    A--B--C--D--E--Ǝ   <-- main (HEAD)
    

    the resulting source snapshot, in commit Ǝ, will exactly match the source snapshot in commit D. That means we can now run git revert D and get a commit that "undoes" the effect of D, too, without ever seeing any merge conflicts. The resulting snapshot matches that in C, making it trivial to revert C, resulting in a snapshot that matches B, and so on.

    In other words, by reverting in reverse order, we make sure we never have any conflicts. With no conflicts, our job is easier.
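As a rough check of this claim, the sketch below builds five commits that each append a line to one file, then reverts E and D in that order and compares snapshots. The file name file.txt, the tags A through E, and the demo identity are assumptions made only for this example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo

# Commits A..E each append one line to the same file.
for c in A B C D E; do
    echo "line from $c" >> file.txt
    git add file.txt
    git commit -q -m "$c"
    git tag "$c"
done

git revert --no-edit E >/dev/null          # undo E first
tree_after_E=$(git rev-parse 'HEAD^{tree}')
git revert --no-edit D >/dev/null          # then undo D
tree_after_D=$(git rev-parse 'HEAD^{tree}')

echo "after reverting E: $tree_after_E (D is $(git rev-parse 'D^{tree}'))"
echo "after reverting D: $tree_after_D (C is $(git rev-parse 'C^{tree}'))"
```

Both reverts complete without stopping, and each printed pair of tree hashes matches.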

    If we're going to pick and choose specific commits to revert, this strategy of avoiding conflicts falls apart, and there may be no strong reason to revert in reverse order. Using reverse order might still be good—if it results in fewer conflicts, for instance—or it might be neutral or even bad (if it results in more/worse conflicts, though this is unlikely in most realistic scenarios).

    With that out of the way, let's get to your question ... well, almost to your question. Both cherry-pick and revert are implemented as a three-way merge operation. To understand this properly, we need to look at how Git does a three-way merge in the first place, and why it works (and when it works, and what a conflict means).


    [1] The reason that this is necessary is that no part of any commit can ever be changed, not even by Git itself. Since the earlier commit is set in stone once it's made, there's no way to reach back into it and make it link to the later one.


    A standard git merge

    Our usual simple merge case looks like this:

              I--J   <-- branch1 (HEAD)
             /
    ...--G--H
             \
              K--L   <-- branch2
    

    Here we have two branches that share commits up through and including commit H, but then diverge. Commits I and J are only on branch1, while K-L are only on branch2 for now.

    We know that each commit holds a full snapshot (not a set of changes, but a snapshot) with the files compressed and de-duplicated and otherwise Git-ified. But each commit represents some change: by comparing the snapshot in H to that in I, for instance, we can see that whoever made commit I fixed the spelling of a word on line 17 of the README file.

    All of this means that to see changes, Git always has to compare two commits.[2] Given this reality, it's easy to see that Git can figure out what we changed on branch1 by comparing the best shared commit, commit H, to our last commit, commit J. Whatever files are different here, with whatever changes we made, those are our changes.

    Meanwhile, the goal of a merge is to combine changes. So Git should run this diff—this comparison of two commits—to see our changes, but also should run a similar diff to see their changes. To see what they changed, Git should start from the same best shared commit H and diff that against their last commit L:

    git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed
    git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed
    

    Git will now combine these two sets of changes: if we changed the README file and they didn't, that means use our version of the README file. If they changed some file and we didn't, that means use their version of that file. If we both touched the same file, Git has to figure out how to combine those changes, and if nobody touched some file—if all three versions match—Git can just take any of those three versions.
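The simple cases of that rule can be tried in a scratch repository. In this sketch, which assumes two made-up files (README and notes.txt) and a demo identity, we change only README and they change only notes.txt, so the merge should take our README and their notes.txt.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo
git symbolic-ref HEAD refs/heads/branch1  # name the initial branch explicitly

printf 'line 1\nline 2\nline 3\n' > README
echo 'shared' > notes.txt
git add README notes.txt
git commit -q -m H
git branch branch2

printf 'line 1\nline two\nline 3\n' > README   # our change
git commit -q -am J

git checkout -q branch2
echo 'theirs' > notes.txt                       # their change
git commit -q -am L
git checkout -q branch1

# The two diffs Git effectively runs, both starting from the merge base H.
base=$(git merge-base branch1 branch2)
ours_changed=$(git diff --find-renames --name-only "$base" branch1)
theirs_changed=$(git diff --find-renames --name-only "$base" branch2)
echo "we changed:   $ours_changed"
echo "they changed: $theirs_changed"

git merge -q --no-edit branch2
```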

    These give Git a bunch of short-cuts. The slow and simple way to combine our changes is to extract all the files from H itself, apply our and their changes where they don't conflict, and apply the conflicting changes with conflict markers where they do conflict. What Git really does has this same effect. If there aren't any conflicts, the resulting files are all ready to go into a new merge commit M:

              I--J
             /    \
    ...--G--H      M   <-- branch1 (HEAD)
             \    /
              K--L   <-- branch2
    

    The new commit becomes the last commit for branch1. It links back to commit J, the way any new commit would, but it also links back to commit L, the commit that is still currently the last commit of branch2.

    Now all the commits are on branch1 (including the new one). Commits K-L, which used to be only on branch2, are now on branch1 as well. This means that in a future merge, the best shared commit is going to be commit L, rather than commit H. We won't have to repeat the same merge work.

    Note that commit M contains the final merged results: a simple snapshot of all files, with the correctly-merged contents. Commit M is special in only one way: instead of one parent J, it has two parents, J and L.
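A short sketch of both points, the two parents of M and the new merge base, follows. The file names and the demo identity are assumptions for illustration only.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo
git symbolic-ref HEAD refs/heads/branch1

echo base > shared.txt
git add shared.txt
git commit -q -m H
git branch branch2

echo ours > ours.txt
git add ours.txt
git commit -q -m J
j=$(git rev-parse HEAD)

git checkout -q branch2
echo theirs > theirs.txt
git add theirs.txt
git commit -q -m L
l=$(git rev-parse HEAD)

git checkout -q branch1
git merge -q --no-edit branch2            # creates merge commit M

parents=$(git log -1 --format=%P)         # all parent hashes of M
new_base=$(git merge-base branch1 branch2)
echo "parents of M:    $parents"
echo "next merge base: $new_base"
```

The parents line lists J first and L second, and the merge base reported afterwards is L rather than H.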

    If there are conflicts, though, Git makes you—the programmer—fix them. You edit the files in your working tree, and/or access the three input copies that Git had—from commits H, J, and L respectively—and combine the files to produce the correct result. Whatever that correct result is, you run git add to put that into the future snapshot. When you are done with this, you run:

    git merge --continue
    

    or:

    git commit
    

    (merge --continue just makes sure there's a merge to finish, then runs git commit for you, so the effect is the same). This makes commit M, with the snapshot you provided when you resolved all the conflicts. Note that in the end, there's nothing different about a resolved-conflict merge vs a Git-made, no-conflict merge: it's still just a snapshot of files. The only thing special about this conflicted merge is that Git had to stop and get your help to come up with that snapshot.
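While a merge is stopped, the three input copies are available from Git's index as stages 1 (merge base), 2 (ours), and 3 (theirs). The sketch below forces a conflict on a one-line file and reads all three; config.txt, its contents, and the resolution "teal" are invented for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo
git symbolic-ref HEAD refs/heads/branch1

echo 'color = red' > config.txt
git add config.txt
git commit -q -m H
git branch branch2

echo 'color = green' > config.txt
git commit -q -am J

git checkout -q branch2
echo 'color = blue' > config.txt
git commit -q -am L
git checkout -q branch1

# The merge stops with a conflict; "|| true" keeps "set -e" from aborting.
git merge --no-edit branch2 >/dev/null 2>&1 || true

base_copy=$(git show :1:config.txt)     # from H
ours_copy=$(git show :2:config.txt)     # from J
theirs_copy=$(git show :3:config.txt)   # from L
echo "base: $base_copy | ours: $ours_copy | theirs: $theirs_copy"

# Resolve by hand, stage the result, and finish the merge.
echo 'color = teal' > config.txt
git add config.txt
git commit -q --no-edit
```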


    [2] Git can also compare one commit's snapshot to some set of ordinary files stored outside of any commit, or two sets of files both of which are outside commits, or whatever. But mostly we'll be working with files-in-commits, here.


    Copying the effect of a commit with cherry-pick

    We now take a side trip through the cherry-pick command, whose goal is to copy the changes of a commit (and the commit message) to some different commit (with different hash ID, often on a different branch):

            (the cherry)
                  |
                  v
    ...--o--o--P--C--o--...   <-- somebranch
          \
           E--F--G--H   <-- our-branch (HEAD)
    

    Here, we are on some commit with some hash H, at the tip of our branch, and are about to do some work when we realize: Hey, I saw Bob fix this bug yesterday / last-week / whenever. We realize that we don't have to do any work: we can just copy Bob's fix, in a "cherry" commit C. So we run:

    git cherry-pick <hash-of-C>
    

    For Git to do its job, Git has to compare the parent of C, commit P, to commit C. That's a job for git diff of course. So Git runs git diff (with the usual --find-renames and so on) to see what Bob changed.

    Now, Git needs to apply that change to our commit H. But: what if the file(s) that need fixing, in commit H, have a bunch of unrelated changes that skew the line numbers? Git needs to find where those changes moved to.

    There are a lot of ways to do that, but there's one way that works pretty well every time: Git can run a git diff to compare the snapshot in P—the parent of our cherry—to the snapshot in our commit H. That will find any differences in the files that are different between H and the P-C pair, including long stretches of inserted or deleted code that move the places where Bob's fix needs to go.

    This is of course going to turn up a bunch of irrelevant changes too, where P-vs-H is different just because they're on different lines of development. We started from some shared (but uninteresting) commit o; they made a bunch of changes, and commits, leading to P; we made a bunch of changes and commits, E and F and G, leading to our commit H. But: so what? Given that git merge is going to take our files where there's no conflict at all, we'll just get our files from H. And where both "we" and "they" changed some file, Git will "keep our changes" from P to H, then add their changes from P to C, which picks up Bob's changes.

    So this is the key realization: if we run the merge machinery, the only place we'll get conflicts is where Bob's changes don't fit in. Therefore, we do run the merge machinery:

    git diff --find-renames <hash-of-P> <hash-of-H>   # what we changed
    git diff --find-renames <hash-of-P> <hash-of-C>   # what Bob changed
    

    and then we have Git combine these changes, applying them to the "common" or "merge base" commit P. The fact that it isn't common to both branches does not matter. We get the right result, which is all that does matter.

    When we're done "combining" these changes (getting our own files back, for files that Bob didn't touch, and applying Bob's changes, for files that Bob did touch), we have Git make a new commit on its own, if all went well. This new commit isn't a merge commit though. It's just a regular, ordinary, everyday commit, with the usual parent:

    ...--o--o--P--C--o--...   <-- somebranch
          \
           E--F--G--H--I   <-- our-branch (HEAD)
    

    The git diff from H to I introduces the same changes as the git diff from P to C. The line numbers might be moved about if necessary, and if so, the moving-about happened automatically using the merge machinery. Also, new commit I re-uses the commit message from commit C (though we can modify it with git cherry-pick --edit, for instance).
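One way to check that the two diffs carry the same change is git patch-id, which hashes a patch while ignoring line numbers. The sketch below is only an illustration: the file contents, the misspelling "sevn", the branch layout, and the demo identity are all made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo
git symbolic-ref HEAD refs/heads/our-branch

printf '%s\n' one two three four five six sevn eight > file.txt
git add file.txt
git commit -q -m 'o: shared starting point'
git branch somebranch

# Our side: two new lines at the top shift every later line number.
{ printf '%s\n' 'header 1' 'header 2'; cat file.txt; } > file.tmp
mv file.tmp file.txt
git commit -q -am 'H: add headers'
h=$(git rev-parse HEAD)

# Their side: P adds an unrelated file, then C is Bob's fix.
git checkout -q somebranch
echo 'unrelated' > other.txt
git add other.txt
git commit -q -m 'P: unrelated work'
p=$(git rev-parse HEAD)
sed 's/^sevn$/seven/' file.txt > file.tmp
mv file.tmp file.txt
git commit -q -am 'C: fix spelling'
c=$(git rev-parse HEAD)

git checkout -q our-branch
git cherry-pick "$c" >/dev/null           # makes commit I on top of H

id_bob=$(git diff "$p" "$c" | git patch-id | cut -d' ' -f1)
id_copy=$(git diff "$h" HEAD | git patch-id | cut -d' ' -f1)
echo "P..C patch-id: $id_bob"
echo "H..I patch-id: $id_copy"
```

The fix lands on line 9 of our file instead of line 7, the unrelated other.txt from P does not come along, and the new commit has H as its only parent.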

    What if there are conflicts? Well, think about this: if there is a conflict in some file F, that means that Bob's fix to F affects some lines in that file that are different in their parent P and in our commit H. Why are these lines different? Either we don't have something we might need—maybe there's some commit before C that has some key setup code we need—or there's something we do have, that we don't want to lose. So it's rarely correct to just accept ours, because then we don't get Bob's fix to the file. But it's rarely correct to just accept theirs either, because then we're missing something, or we lose something we had.

    Reverting is backwards cherry-picking

    Suppose instead of this:

    ...--o--o--P--C--o--...   <-- somebranch
          \
           E--F--G--H   <-- our-branch (HEAD)
    

    we have this:

    ...--o--o--P--C--D--...   <-- somebranch
                      \
                       E--F--G--H   <-- our-branch (HEAD)
    

    Commit C, perhaps still made by Bob, has a bug in it, and the way to get rid of the bug is to undo the entire change from commit C.

    What we'd like to do, in effect, is diff C vs P—the same diff we did earlier for our cherry-pick, but backwards. Now, instead of add some lines here to add some feature (that's actually a bug), we get remove those same lines here (which removes the bug).

    We now want Git to apply this "backwards diff" to our commit H. But, as before, maybe the line numbers are off. If you suspect that the merge machinery is an answer here, you're right.

    What we do is a simple trick: we pick commit C as the "parent", or the fake merge base. Commit H, our current commit, is the --ours or HEAD commit as always, and commit P, the parent of commit C, is the other or --theirs commit. We run the same two diffs, but with slightly different hash IDs this time:

    git diff --find-renames <hash-of-C> <hash-of-H>   # what we changed
    git diff --find-renames <hash-of-C> <hash-of-P>   # "undo Bob's changes"
    

    and we have the merge machinery combine these, as before. This time the merge base is commit C, the commit we're "undoing".
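To make the choice of merge base concrete, the sketch below performs the same three-way merge by hand on a single file with git merge-file (ours from H, base from C, theirs from P) and compares the output with what git revert produces. This is an illustration under assumptions, not a description of Git's internal code path; the file contents and demo identity are invented.

```shell
set -eu
repo=$(mktemp -d)
work=$(mktemp -d)                         # scratch area outside the repository
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo

printf '%s\n' alpha beta gamma delta epsilon zeta > file.txt
git add file.txt
git commit -q -m P
p=$(git rev-parse HEAD)

printf '%s\n' alpha BUGGY gamma delta epsilon zeta > file.txt
git commit -q -am C                       # Bob's buggy change
c=$(git rev-parse HEAD)

printf '%s\n' alpha BUGGY gamma delta epsilon zeta eta > file.txt
git commit -q -am H                       # our later, unrelated change
h=$(git rev-parse HEAD)

# Hand-made three-way merge: base = C, ours = H, theirs = P.
git show "$c:file.txt" > "$work/base.txt"
git show "$h:file.txt" > "$work/ours.txt"
git show "$p:file.txt" > "$work/theirs.txt"
git merge-file -p "$work/ours.txt" "$work/base.txt" "$work/theirs.txt" \
    > "$work/by-hand.txt"

git revert --no-edit "$c" >/dev/null      # let Git do the real revert

cmp "$work/by-hand.txt" file.txt && echo "same result"
```

In both results line 2 is back to "beta" while our added "eta" line is kept.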

    As with any merge, including that from cherry-pick, any conflicts here have to be considered carefully. "Their" change is something that backs out commit C, while "our" change is whatever is different between C (what they are starting with when they back this out) and our commit H. There is no royal short-cut here, no -X ours or -X theirs, that will always be right. You'll just have to think about this.
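Here is a small sketch of what such a conflict looks like, using letters from the question: D changes a value, E later edits the same line, and we try to revert D. During the stopped revert, index stage 1 holds the reverted commit's version, stage 2 holds HEAD's, and stage 3 holds the version from the reverted commit's parent. The file settings.txt, its contents, and the chosen resolution are assumptions for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo

echo 'timeout = 5' > settings.txt
git add settings.txt
git commit -q -m C

echo 'timeout = 30' > settings.txt
git commit -q -am D                       # the commit we want to undo
d=$(git rev-parse HEAD)

echo 'timeout = 30 # seconds' > settings.txt
git commit -q -am E                       # a later commit we want to keep

# The revert stops with a conflict; "|| true" keeps "set -e" from aborting.
git revert --no-commit "$d" >/dev/null 2>&1 || true

base_copy=$(git show :1:settings.txt)     # merge base: the snapshot in D
ours_copy=$(git show :2:settings.txt)     # ours: HEAD, i.e. E
theirs_copy=$(git show :3:settings.txt)   # theirs: D's parent, i.e. C
echo "base: $base_copy | ours: $ours_copy | theirs: $theirs_copy"

# Neither side alone is right: take the old value but keep E's comment.
echo 'timeout = 5 # seconds' > settings.txt
git add settings.txt
git commit -q -m 'Revert D, keeping the comment from E'
```

Accepting "ours" wholesale would leave timeout at 30, which is D's change; accepting "theirs" wholesale would drop the comment that E added.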

    Be careful with -n: consider not using it

    If you're getting conflicts when using git cherry-pick or git revert, you must resolve them. If you're not using -n, you resolve them and then commit. If you are doing this with multiple commits, your next operation might get a conflict too.

    If you committed, the next cherry-pick or revert starts with your commit as the HEAD version. If you got something wrong in any of the intermediate versions, that alone might cause a conflict; or, there might be a conflict here that would arise no matter what. As long as you resolve this one and commit too, you leave a trail. You can go back and look at each individual cherry-pick or revert and see if you did it correctly, or not.

    Now, you can use git cherry-pick -n or git revert -n to skip the commit at the end. If you do that, the next cherry-pick or revert uses your working tree files as if they were the HEAD-commit versions. This works the same way as before, but this time, you do not leave a trail. If something goes wrong, you can't look back at your previous work and see where it went wrong.

    If you leave off the -n, you'll get a whole series of commits:

    A--B--C--D--E--Ↄ   <-- main (HEAD)
    

    for instance, after reverting C. If you then go to revert A and it all goes well, you might get:

    A--B--C--D--E--Ↄ--∀   <-- main (HEAD)
    

    If you now say "that's nice but I don't really want Ↄ in the mix", it's easy to get rid of it while keeping its effect, using git rebase -i or git reset --soft. For instance, a git reset --soft with the hash ID of commit E results in:

                  Ↄ--∀   ???
                 /
    A--B--C--D--E   <-- main (HEAD)
    

    but leaves Git's index and your working tree full of the files that make up the contents of commit ∀. So you can now run git commit and get a new commit:

                  Ↄ--∀   ???
                 /
    A--B--C--D--E--Ↄ∀   <-- main (HEAD)
    

    where Ↄ∀ is the effect of combining (i.e., squashing) Ↄ and ∀.
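The whole sequence can be tried in a scratch repository. In this sketch each of A through E adds its own file so the two reverts apply cleanly; the extra "root" commit, the file names, the tags, and the demo identity are assumptions added only so the example is self-contained.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo

git commit -q --allow-empty -m root       # extra commit so that A has a parent
for c in A B C D E; do
    echo "$c" > "$c.txt"
    git add "$c.txt"
    git commit -q -m "$c"
    git tag "$c"
done

git revert --no-edit C >/dev/null         # first revert commit
git revert --no-edit A >/dev/null         # second revert commit
final_tree=$(git rev-parse 'HEAD^{tree}')

# Move the branch back to E but keep the index and working tree as they are.
git reset -q --soft E
git commit -q -m 'Revert C and A'

git log --format=%s -n 2
```

The last command prints "Revert C and A" followed by "E": the two separate revert commits are gone from the branch, but their combined effect is kept in one commit.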

    If nothing went wrong, you will have to do this squashing, but if something did go wrong, you don't have to start from scratch.