How to get rid of mistakes in a merge commit and keep the right parts?


Someone not familiar with git committed on his branch, and then made a merge commit with the develop branch. When merging, he:

  1. resolved the conflicts by rewriting them completely
  2. made changes to several files that could have been merged without conflicts
  3. discarded other changes which should have been auto merged

Now I want to keep parts 1 and 2 but revert part 3. What should I do? Note that his branch has already been pushed to the remote, so I hope that reset can be avoided.

What I have tried:

  1. Ran git revert -m 1 <commit-id> to get back to the state before the merge.
  2. Tried merging again, but was told 'Already up to date.' and the discarded changes are still gone.

What I was expecting is the same result as git reset HEAD^; git merge develop, but it seems that I do not understand revert correctly.


Solution

  • There is no right answer to this particular problem. There are only answers that leave a few problems, and answers that leave many problems. The badness of each of these problems depends on your particular situation:

    • For instance, using git reset to strip the merge, followed by a git push --force, creates problems for anyone else using the remote clone. But perhaps only one other person is using that clone, and that one other person already knows what to do, or can be instructed as to what to do.

      In this case, the "badness" of stripping the bad merge and starting over is relatively small, especially since you can keep the good resolutions (although this requires manual work and a lot of Git knowledge). Once you're done, nobody ever has to deal with the bad merge again, which leaves things in a nice state.

    • But perhaps many people are using that remote repository, and stripping out the bad merge would cause irreparable damage. In that case, the "badness" of stripping the bad merge is enormous, and you should use another strategy.

    The main thing to remember is that a Git repository is, in the end, nothing more or less than a collection of commits. The commits in the repository are the history and are the repository.[1] So, whatever you end up doing, you will add commits to the repository. To fix a bad merge commit, you must add more commits.

    These need not be merge commits. You can leave the existing merge in place, and simply remember it (or mark it—see git notes) as "bad, do not use". You can then add ordinary (non-merge) commits that fix the problem.

    Each commit stores a full snapshot of every file. Commits do not contain differences from a previous commit. So a bad merge commit is simply a commit with some files having the wrong contents. A subsequent non-merge commit can store files with the right contents.
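A quick way to see the snapshot behavior for yourself, sketched in a throwaway repository (the file names and commit messages here are made up for illustration): the second commit changes only one file, yet its snapshot still lists both files, and the "diff" only appears when Git compares two snapshots.

```shell
#!/bin/sh
# Scratch repository; file names and messages are illustrative only.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "first" > a.txt
echo "second" > b.txt
git add a.txt b.txt
git commit -q -m "add two files"

# Change only a.txt in the next commit.
echo "first, edited" > a.txt
git commit -q -am "edit a.txt"

# The new commit's snapshot still holds every file, not just the changed one.
snapshot=$(git ls-tree -r --name-only HEAD)
echo "$snapshot"

# A diff is computed on demand by comparing two snapshots.
changed=$(git diff --name-only HEAD~1 HEAD)
echo "changed: $changed"
```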

    Your problem thus boils down to two parts:

    • You must decide whether or not to remove the bad merge. This is a value judgment, with no right answer.

    • You must come up with the corrected contents. This is a mechanical problem: how will you produce the correct files? Here, Git can help.

    Let me get a footnote out of the way, and then describe how Git can help.


    [1] This is a mild overstatement: there may be git notes (although technically those are stored in commits anyway) and tags; and humans attach significance to branch names, which are also in the repository but are rather ephemeral and should not be depended on quite so heavily.


    How Git performs a true merge

    A true merge, in Git, is an operation on three input commits.[2] The three commits include your current commit, as selected by your current branch name and the special name HEAD. You give Git another commit on the command line: when you run git merge other-branch-name or git merge hash-id, Git uses this to locate the other branch tip commit. For much more on how branch tips work, and how HEAD works, see Think Like (a) Git. This site will also help you understand the next part.

    Given these two branch tip commits, Git now finds the third—or in some sense, first—of the three input commits on its own, using the commit graph. Each ordinary, non-merge commit connects, backwards, to some earlier commit. This series of backwards connections must eventually arrive at some common starting point, where the two branches last shared some particular commit.

    We can draw this situation like this:

              I--J   <-- our-branch (HEAD)
             /
    ...--G--H
             \
              K--L   <-- their-branch
    

    Our latest commit, which I've drawn as commit J, points backwards to some earlier commit(s), which I've drawn as commit I. Their latest commit L points backwards to some earlier commit K. But then I and K point backwards to some commit—here, H—that's on both branches at the same time. Think Like (a) Git has a lot more about how this works, but for our purposes here, we need only see that Git finds commit H on its own, and that it's on both branches.

    When we run git merge with commit J as our commit (which Git calls --ours, HEAD, or the local commit) and commit L as their commit (which Git typically calls --theirs or the remote commit), Git finds commit H as the merge base. Then it:

    1. Compares the snapshot in commit H to the snapshot in our commit J. This finds out what files we changed, and what changes we made to those files.

    2. Compares the snapshot in H to the one in L. This finds out what files they changed, and what changes they made to those files.

    3. Combines the changes. This is the hard-work part. Git does this combining using simple text-substitution rules: it has no idea which changes really should be used. Where the rules allow, Git makes these changes on its own; where the rules claim that there is a conflict, Git passes the conflict on to us, for us to fix. In any case, Git applies the combined changes to the snapshot in the starting commit: merge base H. That keeps our changes while adding theirs.
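The three steps above can be watched in a scratch repository. This is only a sketch: the branch names match the diagram, but the file name, contents, and commit messages are invented for illustration, and a single commit on each side stands in for I--J and K--L.

```shell
#!/bin/sh
# Scratch repository shaped like the H / J / L graph; names are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf '1\n2\n3\n4\n5\n' > file.txt
git add file.txt
git commit -q -m "H: shared starting point"
h=$(git rev-parse HEAD)

git checkout -q -b our-branch
printf 'ours\n2\n3\n4\n5\n' > file.txt
git commit -q -am "J: we change the first line"

git checkout -q -b their-branch "$h"
printf '1\n2\n3\n4\ntheirs\n' > file.txt
git commit -q -am "L: they change the last line"

git checkout -q our-branch

# Git locates the merge base from the commit graph.
base=$(git merge-base HEAD their-branch)

# Step 1: merge base vs. our commit.
git diff "$base" HEAD
# Step 2: merge base vs. their commit.
git diff "$base" their-branch

# Step 3: combine both sets of changes on top of the merge base.
git merge -q --no-edit their-branch
merged=$(cat file.txt)
echo "$merged"
```

Because the two sides touch lines that are far enough apart, the combining rules find no conflict here and the merged file carries both edits.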

    So, if the merge goes well on its own, Git will make a new merge commit M, like so:

              I--J
             /    \
    ...--G--H      M   <-- our-branch (HEAD)
             \    /
              K--L   <-- their-branch
    

    New commit M has a snapshot, like any commit, and a log message and author and so on just like any commit. The only thing that's special about M is that it links back not just to commit J—our commit when we started—but also to commit L, the commit whose hash ID we told git merge about (either using the raw hash ID, or using the name their-branch).

    If we have to fix up the merge ourselves, we do that and run git add and then either git commit or git merge --continue, to make merge commit M. When we do this, we have full control over what goes into M.


    [2] This is the kind of merge that results in a merge commit, i.e., a commit with two parents. Git can also perform what it calls a fast-forward merge, which is not a merge at all and produces no new commit, or what it calls an octopus merge, which takes more than three input commits. Octopus merges have certain restrictions, which means they do not apply to this case. True merges can involve making a recursive merge, which complicates the picture as well, but I'm going to ignore this case here: the complications are not directly relevant to what we'll be doing.


    Redoing the bad merge

    Our situation here is that we started with:

              I--J   <-- our-branch (HEAD)
             /
    ...--G--H
             \
              K--L   <-- their-branch
    

    Then someone—presumably not us 😀—ran git merge their-branch or equivalent, got merge conflicts, and resolved them incorrectly and committed:

              I--J
             /    \
    ...--G--H      M   <-- our-branch (HEAD)
             \    /
              K--L   <-- their-branch
    

    To re-perform the merge, we just need to check out / switch to commit J:

    git checkout -b repair <hash-of-J>
    

    for instance, or:

    git switch -c repair <hash-of-J>
    

    to use the new (since Git 2.23) git switch command. Then we run:

    git merge <hash-of-L>
    

    To get the two hash IDs, we can use git rev-parse on merge commit M, with the funky ^1 and ^2 syntax suffixes; or we can run git log --graph or similar and find the two commits and see their hash IDs directly. Or, if the name their-branch still finds commit L, we can run git merge their-branch. Git just needs to locate the correct commit.
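As a small sketch of the git rev-parse approach, here is a scratch repository with one merge commit (file names and messages are invented for illustration). The ^1 suffix names the first parent, which is the commit that was checked out when the merge ran, and ^2 names the commit that was merged in.

```shell
#!/bin/sh
# Scratch repository with one merge commit; names are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b our-branch
echo "ours" > ours.txt
git add ours.txt
git commit -q -m "J"
j_expected=$(git rev-parse HEAD)

git checkout -q -b their-branch "$h"
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "L"
l_expected=$(git rev-parse HEAD)

git checkout -q our-branch
git merge -q --no-edit their-branch
m=$(git rev-parse HEAD)

# First parent: the commit that was checked out when the merge ran (J).
j=$(git rev-parse "$m^1")
# Second parent: the commit that was merged in (L).
l=$(git rev-parse "$m^2")

echo "J = $j"
echo "L = $l"
```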

    Git will, at this point, repeat the merge attempt it tried earlier, following exactly the same rules. This will produce exactly the same conflicts. Our job is now to fix up these conflicts, but this time, we do it correctly.

    If we like the resolution that someone else made in commit M, we can ask git checkout (all versions of Git) or git restore (Git 2.23 and later) to extract the resolved file that the other person put in commit M:

    git checkout <hash-of-M> -- <path/to/file>
    

    for instance. Even if we don't like the entire resolution, we can still do that and then fix up the file and run git add. Only if we dislike the whole resolution do we have to redo the entire fix-up ourselves.

    One way or another, though, we just fix up each file and git add the result to tell Git that we have fixed up the file. (The git checkout hash -- path trick makes it so we can skip the git add step in some cases, but it won't hurt to run git add anyway either.) When we're all done, we run git merge --continue or git commit to finish this merge: the result is a new merge commit M2 or N, on our new branch repair or whatever we called it when we created it:

              I--J-----M2   <-- repair (HEAD)
             /    \   /
    ...--G--H      M /  <-- our-branch
             \    /_/
              K--L   <-- their-branch
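Here is the repair sequence end to end in a scratch repository. Treat it as a sketch under assumptions: the files a.txt and b.txt, their contents, and the way the bad merge is simulated are all invented for illustration. In this made-up case the bad merge resolved a.txt well but threw away their change to b.txt.

```shell
#!/bin/sh
# Scratch repository; file names, contents, and messages are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "original" > a.txt
echo "original" > b.txt
git add a.txt b.txt
git commit -q -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b our-branch
echo "our edit" > a.txt
git commit -q -am "J"

git checkout -q -b their-branch "$h"
echo "their edit" > a.txt
echo "their b" > b.txt
git commit -q -am "L"

# Simulate the bad merge M: a.txt is resolved well, b.txt loses their change.
git checkout -q our-branch
git merge their-branch || true        # stops on the a.txt conflict
echo "good resolution" > a.txt
git checkout HEAD -- b.txt            # discards their change to b.txt
git add a.txt
git commit -q -m "M: bad merge"
m=$(git rev-parse HEAD)

# Redo the merge on a new branch that starts at J (first parent of M).
git checkout -q -b repair "$m^1"
git merge "$m^2" || true              # same conflict in a.txt as before

# Reuse the resolution from M for a.txt; b.txt was auto-merged correctly.
git checkout "$m" -- a.txt
git add a.txt
git commit -q -m "M2: corrected merge"

git show repair:a.txt
git show repair:b.txt
```

The two git merge calls are followed by || true only because a conflicted merge exits with a non-zero status, which would otherwise stop a script running under set -e.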
    

    We can now git checkout our-branch, which lands us on commit M, and grab files directly from repair:

    git checkout our-branch
    git checkout repair -- path/to/file1
    git checkout repair -- path/to/file2
    ...
    

    and then we're ready to git commit to make a new commit N. Or we can grab every file from M2 en masse:

    git checkout repair -- .
    

    and run git status, git diff --cached, and/or git commit at this point, depending on how sure we are we got this all right.

    The result of the above is:

              I--J-----M2   <-- repair
             /    \   /
    ...--G--H      M-/--N   <-- our-branch (HEAD)
             \    /_/
              K--L   <-- their-branch
    

    and we can now delete branch name repair entirely: commit N is just "magically fixed".
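A sketch of this non-merge fix-up in a scratch repository follows; the files, contents, and the simulated bad merge are invented for illustration. One thing worth checking before committing: as far as I know, git checkout repair -- . overwrites and adds files but, by default, does not remove files that exist in M and are absent from repair, so the sketch compares the index against repair first.

```shell
#!/bin/sh
# Scratch repository; file names, contents, and messages are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "original" > a.txt
echo "original" > b.txt
git add a.txt b.txt
git commit -q -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b our-branch
echo "our edit" > a.txt
git commit -q -am "J"

git checkout -q -b their-branch "$h"
echo "their edit" > b.txt
git commit -q -am "L"

# Simulate the bad merge M, which silently drops their change to b.txt.
git checkout -q our-branch
git merge -q --no-commit their-branch
git checkout HEAD -- b.txt
git commit -q -m "M: bad merge"
m=$(git rev-parse HEAD)

# Redo the merge correctly as M2 on the repair branch.
git checkout -q -b repair "$m^1"
git merge -q --no-edit "$m^2"
fixed_tree=$(git rev-parse 'repair^{tree}')

# Back on our-branch (commit M), take every file from repair.
git checkout -q our-branch
git checkout repair -- .

# Exit status 0 here means the index matches the repair snapshot exactly.
git diff --cached --quiet repair

git commit -q -m "N: restore changes dropped by the bad merge"
git branch -q -D repair
```

If the git diff --cached --quiet repair check fails, the leftover differences (typically files that should have been deleted) can be inspected with git diff --cached repair and fixed by hand before committing.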

    If we intend to keep commit M2, we can use git merge to merge repair into M. We might want to run git merge --no-commit so that we gain full control: this will stop git merge from making the actual commit yet, so that we can inspect the snapshot that's about to go into the new merge. Then the final git merge --continue or git commit makes N as a new merge commit:

              I--J-----M2   <-- repair
             /    \   /  \
    ...--G--H      M-/----N   <-- our-branch (HEAD)
             \    /_/
              K--L   <-- their-branch
    

    and once again we can delete the name repair; it no longer adds anything of value.

    (I'd generally just make a simple non-merge fixup commit myself, rather than another merge. The merge base for making N as a merge is both commits J and L, which means Git will do a recursive merge unless we specify -s resolve. Recursive merges tend to be messy and have weird conflicts sometimes.)

    If there have been commits since the bad merge

    Commits that occur after bad-merge-M just need their changes carried forward into what I have drawn above as final commit N. How you go about achieving that is not really terribly important, though some ways may have Git do more of the work for you. The thing to remember here is what I said earlier: in the end, it's the commits in the repository that matter. That includes both the graph—the backwards-looking connections from commit to earlier commit—and the snapshots. The graph matters to Git itself, as it is how git log works and how git merge finds the merge base. The snapshots matter to you, as they are how Git stores the content that you care about.