Tags: git, merge, rebase, merge-conflict-resolution

How can merge conflicts remain unresolved in commit history?


I was recently browsing through a project's commit history by:

  1. git rebase -i --root
  2. selecting every commit for editing (but only to view them; not edit them)
  3. git rebase --continue for each commit after viewing and nothing else

After continuing from one of the commits, a merge conflict occurred. I thought that when merge conflicts occur, they need to be resolved (via editing or skipping) before the merge can be applied. However, this was already part of the project's commit history which goes well past the commit I'm currently on, so I assume some sort of merge did happen.

How did the commits go past this point without resolving the conflict? How and when can this sort of thing happen?


Solution

  • I was recently browsing through a project's commit history [using git rebase]

    What git rebase does is copy commits (to new and, presumably, improved ones). Commits are history, so by copying commits, you're creating new (and presumably improved) history.

    After continuing from one of the commits, a merge conflict occurred.

    This is normal enough: it means that the attempt to copy the next commit, at this point in your new-and-improved series of commits, was unable to proceed because of the merge conflict.

    I thought that when merge conflicts occur, they need to be resolved (via editing or skipping) before the merge can be applied.

    They do. Note, however, that as mousetail said in a comment, all you really have to do is mark the conflicts as taken-care-of. Git believes you, even if you didn't do anything here. Whatever you put into Git's index as the resolved file, Git assumes that's the correct merge result.
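
    A minimal sketch of this, assuming a throwaway repository in a temporary directory (the file name, branch name, and identity below are made up for illustration): the rebase stops on a conflict, the file is staged untouched, and Git records the conflict markers as the resolution.

```shell
#!/bin/sh
# Sketch: whatever is staged counts as the resolution, markers included.
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # throwaway identity
git config user.email "user@example.com"

echo "original" > file.txt
git add file.txt
git commit --quiet -m "base"
trunk=$(git symbolic-ref --short HEAD)     # "main" or "master"

git checkout --quiet -b side
echo "side change" > file.txt
git commit --quiet -am "side"

git checkout --quiet "$trunk"
echo "trunk change" > file.txt
git commit --quiet -am "trunk"

# Copying "side" onto the trunk stops with a conflict in file.txt.
git checkout --quiet side
git rebase "$trunk" >/dev/null 2>&1 || true

# Mark the conflict as taken care of without editing anything.
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

# The conflict markers are now part of a commit.
marker_count=$(git show HEAD:file.txt | grep -c '^<<<<<<<')
echo "conflict-marker lines committed: $marker_count"
```

    This is one way an unresolved conflict can end up in a project's history: nothing in Git inspects the staged file for leftover markers.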

    However, this was already part of the project's commit history which goes well past the commit I'm currently on, so I assume some sort of merge did happen.

    Maybe, and maybe not. This depends on how you rearranged the old commits to make your new-and-improved commits.

    If this particular merge conflict did occur before, you could use the previous merge's resolved files to resolve your new-and-improved commits' merge conflicts. That generally involves using either git checkout (pre-Git-2.23) or git restore (Git 2.23 or later) to extract the files of interest from the original merge, though this requires care depending on what, precisely, the improvements you are making might be.
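
    As a rough sketch of that idea, the hypothetical helper below (the function name and its arguments are illustrative, not part of Git) takes one path from an earlier commit, such as the original merge, and stages it as the resolution. It uses the git checkout form, which also clears the path's conflicted state in the index. It does not check whether the old resolution still makes sense for the rewritten commits.

```shell
# reuse_resolution COMMIT PATH
# Hypothetical helper: copy PATH as it exists in COMMIT into the work tree
# and the index, which also marks a conflicted PATH as resolved.
reuse_resolution() {
    if [ "$#" -ne 2 ]; then
        echo "usage: reuse_resolution COMMIT PATH" >&2
        return 2
    fi
    git checkout "$1" -- "$2"
}
```

    During a stopped rebase this would be called with the hash of the original merge and the conflicted path, followed by git rebase --continue once every conflicted path has been dealt with.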

    In this particular case, however, it's most likely that these commits are a result of the default method by which git rebase deals with copying merge commits: by not copying them at all. That is, if we have a history that goes something like this:

         C--D
        /    \
    A--B      G--H   <-- main
        \    /
         E--F
    

    (for a total of eight commits), and we run:

    git rebase -i --root
    

    we will get an instruction sheet with seven, not eight, pick commands: A and B in that order, then C, D, E, and F in some order, and finally H. Because rebase uses --topo-order, we can predict that C and D will stay together in that order, as will E and F, but the sheet could list C-D-E-F or E-F-C-D. Merge commit G is simply omitted entirely.

    If merge commit G had to resolve conflicts, the rebase is almost certainly going to hit at least one conflict when picking whichever of C-D or E-F comes later in the instruction sheet. This will in fact be the same kind of conflict that required resolution in G. But we can get a conflict here even if G did not have one: the details are rather complicated.

    In any case, the final result, after doing the full rebase, will look a lot like this, or perhaps with the C-D pair later:

    A--B--C--D--E--F--H   <-- main
    

    But because some of these commits are different from their originals, they will have different hash IDs. (To notice this, you'll need to carefully note the hash IDs before the rebase, then compare them to the hash IDs after the rebase.) If the above is the result, and C and D did not need any changing due to the cherry-picking [1], we'll have:

    A--B--C--D--E'-F'-H'  <-- main
    

    The snapshot for E' is likely different from that for E; the parent linkage for E' goes to D, not to B; so E' is necessarily a different—new and improved—commit.
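
    The flattening can be reproduced in a scratch repository. The sketch below assumes a temporary directory, a made-up commit helper that writes one file per commit, and GIT_SEQUENCE_EDITOR=true as a way to accept the instruction sheet unchanged. It should report eight commits with one merge before the rebase and seven commits with no merge afterwards.

```shell
#!/bin/sh
# Sketch: rebuild the eight-commit graph, then flatten it with the default rebase.
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # throwaway identity
git config user.email "user@example.com"

# Hypothetical helper: one commit per letter, each touching its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit --quiet -m "$1"; }

commit A; commit B
trunk=$(git symbolic-ref --short HEAD)     # "main" or "master"
git checkout --quiet -b top;             commit C; commit D
git checkout --quiet -b bottom "$trunk"; commit E; commit F
git checkout --quiet "$trunk"
git merge --quiet --ff-only top            # trunk now at D
git merge --quiet -m G bottom              # merge commit G (parents D and F)
commit H

before_total=$(git rev-list --count HEAD)
before_merges=$(git rev-list --count --merges HEAD)
old_tip=$(git rev-parse HEAD)

# Accept the generated instruction sheet as-is.
GIT_SEQUENCE_EDITOR=true git rebase -i --root >/dev/null 2>&1

after_total=$(git rev-list --count HEAD)
after_merges=$(git rev-list --count --merges HEAD)
new_tip=$(git rev-parse HEAD)

echo "before: $before_total commits, $before_merges merge(s)"
echo "after:  $after_total commits, $after_merges merge(s)"
if [ "$old_tip" != "$new_tip" ]; then
    echo "tip hash changed: H was replaced by H'"
fi
```

    Each commit here touches its own file, so no conflict occurs. Making two of the side-branch commits edit the same line should reproduce the kind of conflict described above.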

    Using -f (or --force-rebase or --no-ff, all of which are just alternate spellings) during git rebase forces Git to copy even those commits that could be re-used intact. That is, the result would be:

    A'-B'-C'-D'-E'-F'-H'  <-- main
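
    A small sketch of the difference, assuming a scratch repository with three linear commits (file names and identity are made up). The original commits get an old committer date on purpose, so that forced copies, stamped with the current time, cannot accidentally come out byte-identical.

```shell
#!/bin/sh
# Sketch: a plain rebase reuses commits that need no change; -f re-copies them.
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # throwaway identity
git config user.email "user@example.com"

# Old committer date for the originals (see the note above).
GIT_COMMITTER_DATE='2001-01-01 00:00:00 +0000'
export GIT_COMMITTER_DATE
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit --quiet -m "$name"
done
unset GIT_COMMITTER_DATE

original=$(git rev-parse HEAD)

git rebase HEAD~2 >/dev/null 2>&1          # nothing to copy: commits are reused
after_plain=$(git rev-parse HEAD)

git rebase -f HEAD~2 >/dev/null 2>&1       # forces new copies of B and C
after_forced=$(git rev-parse HEAD)

echo "original:     $original"
echo "plain rebase: $after_plain"
echo "forced (-f):  $after_forced"
```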
    

    Since Git 2.18, the rebase command can "copy" merge commits using git rebase --rebase-merges (shorthand: git rebase -r). This puts up a considerably more complicated instruction sheet, containing labels and "go-to" constructs and merge directives. It works by re-performing the merges, since merges cannot be copied the way git cherry-pick can copy non-merge commits. With this option you would be able to preserve the graph structure, rather than flattening away merges.
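
    To get a feel for that instruction sheet, the sketch below rebuilds the eight-commit graph in a scratch repository and uses GIT_SEQUENCE_EDITOR=cat, a trick that prints the sheet instead of opening an editor. The commit helper, file names, and identity are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: show the --rebase-merges instruction sheet and keep the merge.
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # throwaway identity
git config user.email "user@example.com"

# Hypothetical helper: one commit per letter, each touching its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit --quiet -m "$1"; }

commit A; commit B
trunk=$(git symbolic-ref --short HEAD)     # "main" or "master"
git checkout --quiet -b top;             commit C; commit D
git checkout --quiet -b bottom "$trunk"; commit E; commit F
git checkout --quiet "$trunk"
git merge --quiet --ff-only top            # trunk now at D
git merge --quiet -m G bottom              # merge commit G
commit H

# "cat" as the sequence editor prints the sheet and leaves it unchanged.
sheet=$(GIT_SEQUENCE_EDITOR=cat git rebase -i --rebase-merges --root 2>/dev/null)

# Show only the directives, not the explanatory comments.
printf '%s\n' "$sheet" | grep -E '^(label|reset|pick|merge) '

after_total=$(git rev-list --count HEAD)
after_merges=$(git rev-list --count --merges HEAD)
echo "after: $after_total commits, $after_merges merge(s)"
```

    Unlike the default rebase, the result should still contain all eight commits, including one merge.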


    [1] This is where that -f option comes in, really. A literal cherry-pick would always produce a new commit, with a new date-and-time-stamp and hence a new hash ID. Rebase cleverly figures out whether a new commit is needed and, if not, fast-forwards the old commit into place instead of running cherry-pick. Using -f prevents this cleverness, which is useful when forcibly re-copying a topic branch for a later re-merge.


    What happens to the old commits?

    If you've replaced some commits, so that your new main points to new-and-improved commit H' instead of old-and-lousy H ... what happens to the old commits?

    The answer is: nothing happens to them. They're still there, in your Git repository. You can find them under the name ORIG_HEAD, at least until something updates that name. Or, you can find them using your branch's reflog, or the HEAD reflog, at least until those reflog entries expire. (The other way to find them is if you have their hash IDs memorized. But I'd bet you don't.)
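
    A short sketch of both lookups, assuming a scratch repository in which a forced rebase has just replaced two commits. The branch name rescued is made up; giving the old tip a name is one way to keep those commits reachable.

```shell
#!/bin/sh
# Sketch: after a rebase replaces commits, the old tip can still be found.
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # throwaway identity
git config user.email "user@example.com"

# Old committer date, so the forced copies below get different hash IDs.
GIT_COMMITTER_DATE='2001-01-01 00:00:00 +0000'
export GIT_COMMITTER_DATE
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit --quiet -m "$name"
done
unset GIT_COMMITTER_DATE

branch=$(git symbolic-ref --short HEAD)    # "main" or "master"
old_tip=$(git rev-parse HEAD)

git rebase -f HEAD~2 >/dev/null 2>&1       # replace B and C with copies
new_tip=$(git rev-parse HEAD)

from_orig_head=$(git rev-parse ORIG_HEAD)          # set by the rebase
from_reflog=$(git rev-parse "${branch}@{1}")       # previous branch value

echo "old tip:   $old_tip"
echo "ORIG_HEAD: $from_orig_head"
echo "reflog:    $from_reflog"

# Optional: name the old tip so it does not depend on reflog expiry.
git branch rescued "$old_tip"
```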

    Once the reflog entries expire and there's no longer a way to find the original commits, they will be collected by the Grim Reaper ... er, Grim Collector er uhm Garbage Collector, git gc. In a normal repository, these commits get a minimum grace period of 30 days before they can be collected.

    What you should do instead

    Instead of using git rebase to view historical commits, just check them out, one by one. To get a list of all commit hash IDs, use git rev-list, e.g.:

    git rev-list HEAD > /tmp/allrevs
    

    Then you can, e.g., use the sh/bash construct:

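    exec 3< /tmp/allrevs                 # open the hash list on descriptor 3
    while read -r hash <&3; do           # read from fd 3, leaving stdin alone
        git switch --detach --quiet "$hash"
        echo "viewing $hash"
        sh 3<&-                          # interactive subshell; exit to move on
    done
    exec 3<&-                            # close descriptor 3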
    

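    This won't have any re-merge issues, although getting out of it early is annoying (requires a very fast Ctrl+C after exiting the subshell). To fix that, test the subshell's exit status: changing the sh line to sh 3<&- || break ends the loop as soon as you leave the subshell with a nonzero status, such as exit 1. Note: reading the list through descriptor 3 keeps standard input free for the interactive shell and copes with very large hash lists; for smaller lists, the less-annoying: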

    for hash in $(cat /tmp/allrevs); do ... done
    

    construct works fine and does not need the weird redirection tricks.
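
    Putting these pieces together, here is one possible wrapper. The function name walk_history and its behavior are assumptions for illustration, not a Git feature. It walks oldest first (git rev-list lists newest first unless given --reverse), stops when the command run at a commit exits with a nonzero status, and returns to the starting point afterwards.

```shell
# walk_history [COMMAND [ARG...]]
# Hypothetical helper (not a Git command): check out every commit reachable
# from HEAD, oldest first, and run COMMAND (default: an interactive sh) at
# each one. A nonzero exit status from COMMAND stops the walk early.
walk_history() {
    [ "$#" -gt 0 ] || set -- sh
    # Remember where we started: a branch name, or a hash if HEAD is detached.
    start=$(git symbolic-ref --quiet --short HEAD || git rev-parse HEAD)
    list=$(mktemp)
    git rev-list --reverse HEAD > "$list"
    while read -r hash <&3; do
        git checkout --quiet --detach "$hash"
        echo "viewing $hash"
        "$@" 3<&- || break        # keep the list's descriptor out of COMMAND
    done 3< "$list"
    rm -f "$list"
    git checkout --quiet "$start"
}
```

    Called with no arguments from inside a repository, it opens a shell at each commit; typing exit moves on to the next commit and exit 1 ends the walk.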