Tags: git, github, git-merge, git-branch, git-merge-conflict

How does Git handle an automatically resolved merge?


I have studied what git merge does and how it behaves when it finds a conflict that it cannot resolve automatically.
If I resolve the conflicts manually, I can choose what to keep and what to change.

Besides that, there is the fast-forward merge, which happens when one branch is a direct ancestor of the other, and the non-fast-forward merge that Git nevertheless resolves automatically.
Here I find it difficult to understand how Git treats these two cases: I've seen that it automatically selects what to change, but how can I know whether it is doing what I want?

For example, on the test branch I worked on file.txt, while on the master branch I have another version of file.txt.
The two branches share a common ancestor.
I run git checkout master and then I want to merge in test.
To do so I run git merge test. Then what could happen?

  1. master has a completely different content
  2. master has text that wasn't present inside the test version of file.txt
  3. master has fewer pieces of text than the file.txt inside test

My question concerns the generic case: how can I tell, before running git merge test, how Git will treat these merges?
Does it perhaps depend on which branch I am on when I run git merge?


Solution

  • Let's see if I can cover everything in a short-ish post:

    • Git has multiple merge strategies. When you run git merge you may choose a strategy, e.g., git merge -s resolve other or git merge -s octopus br1 br2 br3. The standard strategies are ours, recursive, resolve, subtree, octopus, and now the new ort.

    • Virtually all the real work is done by the strategy. So before you can decide how a merge will operate, you must know which strategy you will be using.

    • The default strategy for most merges was long recursive; ort replaced it as the default in Git 2.34. The two are mostly intended to work the same, except that ort should be much faster and handle a few of the tricky cases better. If, however, you give multiple "heads" (commits, really) to git merge, the default is octopus.

    Except for the ours strategy (which does not need a merge base, and I think does not bother to compute one) and the octopus strategy (which uses an alternative merge-base computation), these merges must find the (singular) merge base commit. To find that commit, Git uses a Lowest Common Ancestor algorithm as extended to DAGs. You can run this manually:

    git merge-base --all HEAD otherbranch
    

    for instance. But as the presence of an "all" option implies, and the Wikipedia link makes clear, the output from this algorithm may be more than one commit.
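To see the merge-base computation on the question's scenario, here is a minimal sketch that builds a throwaway repository (the branch names master and test come from the question; the file contents and commit messages are made up for illustration) and asks Git for the merge base(s):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make the initial branch name predictable
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Common ancestor of both branches.
printf 'line 1\n' > file.txt
git add file.txt
git commit -q -m "common ancestor"
base=$(git rev-parse HEAD)

# Diverging work on test and on master.
git checkout -q -b test
printf 'line 1\ntest line\n' > file.txt
git commit -q -am "work on test"

git checkout -q master
printf 'master line\nline 1\n' > file.txt
git commit -q -am "work on master"

# With a simple fork like this one, exactly one merge base is reported.
bases=$(git merge-base --all HEAD test)
echo "$bases"
```

In this simple history the single hash printed is the "common ancestor" commit; more than one line of output would mean several merge bases.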

    If there is only one merge base, all is well. If not, each strategy must do something about this. (The octopus strategy does whatever it does since it's not using this algorithm in the first place; I've never delved all the way to the bottom of that question, as I am wary of bugs, er, Balrogs.) The resolve strategy uses a straightforward but terrible answer: it picks one at (apparent) random and uses that. The default recursive strategy, however, simply merges the merge bases (not using the octopus algorithm, but rather using a slightly Balrog-ridden recursive approach that ort attempts to improve; I wait to see the results...).

    Skipping some recursive merge details (but noting that this is what the "recursive" merge driver entry is about), we move on: the subtree strategy is really just the recursive algorithm in disguise, so it handles these the same as -s recursive. The ours strategy ignores all other inputs: its final commit is simply the HEAD commit's content, with extra parents, so merge base issues become irrelevant. Octopus, as already noted, doesn't use git merge-base --all in the first place. So if we need a recursive merge, the strategies that do it merge the merge bases and commit the result (including any merge conflicts, which is the main place a Balrog wreaks havoc on your quest). This merged result is then the new merge base for the merge operation.
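The claim about the ours strategy can be checked in a scratch repository. This is only a sketch with made-up file contents: it merges test into master with -s ours and compares the tree before and after.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b test
printf 'changed on test\n' > file.txt
git commit -q -am "test work"

git checkout -q master
printf 'changed on master\n' > file.txt
git commit -q -am "master work"
before_tree=$(git rev-parse 'HEAD^{tree}')

# -s ours: record test as a parent, but keep our content untouched.
git merge -q --no-edit -s ours test
after_tree=$(git rev-parse 'HEAD^{tree}')
second_parent=$(git rev-parse 'HEAD^2')
content=$(cat file.txt)
echo "$content"
```

The merge commit gains test as its second parent, yet its tree is the same tree master had before the merge.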

    So, this gets us to a single merge base, either by throwing out extras (-s resolve) or merging them (everything else, except -s ours and -s octopus, which do not even go here). We now have exactly three commits to consider for our merge: B, the merge base; L, the "local" or --ours commit; and R, the "remote" or "other" or --theirs commit. These commits can be assumed to have some sort of precedes/follows relationship,[1] but it no longer matters: the various two-head merge algorithms are now ready to consider three possible cases:[2]

    1. B = R. If the merge base commit is the "theirs" commit, there is nothing to do. Git says Already up to date. and does nothing.
    2. B = L. If the merge base is the "ours" (HEAD) commit, a fast-forward is possible. If it's allowed or required, Git will do it. Conflicts are impossible for this case; see below.
    3. B ≺ L, B ≺ R. A "true merge" is required.
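One way to tell beforehand which of the three cases applies is to ask Git about ancestry. The classify_merge function below is a hypothetical helper, not something Git ships; it assumes related histories and uses git merge-base --is-ancestor, which may not be the exact check git merge performs internally.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Hypothetical helper: predict what `git merge <other>` would do from HEAD.
classify_merge() {
    other=$1
    if git merge-base --is-ancestor "$other" HEAD; then
        echo "already up to date"       # case 1: B = R
    elif git merge-base --is-ancestor HEAD "$other"; then
        echo "fast-forward possible"    # case 2: B = L
    else
        echo "true merge required"      # case 3: B precedes both L and R
    fi
}

printf 'one\n' > file.txt
git add file.txt
git commit -q -m "A"

git checkout -q -b test
printf 'one\ntwo\n' > file.txt
git commit -q -am "B on test"

up_to_date=$(classify_merge master)   # on test: master is already contained
git checkout -q master
fast_forward=$(classify_merge test)   # on master: test is strictly ahead

printf 'zero\none\n' > file.txt
git commit -q -am "C on master"
true_merge=$(classify_merge test)     # both sides now have their own commits

printf '%s\n' "$up_to_date" "$fast_forward" "$true_merge"
```

Note that the answer depends on which branch is checked out: the same pair of branches is "already up to date" from test but a fast-forward from master.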

    To perform a true merge, Git does an internalized variant of the following:

    • run git diff --find-renames B L: this is "what we changed";
    • run git diff --find-renames B R: this is "what they changed";
    • combine these changes.
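The three steps above can be run by hand before merging. The sketch below uses an invented seven-line file so that the two sides touch lines far apart; the real merge machinery works internally and does not literally shell out to git diff.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf '%s\n' 1 2 3 4 5 6 7 > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b test
printf '%s\n' 1 2 3 4 5 6 'seven (test)' > file.txt
git commit -q -am "test: change last line"

git checkout -q master
printf '%s\n' 'one (master)' 2 3 4 5 6 7 > file.txt
git commit -q -am "master: change first line"

B=$(git merge-base HEAD test)
ours=$(git diff --find-renames "$B" HEAD)     # what we changed
theirs=$(git diff --find-renames "$B" test)   # what they changed
printf '%s\n' "$ours" "$theirs"

# The changes are far apart, so Git combines them without help.
git merge -q --no-edit test
cat file.txt
```

After the merge, file.txt carries master's change to the first line and test's change to the last line.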

    The combine changes step is where merge conflicts can occur. They do occur if:

    • lines affected in diff #1 overlap lines affected in diff #2, but the changes to those lines are not identical, or
    • lines affected in the two diffs abut (as jthill noted).

    Overlap is allowed if and only if the two diffs make the same change to those lines.
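To find out in advance whether a particular merge would conflict, one option is a trial merge that is aborted afterwards. In this sketch (file names and contents are illustrative) both branches make the same change to same.txt but different changes to the first line of file.txt:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf '%s\n' 1 2 3 4 5 6 7 > file.txt
printf 'old\n' > same.txt
git add file.txt same.txt
git commit -q -m "base"

git checkout -q -b test
printf '%s\n' 'one (test)' 2 3 4 5 6 7 > file.txt
printf 'new\n' > same.txt
git commit -q -am "test"

git checkout -q master
printf '%s\n' 'one (master)' 2 3 4 5 6 7 > file.txt
printf 'new\n' > same.txt          # identical change on both sides
git commit -q -am "master"

# Trial merge: never commit, even if everything combines cleanly.
if git merge --no-commit --no-ff test >/dev/null 2>&1; then
    outcome=clean
else
    outcome=conflict
fi
# Unmerged index entries name the files that really conflicted.
conflicted=$(git ls-files --unmerged | cut -f 2 | sort -u)
git merge --abort                   # put everything back as it was
echo "$outcome: $conflicted"
```

Only file.txt is reported: the identical edit to same.txt overlaps but does not conflict.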

    If we force a "real merge" where a fast-forward is allowed (see #2), this means B = L, so the diff from B to L is empty. An empty diff never conflicts with another empty diff, nor with any non-empty diff: the result of combining is to take all of their changes.
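A quick way to convince yourself of this is to force a merge commit with --no-ff where a fast-forward would have been possible, then compare trees. A sketch with made-up contents:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base"

# Only test moves on, so master is the merge base (B = L).
git checkout -q -b test
printf 'base\nmore\n' > file.txt
git commit -q -am "test work"

git checkout -q master
git merge -q --no-ff --no-edit test

merged_tree=$(git rev-parse 'HEAD^{tree}')
their_tree=$(git rev-parse 'test^{tree}')
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }')
echo "parents: $parent_count"
```

The result is a two-parent merge commit whose tree is exactly the tree of test, that is, all of their changes and nothing else.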

    If we do have conflicts, the -X ours or -X theirs flags, if specified, now come into play: these resolve the conflict by favoring ours or theirs. For these cases, the merge is not stopped.
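The following sketch shows -X ours on an invented file where line 1 conflicts and line 7 is changed only on test. It illustrates that -X ours affects just the conflicting region, unlike the -s ours strategy:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf '%s\n' 1 2 3 4 5 6 7 > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b test
printf '%s\n' 'one (test)' 2 3 4 5 6 'seven (test)' > file.txt
git commit -q -am "test: change first and last line"

git checkout -q master
printf '%s\n' 'one (master)' 2 3 4 5 6 7 > file.txt
git commit -q -am "master: change first line"

# Line 1 conflicts; -X ours settles it in our favor and the merge completes.
git merge -q --no-edit -X ours test
first=$(sed -n 1p file.txt)
last=$(sed -n 7p file.txt)
printf '%s\n' "$first" "$last"
```

The conflicting first line keeps master's text, while the non-conflicting change to the last line still comes in from test.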

    If we have rerere enabled and there are now conflicts that have recorded resolutions, Git will take the recorded resolution. For these cases, however, the merge is stopped: you must inspect the result yourself. Presumably this therefore happens after the -X cases, but I have not tested that.
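The rerere behavior can be observed by doing the same conflicting merge twice. This is a sketch that assumes rerere.enabled is set and rerere.autoupdate is left at its default; the contents are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false
git config rerere.enabled true

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b test
printf 'test\n' > file.txt
git commit -q -am "test work"

git checkout -q master
printf 'master\n' > file.txt
git commit -q -am "master work"

# First attempt: conflict, resolve by hand, commit. rerere records the resolution.
git merge --no-edit test >/dev/null 2>&1 || true
printf 'resolved\n' > file.txt
git add file.txt
git commit -q -m "Merge branch 'test'"

# Throw the merge away and repeat it.
git reset -q --hard 'HEAD^'
if git merge --no-edit test >/dev/null 2>&1; then
    stopped=no
else
    stopped=yes
fi
reused=$(cat file.txt)
echo "stopped=$stopped content=$reused"
```

On the second attempt the working-tree file already holds the recorded resolution, but the merge still stops so that you can inspect it before running git add and git commit.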

    If there are unresolved conflicts, the merge stops here, unfinished. It is your job to clean up any messes left behind (in your working tree and/or Git's index). Otherwise, unless --squash and/or --no-commit were specified, Git goes on to make the new merge commit.

    If the merge stops, the other head's (or heads') hash ID(s) are written to the MERGE_HEAD pseudo-ref, unless --squash was specified. This ensures that the next git commit will conclude the merge correctly.
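To watch MERGE_HEAD do its job, the sketch below stops a merge on a conflict, inspects the pseudo-ref, and then concludes the merge with an ordinary git commit. The file contents and the resolution are invented for illustration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b test
printf 'test\n' > file.txt
git commit -q -am "test work"

git checkout -q master
printf 'master\n' > file.txt
git commit -q -am "master work"

# The merge stops on the conflict and leaves MERGE_HEAD behind.
git merge --no-edit test >/dev/null 2>&1 || true
merge_head=$(git rev-parse MERGE_HEAD)

# Resolve, stage, and commit; MERGE_HEAD supplies the second parent.
printf 'resolved\n' > file.txt
git add file.txt
git commit -q -m "Merge branch 'test'"

second_parent=$(git rev-parse 'HEAD^2')
leftover=$(git rev-parse -q --verify MERGE_HEAD || echo none)
echo "MERGE_HEAD after commit: $leftover"
```

While the merge is stopped, MERGE_HEAD names the tip of test; once the commit is made, that hash has become the second parent and the pseudo-ref is gone.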


    [1] If they didn't, we had to supply --allow-unrelated-histories, in which case the merge base is a dummy empty commit that precedes both branch-tip commits. The same code is used for cherry-pick and revert, where certain precedes/follows relationships may not hold intentionally, so it doesn't check; this description is strictly for git merge purposes.

    [2] It would be possible to check R ≼ L up front, but I don't think Git actually does. The effect should be the same either way.