
Understanding git conflicts


I think I might be misunderstanding something about how git handles things, and, therefore, I'm facing some rather annoying conflicts.

I start a new branch A from master and start creating new files. Eventually, from branch A, I create branch B and start working on that as well. As branch A continues to be developed, B needs the changes made in branch A, so I merge A into B and continue working on both of them.

Time goes on and branch A continues to be developed until it is merged into master. At this point, I think that, now that it has been merged into master, I can simply merge master into B and get all the changes from A, and from everyone else as well.

Problem is, now that I'm trying to do that, I'm getting multiple conflicts of "Both added", "Both modified", and such - but on files I haven't changed. I totally understand the conflicts in the files I changed - I caused them, I know it full well.

Thinking my explanation might get confusing, I ventured into Google Slides and created this "amazing" drawing to illustrate my scenario (in which the arrows represent merges, as in "merged master into A", unless they point to the first commit in the branch, in which case they mean "branched off from here"; that is, pretty standard git notation, apart from the arrow direction, which I'm not sure about):


Regarding files other people changed, I don't understand why they are listed as conflicts. That is, there is a file which had already been changed by the time I created branch A, and that same file had been changed again by the time I tried to merge master into B. Sure, branch B still has a super old version of that file, but, still, shouldn't git recognize that it has not been changed in my commits, and just overwrite it, or whatever it does? What am I missing here?

EDIT: just to clarify, I'm the sole contributor of both branches A and B.


Solution

  • Your graph drawing is very pretty, but it's rather misleading.

    Let's start with a question: which branch are the blue commits on?

    This is a trick question. They're not on a branch, they're on several branches, plural. If you say they're on branch A, well, that's true, but they're also on master and most of them are also on branch B.

    In your drawing, there are no identifiers for each commit. This makes them hard to talk about: I could say "the second-from-left grey commit" or "the rightmost blue commit", but that's kind of unwieldy, so let me redraw the graph as text, using single uppercase letters in each commit. I also won't use arrows here as they're too hard to do in text.

    A--B--C--D--E--F--G--H--I--J--K--L   <-- master
        \        \            /
         M--N--O--P--Q---R---S   <-- branch-A
             \        \
              T--U--V--W--X--Y--Å--Ø--Z   <-- branch-B
    

    That is, the tip commit of master is commit L. The tip commit of branch-A is commit S. The tip commit of branch-B is commit Z.

    In Git, a branch name, like master, always points to one single commit. You get your choice of which commit; the commit you select this way is the tip commit of the branch. Any earlier commit that is reachable from this tip is also on the branch. So by starting at L and working backwards (leftwards, in these drawings), following the arrows from commit to commit [1], we go from L to K, then from K to J. But J is a merge commit: it has two parents, not just one. So from J we go to both S and I. From those two, we go to both H and R, and on to G and Q, and F and P, and E. There are two ways to reach E but we only visit it once anyway. From here we can go on to O and D, N and C, M and B, and A. So that list of commits is the set of commits that are on master.

    (Note that we cannot go from Q to W, though we can go the other way, from W to Q: all the arrows are one way, pointing backwards.)

    The set of commits on branch-A starts with S and goes backwards to R, then Q, then P, then E and O and D and N and so on. All of these commits are also on master.

    The set of commits on branch-B starts at Z, moves back through Ø and Å and Y and X and W, then picks up both Q and V, and then P and U, and then E and O and T, and so on. There are thus nine commits on branch-B (T through Z) that are not on any other branch.
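
    If it helps, the "which commits are on which branch" walk can be sketched in a few lines of Python. This is only a toy model, not how Git stores anything: the PARENTS table is my transcription of the text graph above, and the names PARENTS, TIPS, and reachable are made up for illustration.

```python
# Toy model: each commit maps to its parent commits (arrows point backwards).
# Transcribed from the text graph above; J, P and W are the merge commits.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I", "S"], "K": ["J"], "L": ["K"],
    "M": ["B"], "N": ["M"], "O": ["N"], "P": ["O", "E"], "Q": ["P"],
    "R": ["Q"], "S": ["R"],
    "T": ["N"], "U": ["T"], "V": ["U"], "W": ["V", "Q"], "X": ["W"],
    "Y": ["X"], "Å": ["Y"], "Ø": ["Å"], "Z": ["Ø"],
}

# A branch name is just a pointer to one tip commit.
TIPS = {"master": "L", "branch-A": "S", "branch-B": "Z"}


def reachable(tip):
    """Return every commit reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue  # reached by two paths (like E); visit only once
        seen.add(commit)
        stack.extend(PARENTS[commit])
    return seen


on_master = reachable(TIPS["master"])
on_a = reachable(TIPS["branch-A"])
on_b = reachable(TIPS["branch-B"])

# Commits that are on branch-B and on no other branch.
only_on_b = on_b - on_master - on_a

print("every branch-A commit is also on master:", on_a <= on_master)
print("only on branch-B:", sorted(only_on_b))
```

    Note that nothing in the model lets you walk from Q to W: the table only records parents, so every walk goes backwards.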


    [1] Technically, all of Git's internal arrows point backwards. So you've drawn your arrows backwards by drawing them forwards. 😀 This situation arises because commits are fully read-only: a parent can't know in advance what hash IDs its eventual children might have, but a child commit does know, in advance / at creation time, what parent hash IDs that child has. So the arrows must point backwards.

    (In order to move forwards, you tell Git where you want to end up, and then it moves backwards from there, to make sure that it can end up there from some other starting point.)


    How git merge works, greatly abbreviated

    When you run:

    git checkout master
    git merge branch-B
    

    Git must find the merge base commit of commits L and Z. To do that, it works backwards, as most Git operations do, from these branch tip commits, to find the best shared commit: a commit that is reachable from both branch tips, and hence on both branches, but is "closest to the tips" as it were. In this case, though it's perhaps not immediately obvious, that's commit Q. Start at L, go back through K to J and down to S and then R and then Q. Meanwhile, start at Z, go back to W and up to Q. Commits P, E, O, and so on are all also on both branches, but commit Q is "better" because it is the last such commit: it is a descendant of all of those commits.
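
    The search for the best shared commit can be sketched with the same kind of toy model. Real Git uses a more efficient algorithm, and a graph can have more than one merge base; in an actual repository, git merge-base master branch-B prints the commit Git picks. The PARENTS table is again my transcription of the text graph, and the helper names ancestors and merge_bases are hypothetical.

```python
# Toy model: each commit maps to its parent commits (arrows point backwards).
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I", "S"], "K": ["J"], "L": ["K"],
    "M": ["B"], "N": ["M"], "O": ["N"], "P": ["O", "E"], "Q": ["P"],
    "R": ["Q"], "S": ["R"],
    "T": ["N"], "U": ["T"], "V": ["U"], "W": ["V", "Q"], "X": ["W"],
    "Y": ["X"], "Å": ["Y"], "Ø": ["Å"], "Z": ["Ø"],
}


def ancestors(tip):
    """Every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def merge_bases(left, right):
    """Shared commits that are not an ancestor of any other shared commit."""
    common = ancestors(left) & ancestors(right)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other) for other in common)
    }


# Tip of master is L, tip of branch-B is Z.
bases = merge_bases("L", "Z")
print(bases)
```

    Commits like P and E are shared too, but they drop out because they are ancestors of another shared commit.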

    Git will now run two git diff commands internally. This will compare the merge base—commit Q—vs the two branch tips:

    git diff --find-renames <hash-of-Q> <hash-of-L>   # what we changed
    git diff --find-renames <hash-of-Q> <hash-of-Z>   # what they changed
    

    In these two diff listings, if some file appears to be newly-created in both branch tips, you will get an add/add conflict for that file. The file wasn't in Q, but it is in L and Z.

    For files that are in all three commits, Git will attempt to combine any changes shown in the two sets of diffs. Where these changes do not overlap (and don't "touch at the edges" either), Git can combine them. Where they do overlap but those overlaps are exactly the same—cover the same original lines in Q, and make the same changes—Git can combine that by just taking one copy of the change. All other overlaps result in merge conflicts.
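
    As a rough sketch of the rules above, here is a whole-file version of the three-way decision. It is deliberately cruder than Git, which combines non-overlapping changes within a file hunk by hunk; here any file changed differently on both sides counts as a conflict. The function name merge_file and the labels it returns are made up, and None stands for "file absent in that commit".

```python
def merge_file(base, ours, theirs):
    """Decide one file's fate from its content in the merge base and both tips.

    Whole-file simplification: real Git merges hunk by hunk, so many cases
    reported here as "both modified" would merge cleanly in practice.
    """
    if ours == theirs:
        return ("clean", ours)        # same result on both sides: take one copy
    if base == ours:
        return ("clean", theirs)      # only they changed (or added/deleted) it
    if base == theirs:
        return ("clean", ours)        # only we changed (or added/deleted) it
    if base is None:
        return ("conflict: both added", None)       # new on both sides, contents differ
    if ours is None or theirs is None:
        return ("conflict: modify/delete", None)    # one side deleted, other changed
    return ("conflict: both modified", None)        # both changed, differently


# File contents in the merge base (Q) and the two tips (L and Z); made-up data.
in_q = {"old.txt": "v1\n", "mine.txt": "a\n"}
in_l = {"old.txt": "v2\n", "mine.txt": "a\n", "new.txt": "from master\n"}
in_z = {"old.txt": "v1\n", "mine.txt": "b\n", "new.txt": "from branch-B\n"}

results = {
    path: merge_file(in_q.get(path), in_l.get(path), in_z.get(path))
    for path in sorted(set(in_q) | set(in_l) | set(in_z))
}
for path, (status, _content) in results.items():
    print(path, "->", status)
```

    The important detail is that every comparison is against the merge base, not against what you personally edited: a file counts as changed on a side if it differs between Q and that tip, whoever made the change.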

    Your job at this point is to resolve any and all conflicts, any way you like. For instance, in the add/add whole-file conflicts, if the two files match, just pick either one's content. If not, combine the two files somehow. When you're done, write the final contents for each conflicted file into the index using git add. This marks the index conflict as "resolved", and you can now run git merge --continue or git commit to complete the merge. [2]


    [2] git merge --continue checks to make sure you're finishing a merge. If not, it errors out. If you are, it just runs git commit, so there's no real difference either way, unless you aren't actually finishing a conflicted merge.