
What are the Git conflict markers syntax/rules?


What are the rules for Git conflict markers? Do they stack? What are the rules for the opening (<<<<<<<) and closing (>>>>>>>) markers? I searched the Git docs, but found nothing explicit about this, and the only examples I can find are simple, single, non-stacked conflicts.

Context: I need to sync two independent branches that partially share the same directory structure. I am creating a patch from one and applying it to the other. Conflicts are possible, but I always want to take 'ours'.

For applying the patch I use: git apply --3way --ignore-whitespace --whitespace=fix <patchfile>

Then for each file that contains conflict markers I do this: perl -i -0777 -pe 's/<{7} ours\r?\n((?:(?!<{7})(?!={7})(?!>{7}).*\r?\n)*?)={7}\r?\n(?:(?!<{7})(?!={7})(?!>{7}).*\r?\n)*?>{7} theirs\r?\n/$1/g' <file>

Basically I assume that the markers are stacked (from what I noticed) and try to solve them from inside out (innermost to outermost). This is how the file is after applying the patch:


<<<<<<< ours
<<<<<<< ours
<<<<<<< ours
{chunk of text}
<<<<<<< ours
=======
{chunk of text}
>>>>>>> theirs
=======
{chunk of text}
>>>>>>> theirs
{chunk of text}
<<<<<<< ours
=======
=======
>>>>>>> theirs
{chunk of text}
<<<<<<< ours
>>>>>>> theirs
=======
>>>>>>> theirs
=======
>>>>>>> theirs

{rest of file}

The problem is that now I found a file that after 2 substitution iterations reaches this state:


<<<<<<< ours
<<<<<<< ours
{text chunk}
<<<<<<< ours
=======
=======
>>>>>>> theirs
{text chunk}
<<<<<<< ours
>>>>>>> theirs
=======
>>>>>>> theirs
=======
>>>>>>> theirs

<rest of file>

... and I don't know what to make of this. This doesn't seem stacked. How should I resolve these conflicts?

Note: I also ran the regex manually iteration by iteration in an online debugger and it matches ok.

Edit for clarifications: When I wrote this I failed to mention that I am using git format-patch to generate the patch and this actually generates a patch for each commit in order to keep metadata, hence the multiple conflict markers. @torek nailed it in his answer without having the proper information.


Solution

  • Conflict markers do not stack.

    When Git is doing a file-level merge, there are three input files.[1] Git calls the second file "ours" and the third one "theirs", and does not really have a proper name for the first one. The first one, however, is from the merge base. When you run git merge, Git works on a commit basis, and the three files for each case are from three commits that Git calls the merge base, the "ours" commit, and the "theirs" commit.


    [1] When you use git merge, these rules are specific to particular merge strategies, but the ones you'll use, that produce conflict markers in work-tree files, follow these rules. When you use git apply --3way and it has to do a true merge, it invokes this same code.


    But you're using git apply, which constructs each step one file at a time:

    • The "ours" file is the one in your work-tree, or in your index, depending on what flags you give to git apply. I will assume here that you are using the work-tree file (using the index copy changes enough other behavior that it is pretty clear you are not doing that).

    • The "theirs" file isn't directly visible: in fact, it doesn't exist yet.

    • The "base" file may not exist either. If it doesn't exist, and the patch doesn't apply well, we won't even get to this point. If it does exist, it's been supplied by the index line in the patch, which contains a raw hash ID.

    So, at this point, since the patch is being applied and Git is doing a three-way merge on the file, Git did find the merge base file. Because the patch text was derived from this base file, the patch applies easily to the base file, which produces the "theirs" file. Git's job is now to merge their changes (i.e., the patch) with our changes (the base file vs ours).

    Git now makes a second diff, from the base file to our file, to see what we changed. If we added a line between lines 42 and 43, and they didn't touch the lines around lines 42–43, Git can take our extra line. If they changed line 50, Git can change line 50 of the base file (which is line 51 of our file since we added a line). This kind of combining goes on wherever we and they touched different lines in the file.

    Where we and they touched the same line, though, or touched lines that abut (e.g., if we changed line 55 and they changed line 56), Git declares a merge conflict. It writes, to the work-tree file:

    <<<<<<< ours
    our version
    =======
    their version
    >>>>>>> theirs
    

    and then continues on with the unchanged lines. If we set merge.conflictStyle to diff3, Git includes the base version of the lines, between these two versions, marked off by seven | characters.
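The combining described above can be sketched in a few lines of Python. This is only an illustration, not Git's code: it uses difflib to compute the two diffs (Git uses its own diff engine, so hunk boundaries can differ), it works on lists of lines, and the names sync_regions and merge3 are made up for this example. Lines that are unchanged on both sides act as synchronization points; anything between two synchronization points is taken from whichever side changed it, or wrapped in conflict markers if both sides changed it differently.

```python
from difflib import SequenceMatcher


def sync_regions(base, ours, theirs):
    """Return (base_lo, base_hi, ours_lo, theirs_lo) for runs of base lines
    that are unchanged in BOTH ours and theirs, plus an end-of-file sentinel."""
    a_blocks = SequenceMatcher(None, base, ours, autojunk=False).get_matching_blocks()
    b_blocks = SequenceMatcher(None, base, theirs, autojunk=False).get_matching_blocks()
    regions = []
    ia = ib = 0
    while ia < len(a_blocks) and ib < len(b_blocks):
        a_base, a_ours, a_len = a_blocks[ia]
        b_base, b_theirs, b_len = b_blocks[ib]
        lo = max(a_base, b_base)
        hi = min(a_base + a_len, b_base + b_len)
        if lo < hi:  # base[lo:hi] is unchanged on both sides
            regions.append((lo, hi, a_ours + lo - a_base, b_theirs + lo - b_base))
        # Advance whichever matching block ends first in the base file.
        if a_base + a_len < b_base + b_len:
            ia += 1
        else:
            ib += 1
    regions.append((len(base), len(base), len(ours), len(theirs)))
    return regions


def merge3(base, ours, theirs):
    """Three-way merge of line lists; conflicts get Git-style markers."""
    out = []
    b = o = t = 0
    for b_lo, b_hi, o_lo, t_lo in sync_regions(base, ours, theirs):
        base_chunk, our_chunk, their_chunk = base[b:b_lo], ours[o:o_lo], theirs[t:t_lo]
        if our_chunk == their_chunk:      # same change (or no change) on both sides
            out += our_chunk
        elif our_chunk == base_chunk:     # only they changed this region
            out += their_chunk
        elif their_chunk == base_chunk:   # only we changed this region
            out += our_chunk
        else:                             # both changed it differently: conflict
            out += ["<<<<<<< ours", *our_chunk, "=======", *their_chunk, ">>>>>>> theirs"]
        out += base[b_lo:b_hi]            # the unchanged run itself
        n = b_hi - b_lo
        b, o, t = b_hi, o_lo + n, t_lo + n
    return out
```

Note that changes on abutting lines end up in the same region between two synchronization points, which is why they conflict even though they do not touch the same line.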

    This work-tree file is your new work-tree file

    You are now going ahead and running a second git apply that needs to do a three-way merge. Your second git apply takes the work-tree file as is, and assumes that this is what we think the file should look like. (It's not, of course! But Git assumes it is.)

    So, Git now finds the merge base version of the file, and runs a git diff to see what we changed. Apparently, what we changed was to insert these huge <<<<<<< ours, =======, and >>>>>>> theirs markers along with both our and their changes.

    Meanwhile, Git compares the base version of the file against their version of the file—that's the patch you're applying—and now Git tries to combine our changes with their changes. Sometimes, you'll get:

    <<<<<<< ours
    [the entire block from the first `git apply`]
    =======
    [theirs]
    >>>>>>> theirs
    

    which will nest properly, but sometimes our changes and their changes will "line up": the two diffs will synchronize and Git will, in effect, say: aha, our version of the file is the same as theirs here. So we'll get:

    <<<<<<< ours
    [part of the block from ours]
    =======
    [part of the diff from theirs]
    >>>>>>> theirs
    the synchronized part
    <<<<<<< ours
    =======
    ...
    

    where the second <<<<<<< ours is from when the diffs "lost sync" again and hit the ======= in our file—unless, that is, we get very (bad-)lucky and the ======= actually is part of the synchronization.
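A quick way to tell whether a file has ended up in this non-nesting state is to walk its marker lines with a stack. The following is a minimal sketch, assuming the default seven-character markers and no diff3 base sections; the function name markers_nest is made up for this example. It will also misread ordinary content lines that happen to start with seven =, < or > characters, so treat a False result as a hint rather than proof.

```python
def markers_nest(lines):
    """Return True if every <<<<<<< is followed by exactly one ======= and
    one >>>>>>> at its own nesting level, i.e. the markers nest cleanly."""
    stack = []  # one flag per open block: has its ======= been seen yet?
    for line in lines:
        if line.startswith("<<<<<<<"):
            stack.append(False)
        elif line.startswith("======="):
            if not stack or stack[-1]:
                return False  # separator outside a block, or a second separator
            stack[-1] = True
        elif line.startswith(">>>>>>>"):
            if not stack or not stack[-1]:
                return False  # closing marker without an open block and separator
            stack.pop()
    return not stack  # any block still open means the markers do not nest
```

Run against the file state quoted in the question, this returns False at the second consecutive ======= line, which is exactly the point where an inside-out regex has nothing well-formed left to match.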

    What to do about this

    After running git apply, do not go on to apply more patches. Resolve the problem first.
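If conflicts are resolved after every single git apply, each file only ever contains flat, single-level conflict blocks, and "always take ours" becomes a simple substitution. Here is a minimal sketch of that step in Python, assuming the default merge style (no ||||||| base section) and that no content line starts with seven = characters; the names CONFLICT and take_ours are made up for this example.

```python
import re

# One flat conflict block: opening marker, our side, separator, their side,
# closing marker. Each marker must start at the beginning of a line.
CONFLICT = re.compile(
    r"^<{7} [^\r\n]*\r?\n"   # <<<<<<< label
    r"(?P<ours>.*?)"         # our side (may be empty)
    r"^={7}\r?\n"            # =======
    r".*?"                   # their side, discarded
    r"^>{7} [^\r\n]*\r?\n",  # >>>>>>> label
    re.DOTALL | re.MULTILINE,
)


def take_ours(text):
    """Replace every flat conflict block in text with its 'ours' side."""
    return CONFLICT.sub(lambda m: m.group("ours"), text)
```

For example, take_ours("a\n<<<<<<< ours\nmine\n=======\nother\n>>>>>>> theirs\nb\n") gives "a\nmine\nb\n". Running it on a file where blocks from several patches have piled up is not safe, for the reasons described above.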

    Optionally, consider using git merge-file so that you can change the conflict markers.
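As a rough illustration of that option: git merge-file takes the three files directly and accepts --marker-size=<n> to change the marker length, plus --ours, --theirs or --union to resolve conflicts automatically instead of writing markers. The Python wrapper below is only a sketch; the function names and the idea of extracting the three file versions to disk yourself are assumptions, not something git apply does for you.

```python
import subprocess


def merge_file_command(current, base, other, marker_size=7, favor=None):
    """Build the argv for `git merge-file`, printing the result to stdout (-p).

    favor may be None (write conflict markers), "ours", "theirs" or "union".
    """
    cmd = ["git", "merge-file", "-p", f"--marker-size={marker_size}"]
    if favor is not None:
        if favor not in ("ours", "theirs", "union"):
            raise ValueError(f"unknown favor option: {favor!r}")
        cmd.append(f"--{favor}")
    return cmd + [current, base, other]


def run_merge_file(current, base, other, **options):
    """Run the merge and return (exit status, merged text).

    A positive exit status means conflicts were left in the output."""
    result = subprocess.run(
        merge_file_command(current, base, other, **options),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

With a longer marker size, markers written by a later merge cannot be confused with ones already in the file; with favor="ours", no markers are written at all, which matches the "always take ours" goal.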