
Git Merge Erroneously Identifies Conflicts in Blocks


My repository contains a single file, data.csv, which represents a CSV-formatted database. For the sake of example, suppose the contents of data.csv are:

1,2,3
2,3,4
4,5,6

Initially I have only the master branch. From it I create two branches, A and B, and modify data.csv independently in each. I've noticed that the 3-way merge sometimes reports conflicts that, in my eyes, shouldn't be conflicts at all. For example, suppose A modifies the file to be

1,4,5
2,3,4
4,5,6

and B modifies the file to be

1,2,3
2,6,7
4,5,6

When I run git merge A from branch B, Git does not auto-merge these versions; instead it reports the following conflict:

<<<<<<< HEAD
1,2,3
2,6,7
=======
1,4,5
2,3,4
>>>>>>> A
4,5,6

But it seems to me that these versions should be auto-mergeable with 3-way merge logic applied line by line, since A modifies only the first line and B modifies only the second.

My questions: Why does this happen? And is there a way to force Git to do a more fine-grained diff (e.g. line by line)? (Or, alternatively, is there any way to make Git realize that these changes are actually auto-mergeable?)


Solution

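  • Git's built-in merge treats changes to adjacent lines, with no unchanged line between them, as overlapping. That is why A's edit to the first line and B's edit to the second are reported as a single conflict hunk. As I mentioned in a comment, the way you could handle this today is to write your own merge driver. Writing a good merge driver is not trivial, but you can experiment with it and apply it only to specific files.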

    If you don't define a merge driver yourself, Git uses its own built-in one, which behaves essentially the same as the git merge-file command. (It may be exactly identical, since both are built from shared source files in Git. The choice between running a configured merge driver and using the built-in code is actually made in the "low-level" merge code in ll-merge.c.)
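    To see why the example conflicts, here is a simplified model of the rule (a sketch, not Git's actual xdiff code). It uses Python's difflib to find which base lines each side changed, then treats two changed ranges as clashing if they overlap or merely touch. The helper names (changed_ranges, collide, would_conflict) and the adjacent_conflicts switch are hypothetical, added only for illustration.

```python
import difflib


def changed_ranges(base, side):
    """Half-open ranges [start, end) of base lines that `side` changed."""
    sm = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in sm.get_opcodes() if tag != "equal"]


def collide(r1, r2, adjacent_conflicts=True):
    """Do two changed base ranges clash?

    adjacent_conflicts=True also counts ranges that merely touch (one ends
    where the other starts) as clashing; False uses strict overlap only.
    """
    (s1, e1), (s2, e2) = r1, r2
    if adjacent_conflicts:
        return s1 <= e2 and s2 <= e1
    return s1 < e2 and s2 < e1


def would_conflict(base, ours, theirs, adjacent_conflicts=True):
    """True if any change on our side clashes with any change on theirs."""
    return any(collide(a, b, adjacent_conflicts)
               for a in changed_ranges(base, ours)
               for b in changed_ranges(base, theirs))


base = ["1,2,3", "2,3,4", "4,5,6"]
ours = ["1,2,3", "2,6,7", "4,5,6"]    # branch B (HEAD) edits line 2
theirs = ["1,4,5", "2,3,4", "4,5,6"]  # branch A edits line 1

print(changed_ranges(base, ours), changed_ranges(base, theirs))      # [(1, 2)] [(0, 1)]
print(would_conflict(base, ours, theirs))                            # True
print(would_conflict(base, ours, theirs, adjacent_conflicts=False))  # False
```

    With touching ranges counted as a clash, B's change (base range 1-2) and A's change (base range 0-1) collide, which matches the conflict in the question. The strict-overlap variant accepts this example, but as written it would also miss two insertions at the same position, so it is not a safe rule on its own.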

    Note that your merge driver needs, at a minimum, three inputs (you can give it up to five inputs):

    • a path name in which the driver can find the merge base version of the file;
    • a path name in which the driver can find the current (--ours) version of the file, and to which the driver must write the final, merged version of the file; and
    • a path name in which the driver can find the other (--theirs) version of the file.

    The driver's job is to read the three input versions, however it chooses, and then to write the correct merge result, obtained however it likes, to the middle one of these three path names. The path names will be the names of temporary files: do not assume that any of these three file names makes any sense or has any relationship to the historical names of the files being merged.
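    As a sketch of the driver's core job, the function below merges three versions of a CSV file row by row. It assumes, hypothetically, that rows are only ever edited in place (no rows added, removed, or reordered), so row i of each version lines up with row i of the others; under that assumption the example from the question merges cleanly. The name merge_rows and the "ours"/"theirs" marker labels are illustrative choices, not anything Git prescribes.

```python
def merge_rows(base, ours, theirs, marker_size=7):
    """Row-by-row three-way merge of equal-length lists of lines (a sketch).

    Returns (merged_lines, clean). Assumes rows are only edited in place,
    so row i of each version corresponds to row i of the others.
    """
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("rows were added or removed; this sketch cannot align them")
    lt, eq, gt = "<" * marker_size, "=" * marker_size, ">" * marker_size
    merged, clean = [], True
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:    # identical on both sides, or only ours changed
            merged.append(o)
        elif o == b:            # only theirs changed this row
            merged.append(t)
        else:                   # both sides changed this row differently
            clean = False
            merged += [f"{lt} ours", o, eq, t, f"{gt} theirs"]
    return merged, clean


base = ["1,2,3", "2,3,4", "4,5,6"]
ours = ["1,2,3", "2,6,7", "4,5,6"]    # branch B (HEAD)
theirs = ["1,4,5", "2,3,4", "4,5,6"]  # branch A
print(merge_rows(base, ours, theirs))
# (['1,4,5', '2,6,7', '4,5,6'], True)
```

    Any row that both sides changed differently is wrapped in conflict markers of the requested size, and the result is flagged as not clean. Real CSV data with inserted or deleted rows would need a proper alignment step (for example, matching rows on a key column), which this sketch deliberately leaves out.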

    The extra data you can pass to your program are the user's desired conflict-marker size (default 7) and the path name to which the merge result will eventually be copied. For example, suppose we're merging a file whose name in the merge base is orig.wrongsuffix, whose name in the --ours commit is ours.csv, and whose name in the --theirs commit is renamed-wrongly.csv. The three input files will likely have names of the form .git-tmp-1234567 or similar. With the existing recursive or resolve strategies, the driver's output will eventually end up in a file named ours.csv. However, because there is a rename/rename conflict (we fixed the name, and they tried to fix the name too), the merge will still stop with a conflict even though our merge driver can produce a merged result.
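    As a rough illustration of how a driver's command line receives these inputs, the function below expands the %O, %A, %B, %L and %P placeholders (the ones the gitattributes documentation lists for merge drivers) in a command template. It is not Git's own substitution code: quoting paths with shlex.quote is an assumption, and expand_driver, csv_merge.py and the tmp-* file names are hypothetical.

```python
import shlex


def expand_driver(template, base, ours, theirs, marker_size=7, pathname=""):
    """Substitute %O %A %B %L %P in a merge-driver command template.

    Illustrative only: paths are shell-quoted with shlex.quote, which is an
    assumption about quoting, not a copy of Git's behaviour.
    """
    values = {
        "O": shlex.quote(base),        # merge-base version (temporary file)
        "A": shlex.quote(ours),        # our version; the result goes here
        "B": shlex.quote(theirs),      # their version
        "L": str(marker_size),         # conflict-marker size
        "P": shlex.quote(pathname),    # where the result will finally be stored
    }
    out, i = [], 0
    while i < len(template):
        key = template[i + 1] if template[i] == "%" and i + 1 < len(template) else None
        if key in values:              # known placeholder: replace "%X" as a unit
            out.append(values[key])
            i += 2
        else:                          # ordinary character (or unknown %-sequence)
            out.append(template[i])
            i += 1
    return "".join(out)


print(expand_driver("python3 csv_merge.py %O %A %B %L %P",
                    "tmp-base", "tmp-ours", "tmp-theirs", pathname="ours.csv"))
# python3 csv_merge.py tmp-base tmp-ours tmp-theirs 7 ours.csv
```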

    To indicate a successful merge (that is, the merge does not have to stop because of conflicts found by your merge driver), your driver should exit with a zero (success) status. In other words, from C, call exit(0); from Python, use sys.exit(0) or equivalent; from Go, use os.Exit(0); and so on. To indicate that, despite its best efforts, your driver was unable to produce the correct merge result, and therefore may or may not have left conflict markers in its output file, exit with a nonzero status. Prefer a small value such as 1: a few values around 125-127 have special meanings in things like git bisect and might be treated specially in other parts of Git as well, and for traditional Unix programming reasons, exit values should not exceed 127.
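    A minimal driver skeleton that follows this file-and-exit-status contract might look like the sketch below. Its merge policy is deliberately trivial and hypothetical (keep whichever side changed the file; report a conflict if both did), so the interesting part is the plumbing: read the three paths, write the result to the %A path, and exit with 0 or 1. The script name merge_driver.py, the driver name mydriver, and /path/to/ are placeholders.

```python
#!/usr/bin/env python3
"""Skeleton of a hypothetical merge driver (file and exit-status contract only).

Possible hook-up (driver name and path are placeholders):
  .git/config or ~/.gitconfig:
      [merge "mydriver"]
          name = example merge driver
          driver = python3 /path/to/merge_driver.py %O %A %B
  .gitattributes:
      data.csv merge=mydriver
"""
import sys


def resolve(base, ours, theirs):
    """Trivial whole-file policy: keep whichever side changed; fail if both did."""
    if ours == theirs or theirs == base:
        return ours, True        # nothing to take from theirs
    if ours == base:
        return theirs, True      # only theirs changed the file
    return ours, False           # both changed it: keep ours, report a conflict


def main(argv):
    if len(argv) < 4:
        print("usage: merge_driver.py BASE OURS THEIRS", file=sys.stderr)
        return 2
    base_path, ours_path, theirs_path = argv[1:4]
    versions = []
    for path in (base_path, ours_path, theirs_path):
        with open(path, "rb") as f:   # bytes: no encoding assumptions
            versions.append(f.read())
    merged, clean = resolve(*versions)
    # The result must end up in the %A file (our version's temporary path).
    with open(ours_path, "wb") as f:
        f.write(merged)
    return 0 if clean else 1          # 0 = merged, 1 = conflict remains


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git always passes the file paths; with no arguments, do nothing.
    sys.exit(main(sys.argv))
```

    Swapping in a smarter resolve, for example a row-by-row comparison for CSV files, is where the real work would go; the reading, writing, and exit-status plumbing stays the same.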

    To tell Git to use your merge driver, you need to do two things:

    • create a .git/config or $HOME/.gitconfig or other entry that defines the driver, telling Git how to run it;
    • create a .gitattributes entry (creating the file first if needed) telling Git to use your driver on this particular .csv file, for instance.

    The instructions for defining these are in the "Defining a custom merge driver" section of the gitattributes documentation.