Search code examples
xmlgit

Can git be made to mostly auto-merge XML order-insensitive files?


When merging Lightswitch branches, often colleagues will add properties to an entity I also modified, which will result in a merge conflict, as the new XML entries are added to the same positions in the lsml file.

I can effectively always solve these by accepting left and right in no particular order, so one goes above the other, as order isn't important in these particular instances. On the rare instances this isn't valid, this would produce an error in the project anyway, which I accept as a risk (but haven't encountered).

Is there a way (preferably for the file extension) to get git to automatically accept source and target changes for the same position and simply place one beneath the other?


Solution

  • This gets pretty hard in general.

    Some have attempted to use Git's union merge (which is more accessible now than it was in early days; as in that question, you just add merge=union in a .gitattributes file), but this does not work in general. It might work sometimes. Boiling it down a lot, it works if your XML is always structured so that naive line-oriented union merge produces valid XML (basically, keeping whole XML sub-elements all on one line), and you are always adding whole new XML sub-elements.

    It is possible, in Git, to write a custom merge driver. Writing a useful one for XML is hard.

    First we need an XML diff engine, such as Sylvain Thénault's xmldiff, to construct two string-to-string (or tree-to-tree) edits for three XML files (the merge base, local or --ours, and other or --theirs files: diff base-vs-local and base-vs-ours). This particular one looks like it works similarly to Python's difflib. (However, due to the referenced papers, it looks like it produces tree move / nesting-level operations as well as simple insert and delete. This is a natural and reasonable thing for a tree-to-tree edit algorithm to do, and probably actually desirable here.)

    Then, given two such diffs, we need code to combine them. The union method is to ignore all deletions: simply add all additions to the base version (or, equivalently, add the "other" additions to the "local", or the "local" additions to the "other"). We could also combine tree insert/delete operations a la "real" (non-union-style) merges, and perhaps even declare conflicts. (And it might be nice to allow different handling of tree nesting-level-changes, driven by something vaguely like a DTD.)

    These last parts are not, as far as I know anyway, done anywhere. Besides that, the Python xmldiff I linked here is a fairly big chunk of code (I have not read it anywhere near closely, nor attempted to install it, I just downloaded it and skimmed—it implements both a Myers-like algorithm, and the fancier "fast match / edit script" algorithm from the Stanford paper).