sparql rdf owl

SPARQL Update minimal diff for RDF/XML?


My RDF/OWL ontology is versioned as an RDF/XML file in a git repository. I normally edit the file in a text editor, but I am planning a refactoring that would take too long to do manually and that is not possible with regular expressions alone.

Specifically, I want to split a generic property into two more specific ones, based on the class of the object.

For example,

:Alice :responsibleFor :ACME.
:Bob :responsibleFor :Cooking.

should become

:Alice :responsibleForCompany :ACME.
:Bob :responsibleForTask :Cooking.

I am interested in an answer for the general case as well, not just for this specific property refactoring.

My idea is to load the files into a Virtuoso triple store, use SPARQL Update queries to refactor the property, and then export the result back as an RDF/XML file. The problem is that this won't preserve the order and formatting of the file, which will confuse git and make it impossible to use the old history, for example to undo an old commit.
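To make the intended update concrete, here is a minimal sketch of what it could look like. The namespace `http://example.org/` and the class names `:Company` and `:Task` are assumptions for illustration only; they are not taken from the actual ontology. The Python function only emulates the update on an in-memory list of triples so the logic can be checked without a triple store; it is not a replacement for running the update in Virtuoso.

```python
# Hypothetical namespace and class names; the real ontology may differ.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The SPARQL Update request one might send to the store (assumed classes).
SPARQL_UPDATE = """
PREFIX : <http://example.org/>
DELETE { ?s :responsibleFor ?o }
INSERT { ?s :responsibleForCompany ?o }
WHERE  { ?s :responsibleFor ?o . ?o a :Company } ;
DELETE { ?s :responsibleFor ?o }
INSERT { ?s :responsibleForTask ?o }
WHERE  { ?s :responsibleFor ?o . ?o a :Task }
"""


def split_property(triples, old_prop, prop_by_class):
    """Emulate the update on (subject, predicate, object) tuples.

    Each triple using old_prop is rewritten to the property mapped to the
    first matching class of its object. Triples whose object has none of
    the listed classes are left unchanged.
    """
    # Collect the rdf:type statements so objects can be classified.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    result = []
    for s, p, o in triples:
        if p == old_prop:
            object_classes = types.get(o, set())
            new_prop = next(
                (new for cls, new in prop_by_class.items() if cls in object_classes),
                None,
            )
            if new_prop is not None:
                result.append((s, new_prop, o))
                continue
        result.append((s, p, o))
    return result
```

Because the function keeps the input order and only swaps the predicate, it also hints at why an in-place rewrite gives a smaller diff than a full export.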

Is there a way to work directly with the file structure in order to produce a diff as minimal as possible?


Solution

  • Michael's answer is an excellent solution, but if you do wish to keep a usable git history, I'd recommend that you switch to a different syntax format. RDF/XML, being XML (i.e. nested elements spread over multiple lines), is notoriously troublesome for line-by-line diffs, especially since the tool writing the XML can decide to completely rearrange blocks (there's no prescribed order for RDF/XML elements at the syntax level, and it's very hard to enforce one).

    Switch to a line-based syntax format, like N-Triples or N-Quads, and enforce a canonical ordering when exporting back from Virtuoso (this should be possible by using a SPARQL query with an ORDER BY clause as the export mechanism).
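If the export itself cannot be ordered, a similar effect can be approximated after the fact. The following is a minimal sketch, assuming the export is already N-Triples with one triple per line and that the store writes each term the same way on every export; the function name `canonical_ntriples` is made up for this example. Blank node labels are not handled: if they change between exports, the affected lines will still show up in the diff.

```python
def canonical_ntriples(text):
    """Return N-Triples text with a stable, diff-friendly line order.

    Blank lines and comment lines are dropped, duplicate triples are
    removed, and the remaining lines are sorted lexicographically.
    """
    lines = set()
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            lines.add(line)
    return "".join(line + "\n" for line in sorted(lines))
```

Running every export through the same step before committing means that a changed triple shows up in git as one removed line and one added line, regardless of the order in which the store returned the data.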