How to compare simple and typed literals in dotnetrdf?


I'm comparing two graphs, one from a Turtle file with simple literal objects, the other from a file with explicit datatype IRIs. The graphs are otherwise equal.

Graph A:

<s> <p> "o"

Graph B:

<s> <p> "o"^^xsd:string

According to RDF 1.1 (3.3 Literals), "[s]imple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string". This is reflected in the concrete syntax specifications as well (N-Triples, Turtle, RDF/XML).

So I'd expect both my graphs to consist of a single triple with a URI node s as subject, a URI node p as predicate, and a literal node "o" of type xsd:string as object. Based on this I'd expect there to be no difference between the two.

However this is not the case in practice:

var graphStringA = "<http://example.com/subject> <http://example.com/predicate> \"object\".";
var graphStringB = "<http://example.com/subject> <http://example.com/predicate> \"object\"^^<http://www.w3.org/2001/XMLSchema#string>.";

var graphA = new Graph();
var graphB = new Graph();

StringParser.Parse(graphA, graphStringA);
StringParser.Parse(graphB, graphStringB);

var diff = graphA.Difference(graphB);

There's one added and one removed triple in the difference report. The graphs are different because the datatypes of the object nodes are different: ((ILiteralNode)graphA.Triples.First().Object).DataType is null, while ((ILiteralNode)graphB.Triples.First().Object).DataType is the xsd:string URI.


It appears to me that to modify this behaviour I'd have to either

  • go all the way down to LiteralNode (and change its assumptions about literal nodes), or
  • create a new GraphDiff (that takes the default datatype of string literals into account).

A workaround is to remove the "default" datatypes:

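private static void RemoveDefaultDatatype(IGraph g)
{
    const string XsdString = "http://www.w3.org/2001/XMLSchema#string";
    // langString lives in the rdf: namespace, not in xsd:
    const string RdfLangString = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    // Materialise the query before touching the graph: LINQ queries are lazy,
    // and this one reads g.Triples, which is modified below.
    var triplesWithDefaultDatatype =
        (from triple in g.Triples
         let literal = triple.Object as ILiteralNode
         where literal != null && literal.DataType != null
         where literal.DataType.AbsoluteUri == XsdString || literal.DataType.AbsoluteUri == RdfLangString
         select triple).ToList();

    // Rebuild each matching triple with an untyped literal (value and language only).
    var triplesWithNoDatatype =
        from triple in triplesWithDefaultDatatype
        let literal = (ILiteralNode)triple.Object
        select new Triple(
            triple.Subject,
            triple.Predicate,
            g.CreateLiteralNode(
                literal.Value,
                literal.Language));

    g.Assert(triplesWithNoDatatype.ToArray());
    g.Retract(triplesWithDefaultDatatype);
}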

Is there a way in dotnetrdf to compare simple literals to typed literals in a way that's consistent with RDF 1.1, without resorting to a major rewrite or a workaround like the one above?


Solution

  • dotNetRDF is not RDF 1.1 compliant nor do we claim to be. There is a branch which is rewritten to be compliant but it is not remotely production ready.

    Assuming that you control the parsing process, you can customise the handling of incoming data using the RDF Handlers API. You can then strip the explicit xsd:string type off literals as they come into the system by overriding the HandleTriple(Triple t) method as desired.
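
    The handler approach might look roughly like the sketch below. It is not from the answer itself and makes several assumptions: it targets the dotNetRDF 1.x/2.x API, where GraphHandler (in VDS.RDF.Parsing.Handlers) derives from BaseRdfHandler and the overridable hook is the protected HandleTripleInternal(Triple t) rather than HandleTriple itself. The names StripXsdStringHandler, Example and LoadWithoutXsdString are hypothetical, and the input is assumed to be N-Triples, as in the question.

```csharp
using System.IO;
using VDS.RDF;
using VDS.RDF.Parsing;
using VDS.RDF.Parsing.Handlers;

// Hypothetical handler (not part of dotNetRDF): loads triples into a graph,
// replacing "..."^^xsd:string objects with plain, untyped literals.
public class StripXsdStringHandler : GraphHandler
{
    private const string XsdString = "http://www.w3.org/2001/XMLSchema#string";

    public StripXsdStringHandler(IGraph g) : base(g) { }

    // Assumption: HandleTripleInternal is the per-triple extension point
    // that the public HandleTriple method delegates to.
    protected override bool HandleTripleInternal(Triple t)
    {
        var literal = t.Object as ILiteralNode;
        if (literal != null
            && literal.DataType != null
            && literal.DataType.AbsoluteUri == XsdString)
        {
            // Recreate the object through the handler's node factory, without a datatype.
            t = new Triple(t.Subject, t.Predicate, CreateLiteralNode(literal.Value));
        }
        return base.HandleTripleInternal(t);
    }
}

public static class Example
{
    // Parses N-Triples data through the handler instead of straight into the graph.
    public static IGraph LoadWithoutXsdString(string data)
    {
        var g = new Graph();
        new NTriplesParser().Load(new StripXsdStringHandler(g), new StringReader(data));
        return g;
    }
}
```

    With both graphStringA and graphStringB from the question loaded through LoadWithoutXsdString, the object literals should end up untyped in both graphs, so Difference would be expected to report them as equal. Language-tagged literals and other datatypes pass through unchanged.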