I'm playing with Blazegraph (2.1.5) and Jena Fuseki (3.10.0). First I insert two triples with the following query:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
insert data {
<http://s> <http://untyped> 'abc' .
<http://s> <http://typed> 'abc'^^xsd:string .
}
The triples have objects with the same string value, but one of them is untyped and the other is typed as xsd:string.
Then I execute the following query:
select * where { ?s ?p 'abc' }
Jena Fuseki finds both triples, while Blazegraph only finds the 'untyped' one.
The same happens if I specifically ask for a typed version:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select * where { ?s ?p 'abc'^^xsd:string }
Jena Fuseki again finds both triples, while Blazegraph only finds the 'typed' one.
The behavior is clearly different.
Here are my questions:
UNION or FILTER?

This is an interesting question because the answer is not obvious at all. Current triplestores implement the query and update language SPARQL 1.1, standardised in 2013. It is a query language for RDF, but for the version of RDF in place at the time, that is, RDF 1.0, standardised in 2004.
In RDF 2004, literals were either plain literals or typed literals. A plain literal was a Unicode string with an optional language tag. A typed literal was a Unicode string paired with a datatype URI.
SPARQL calls plain literals without a language tag "simple literals". A simple literal, being a single Unicode string, is never the same as a typed literal, which is always a pair. So "some text" and "some text"^^xsd:string are different literals in RDF 2004 and in SPARQL 1.1.
Now, in 2014, a new version of RDF, RDF 1.1, appeared in which all literals have a datatype IRI, including literals with language tags. Language-tagged strings do not have to mention their datatype IRI in concrete syntaxes (the presence of a language tag is sufficient to identify the datatype IRI as rdf:langString). Literals typed with xsd:string may be written without the datatype IRI in concrete syntax. Consequently, "some text" in Turtle or N-Triples syntax truly means "some text"^^xsd:string, according to RDF 1.1.
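The difference between the two data models can be made concrete with a small Python sketch. It is a toy model, not the API of any RDF library: the `Literal` class and the helpers `same_term_rdf10`, `to_rdf11` and `same_term_rdf11` are hypothetical names introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Toy literal: lexical form plus optional datatype IRI and language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def same_term_rdf10(a: Literal, b: Literal) -> bool:
    # RDF 2004 view: a simple literal (no datatype, no language tag) is a
    # different term from any typed literal, even with the same string.
    return a == b


def to_rdf11(lit: Literal) -> Literal:
    # RDF 1.1 view: every literal has a datatype IRI. A missing datatype means
    # rdf:langString when a language tag is present, xsd:string otherwise.
    if lit.datatype is not None:
        return lit
    if lit.lang is not None:
        return Literal(lit.lexical, RDF_LANGSTRING, lit.lang)
    return Literal(lit.lexical, XSD_STRING)


def same_term_rdf11(a: Literal, b: Literal) -> bool:
    # Compare only after filling in the implicit datatype.
    return to_rdf11(a) == to_rdf11(b)


simple = Literal("some text")
typed = Literal("some text", XSD_STRING)
print(same_term_rdf10(simple, typed))  # False: distinct terms in RDF 2004
print(same_term_rdf11(simple, typed))  # True: the same term in RDF 1.1
```

The only thing that changes between the two views is whether the implicit datatype is filled in before the comparison.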
The problem related to your question appears when you use an RDF API conforming to RDF 1.1, together with a SPARQL 1.1 implementation. If you load an RDF document that contains:
<subject> <predicate> "some text" .
should it be interpreted according to the RDF 1.1 spec, or should it be loaded following the SPARQL 1.1 specification? In principle, this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
INSERT DATA {
<http://s> <http://untyped> 'abc' .
<http://s> <http://typed> 'abc'^^xsd:string .
}
is SPARQL 1.1, so it should be understood to contain two triples, one whose object is a simple literal and one whose object is a typed literal. But SPARQL implementations are built on RDF APIs, so mixing RDF 1.1 and SPARQL 1.1 can produce behaviour that differs from one system to another. You can only rely on the documentation of, and testing with, your specific implementation, I guess.
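As a rough illustration, the two result sets reported in the question are what you would get from the two interpretations. The Python sketch below is a toy in-memory model under that assumption. It does not describe how Blazegraph or Fuseki are actually implemented, and `TRIPLES`, `normalize` and `match` are hypothetical names.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Each object literal is modelled as (lexical form, datatype IRI or None).
TRIPLES = [
    ("http://s", "http://untyped", ("abc", None)),
    ("http://s", "http://typed", ("abc", XSD_STRING)),
]


def normalize(literal):
    # RDF 1.1 view: a literal without a datatype is an xsd:string.
    lexical, datatype = literal
    return (lexical, datatype or XSD_STRING)


def match(triples, wanted, rdf11):
    """Return the predicates of the triples whose object equals `wanted`."""
    key = normalize if rdf11 else (lambda lit: lit)
    return [p for _, p, o in triples if key(o) == key(wanted)]


for wanted in [("abc", None), ("abc", XSD_STRING)]:
    strict = match(TRIPLES, wanted, rdf11=False)     # RDF 1.0 style term equality
    unified = match(TRIPLES, wanted, rdf11=True)     # RDF 1.1 style term equality
    print(wanted, strict, unified)
```

With strict term equality each query finds exactly one triple, and with RDF 1.1 normalisation each query finds both, which matches the pattern observed in the question.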