I have a dataset in which some documents share the same author. I need to get the distinct pairs of such documents. I tried the following:
SELECT DISTINCT ?d1 ?d2 WHERE {
?d1 myns:creator ?x.
?d2 myns:creator ?y.
FILTER (?x=?y && ?d1!=?d2).
}
GROUP BY ?d1 ?d2
But with this, both DOC1, DOC2 and DOC2, DOC1 are in the result. I need to get rid of one of the pairs.
Here is the whole triples database:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix myns: <http://my.local.namespace#> .
_:doc1 rdf:type myns:Document.
_:doc1 myns:creator _:Pete.
_:doc1 myns:year "2000"^^xsd:integer.
_:doc1 myns:publisher _:p1.
_:doc2 rdf:type myns:Document.
_:doc2 myns:creator _:John.
_:doc2 myns:year "2004"^^xsd:integer.
_:doc2 myns:publisher _:p2.
_:doc3 rdf:type myns:Document.
_:doc3 myns:creator _:Pete.
_:doc3 myns:publisher _:p3.
_:doc4 rdf:type myns:Document.
_:doc4 myns:creator _:Bob.
_:doc4 myns:year "2010"^^xsd:integer.
_:doc4 myns:publisher _:p2.
_:Pete rdf:type myns:Person.
_:Pete myns:knows _:Bob.
_:Pete myns:knows _:John .
_:John rdf:type myns:Person.
_:John myns:age "29"^^xsd:integer.
_:John myns:knows _:Bob.
_:Bob rdf:type myns:Person.
_:Bob myns:age "35"^^xsd:integer.
The result I get after executing the query is:
D1 D2
_:891f1e98-b411-4e54-9533-18d530f09c6ddoc1 _:891f1e98-b411-4e54-9533-18d530f09c6ddoc3
_:891f1e98-b411-4e54-9533-18d530f09c6ddoc3 _:891f1e98-b411-4e54-9533-18d530f09c6ddoc1
As you can see, both pairs are technically the same. I just need a distinct one (i.e. one of them is enough). I am not sure about the details of the environment, but it uses the Sesame framework.
This will work in some systems:
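PREFIX myns: <http://my.local.namespace#>
SELECT ?d1 ?d2 WHERE {
  ?d1 myns:creator ?x .
  ?d2 myns:creator ?y .
  # IRI() on a blank node is a non-standard extension; the string ordering keeps one of each pair
  FILTER (?x = ?y && STR(IRI(?d1)) < STR(IRI(?d2)))
}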
?d1 and ?d2 are going to be blank nodes. But blank nodes are blank: they have no name that a query can compare. So to provide the ordering for <, we need some kind of query-wide label or value associated with each one.
Your data does not have any distinguishing triples for each person. It would be better to put real names in the data:
_:Pete rdfs:label "Pete" .
Even better, use the FOAF vocabulary.
Some systems allow blank nodes in IRI(); technically it's an extension of the SPARQL specification. You can then take the STR form and compare. That works on your data for me (Apache Jena); you don't say which RDF system you are using.
The best solution is to put distinguishing information into the data.
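As a rough sketch of what that could look like: suppose each document were given a unique rdfs:label (for example `_:doc1 rdfs:label "doc1"` and `_:doc3 rdfs:label "doc3"`). Those label triples are an assumption for illustration and are not in the posted data. The query can then order the pair by label, which relies only on standard SPARQL string comparison and not on the IRI() extension:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX myns: <http://my.local.namespace#>

# Assumes every document carries a unique rdfs:label (hypothetical triples,
# not present in the original data), e.g.
#   _:doc1 rdfs:label "doc1" .
#   _:doc3 rdfs:label "doc3" .
SELECT DISTINCT ?d1 ?d2 WHERE {
  # Reusing ?x for both documents replaces the ?x = ?y test.
  ?d1 myns:creator ?x ;
      rdfs:label   ?l1 .
  ?d2 myns:creator ?x ;
      rdfs:label   ?l2 .
  # A strict ordering keeps one of each symmetric pair and drops self-pairs.
  FILTER (?l1 < ?l2)
}
```

With labels like these, only the pair whose first label sorts before the second is kept. DISTINCT is there in case two documents share more than one creator.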