Search code examples
sparqlfreebase

Why DISTINCT keyword lead to different entity for these two queries?


Query 1

PREFIX ns: <http://rdf.freebase.com/ns/>
SELECT DISTINCT ?x
WHERE {
FILTER (!isLiteral(?x) OR lang(?x) = '' OR langMatches(lang(?x), 'en'))
?x ns:type.object.type ns:religion.religious_leadership_title .
?x ns:religion.religious_leadership_title.leaders ?c0 .
?c0 ns:religion.religious_organization_leadership.start_date ?sk0 .
}
ORDER BY ?sk0
LIMIT 1

Query 2

PREFIX ns: <http://rdf.freebase.com/ns/>
SELECT ?x
WHERE {
FILTER (!isLiteral(?x) OR lang(?x) = '' OR langMatches(lang(?x), 'en'))
?x ns:type.object.type ns:religion.religious_leadership_title .
?x ns:religion.religious_leadership_title.leaders ?c0 .
?c0 ns:religion.religious_organization_leadership.start_date ?sk0 .
}
ORDER BY ?sk0
LIMIT 1

So the only difference between Q1 and Q2 is that there is a DISTINCT keyword when SELECT ?x in Q1. However, Q1 gives answer m.01h_90 while Q2 gives answer m.05rd8. Ideally, I feel this should not lead to different results, as the purpose of DISTINCT is only to get rid of duplicates in the results set if I understand it correctly, so if the original results do not have duplicates at all, there should not be any difference by adding the DISTINCT keyword.


Solution

  • You have a tie on the value you're ordering. Specifying distinct is causing a different execution plan which orders the rows differently, though still ordering by the one column as requested, with another row as the first one to output. Add the output column to the order by clause and you should see consustent results between the two queries.