SPARQL: union on dbpedia changes birthdate


I'm seeing strange behavior when using UNION on the endpoint http://de.dbpedia.org/sparql. With the query

    SELECT DISTINCT *
    WHERE {
      {
        ?name dcterms:subject category-de:Haus_Liechtenstein.
        ?name rdf:type foaf:Person.
        ?name <http://dbpedia.org/ontology/birthDate> ?birthdate.
        OPTIONAL { ?name dbpedia-owl:deathDate ?deathDate. }
        OPTIONAL { ?name <http://de.dbpedia.org/property/gnd> ?gnd. }
      }
      FILTER (!bound(?deathDate))
    }
    ORDER BY ASC(?birthdate)

the birthdate of "Marie Kinsky", for example, is "1940-04-14Z", which is correct (first row). When I add a second source with UNION:

    SELECT DISTINCT *
    WHERE {
      {
        ?name dcterms:subject category-de:Haus_Liechtenstein.
        ?name rdf:type foaf:Person.
        ?name <http://dbpedia.org/ontology/birthDate> ?birthdate.
        OPTIONAL { ?name dbpedia-owl:deathDate ?deathDate. }
        OPTIONAL { ?name <http://de.dbpedia.org/property/gnd> ?gnd. }
      }
      UNION {
        SERVICE SILENT <http://dbpedia.org/sparql> {
          ?name dcterms:subject category-en:Princely_Family_of_Liechtenstein.
          ?name rdf:type foaf:Person.
          ?name dbpprop:father ?father.
          ?name dbpprop:mother ?mother.
          ?name dbpprop:birthDate ?birthdate.
          OPTIONAL { ?name dbpedia-owl:spouse ?spouse. }
          OPTIONAL { ?name dbpprop:shortDescription ?title. }
          OPTIONAL { ?name dbpedia-owl:individualisedGnd ?gnd. }
          OPTIONAL { ?name dbpedia-owl:deathDate ?deathDate. }
        }
      }
      FILTER (!bound(?deathDate))
    }
    ORDER BY ASC(?birthdate)

then the birthdate of "Marie" comes back as "1940-04-13+02:00", which is wrong (first row). Checking Marie's entry manually, the birthdate is "1940-04-14".

Can someone explain this behavior?

Thank you in advance and best regards Fobi


Solution

  • Try the following query (a very cut down version of your original) on http://de.dbpedia.org/sparql:

    PREFIX category-en: <http://dbpedia.org/resource/Category:>
    SELECT distinct *
    WHERE {
      SERVICE silent <http://dbpedia.org/sparql> {
        ?name dcterms:subject category-en:Princely_Family_of_Liechtenstein.
        ?name dbpprop:birthDate ?birthdate.
      }
    }
    

    Note that every date has this suspicious timezone shift, and Marie's is 1940-04-13+02:00.

    Now try the following on http://dbpedia.org/sparql:

    PREFIX category-en: <http://dbpedia.org/resource/Category:>
    SELECT distinct *
    WHERE {
      ?name dcterms:subject category-en:Princely_Family_of_Liechtenstein.
      ?name dbpprop:birthDate ?birthdate.
    }
    

    Here Marie's birth date is 1940-04-14+02:00!

    I wonder whether the DBpedia endpoint is trying to make time zone corrections based on the client's locale. If so, it isn't getting them right.

    (It's not just Liechtenstein royalty; most birth dates show this behavior.)

    Update:

    From the DBpedia mailing list:

    We recently recognized that there are inconsistencies with dates in the DBpedia Sparql endpoint: dates in the SPARQL endpoint are timezoned +02:00 while on the DBpedia pages and in the dumps they are not.

    [...]

    That is most probably an issue of Virtuoso and there already was an issue raised in 2011 on the virtuoso-users mailing list
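
As a rough illustration of how a +02:00 offset could move a date back by one day, here is a minimal Python sketch. It assumes one possible mechanism that the mailing-list thread does not confirm: the date is read as midnight at +02:00, normalized to UTC, and then cut back to a bare date. The helper name `shift_date_through_utc` is hypothetical and is not part of Virtuoso or DBpedia.

```python
from datetime import date, datetime, time, timedelta, timezone


def shift_date_through_utc(day: date, offset_hours: int) -> date:
    """Read `day` as midnight at the given UTC offset, convert that
    instant to UTC, and keep only the calendar date.

    Assumption: this mimics one way a timezoned xsd:date could be
    mishandled; it is not taken from any endpoint's actual code.
    """
    local_tz = timezone(timedelta(hours=offset_hours))
    local_midnight = datetime.combine(day, time.min, tzinfo=local_tz)
    return local_midnight.astimezone(timezone.utc).date()


# Marie's birth date as stored in the dumps (no timezone).
birthdate = date(1940, 4, 14)

# Midnight at +02:00 is 22:00 on the previous day in UTC.
shifted = shift_date_through_utc(birthdate, 2)
print(shifted.isoformat())
```

Under that assumption the shifted value is the day before the stored date, which matches the 1940-04-13 seen through the federated SERVICE call. With an offset of 0 the date is unchanged.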