wikipedia dbpedia wikidata date

Extracting person date data from Wikipedia


I am trying to extract birth and death dates from Wikipedia. I have tried DBpedia and Wikidata, but in this particular instance their dates do not match Wikipedia.

This query https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&titles=Thomas_MacDermot&sites=enwiki returns a P569 with a date of 1870-01-01. DBpedia shows the same date.

The Wikipedia page https://en.wikipedia.org/wiki/Thomas_MacDermot shows a date of 26 June 1870.

Why is there a discrepancy? And can this date information be retrieved programmatically (i.e., without screen scraping) from Wikipedia itself?

Thank you!


Solution

  • Wikidata supplements Wikipedia's mostly unstructured content with independently input structured data, which may or may not also be seen on Wikipedia.

    The DBpedia project translates much structured, and some unstructured, Wikipedia content to structured data.

    DBpedia (more precisely, DBpedia Snapshot) data typically lags Wikipedia changes by months to years. Here, we see the dbo:birthDate for Thomas MacDermot as "1870-1-1".

    DBpedia Live data typically lags Wikipedia changes by seconds to hours (with occasional longer delays due to software, hardware, and other issues in this evolving environment). Here, we see the dbo:birthDate for Thomas MacDermot as "1870-06-26"^^xsd:date.

    You may find On the Mutually Beneficial Nature of DBpedia and Wikidata to be of interest.


    P569 is described as "born on | birth date | birthdate | birth year | year of birth | birthyear | DOB", which I find very confusing. Some entities carry a full date in this property, while others carry only a year. And although the property itself is described as "never changing", the data Wikidata has stored may be incorrect, so the value in Wikidata may well change even if the underlying fact does not.
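One way to tell a full date from a year-only entry is the precision field that Wikidata stores with every time value (9 means year, 10 month, 11 day). In the wbgetentities JSON this sits under claims → P569 → mainsnak → datavalue → value. The sketch below is a minimal illustration, not a definitive implementation: the helper name describe_wikidata_time is hypothetical, and the sample values are assumptions built from the dates quoted in the question. In particular, the question does not say which precision Wikidata actually stored for Thomas MacDermot.

```python
# Wikidata precision codes for time values: 9 = year, 10 = month, 11 = day.
PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11


def describe_wikidata_time(value):
    """Return only the part of a Wikidata time value that its precision supports.

    `value` is the dict found at mainsnak.datavalue.value of a P569 claim,
    e.g. {"time": "+1870-01-01T00:00:00Z", "precision": 9, ...}.
    (Hypothetical helper, written for illustration.)
    """
    time_str = value["time"]
    precision = value["precision"]

    # The time string starts with an explicit sign, then year-month-day.
    sign = "-" if time_str.startswith("-") else ""
    year_s, month_s, day_s = time_str.lstrip("+-").split("T")[0].split("-")
    year = sign + str(int(year_s))

    if precision >= PRECISION_DAY:
        return f"{year}-{int(month_s):02d}-{int(day_s):02d}"
    if precision == PRECISION_MONTH:
        return f"{year}-{int(month_s):02d}"
    if precision == PRECISION_YEAR:
        # Month and day are placeholders here and carry no information.
        return year
    # Decade, century, and coarser precisions: only a rough year is meaningful.
    return f"circa {year}"


# Assumed sample values, not fetched from Wikidata.
year_only = {"time": "+1870-01-01T00:00:00Z", "precision": 9}
full_date = {"time": "+1870-06-26T00:00:00Z", "precision": 11}

print(describe_wikidata_time(year_only))
print(describe_wikidata_time(full_date))
```

If the stored precision is 9, a value that displays as 1870-01-01 only asserts the year 1870, which would be consistent with Wikipedia's 26 June 1870 rather than contradicting it.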