sparql mediawiki wikipedia wikidata

Retrieving properties last updated before/after arbitrary date


I'm interested in retrieving the properties of a Wikidata item, but only those that were added or modified before or after some date.

So I have this SPARQL query that gets all properties for Q24.

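SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {

    # Restrict the query to a single item.
    VALUES (?item) {(wd:Q24)}

    # ?property is the p: predicate linking the item to a statement node;
    # ?statement_property is the ps: predicate carrying that statement's value.
    ?item ?property [?statement_property ?statement_property_obj] .
    # Tie both predicates back to the same property entity so it can be labelled.
    ?prop wikibase:claim ?property.
    ?prop wikibase:statementProperty ?statement_property.

    # Call label service.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }

} ORDER BY ?propLabel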

Now, I'd like to keep only those properties that were modified either before (<) or after (>) an arbitrary date (e.g. 1/1/2017). I know there is a "last update" property, P5017, but I don't know how to use it to compare against an arbitrary date.


Solution

  • You probably can't do this with SPARQL, sadly. The only things that SPARQL knows about are:

    • a) the last date the item was edited at all (which gives you an effective "no later than" date for any claim in it) using schema:dateModified;
    • b) any specific dates embedded in claims that state (or hint at) when they were updated.
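
    As a rough sketch of a), assuming the public Wikidata Query Service (where the schema: and xsd: prefixes are predefined) and using the question's 2017-01-01 as the cutoff, the item-level timestamp can be compared like this:

    ```sparql
    # Item-level check only: was wd:Q24 edited at all on or after the cutoff?
    SELECT ?item ?modified
    WHERE {
      VALUES ?item { wd:Q24 }

      # schema:dateModified is the timestamp of the item's most recent edit.
      ?item schema:dateModified ?modified .

      # The cutoff is an assumed example date; use < for "before".
      FILTER(?modified >= "2017-01-01T00:00:00Z"^^xsd:dateTime)
    }
    ```

    An empty result means the item has not been edited since the cutoff, so none of its claims can have changed after that date. A non-empty result only says that something on the item changed, not which claim.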

    For b) you could in theory use P813 (date information was retrieved). P5017 is for the date of revision of the source, not the statement, and can be long in the past.

    However, this approach relies on those reference properties being present, and most references do not use them - Q24 has only one reference that uses P813. A P813 date is entered by hand and is not automatically applied or updated, so it also does not guarantee that the claim has not been edited since then - you would assume probably not, but there's no way to be sure.
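
    With those caveats in mind, a minimal sketch of the P813 approach might look like the query below. It reuses the property pattern from the question and assumes the Wikidata Query Service's predefined prefixes (prov:, pr:, xsd:); the cutoff is again just the example date. Statements without a P813 reference are silently dropped, so expect very few rows.

    ```sparql
    SELECT ?propLabel ?valueLabel ?retrieved
    WHERE {
      VALUES ?item { wd:Q24 }

      # Statement nodes of the item, with their property and main value.
      ?item ?claim ?statement .
      ?prop wikibase:claim ?claim ;
            wikibase:statementProperty ?ps .
      ?statement ?ps ?value ;
                 # Only statements that carry at least one reference...
                 prov:wasDerivedFrom ?reference .

      # ...where that reference has a "retrieved" date (P813).
      ?reference pr:P813 ?retrieved .

      # Keep references retrieved before the cutoff; use > for "after".
      FILTER(?retrieved < "2017-01-01T00:00:00Z"^^xsd:dateTime)

      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
    ORDER BY ?propLabel
    ```

    The retrieved date is a hint about when the reference was added, not a record of when the statement was last edited.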

    References might also have P577 (publication date), which could be used to infer a lower bound on the edit date - if the publication date is 2020-02-01, the claim was probably edited on or after that date, since it would be unlikely for someone to cite a reference with a future publication date. But this is a bit tenuous and not very useful unless it happens to fall close to your test date.

    In practice, I think you would need to parse the page history to be able to say anything for sure about when a given claim was last edited. Almost all edit summaries for claim edits are quite standardised so this should hopefully be practical to do without investigating each individual revision, but it might also be a lot of work...