Search code examples
sparqlwikidatadata-collection

Get highest level P131 for given place on Wikidata


I have the following query:

SELECT DISTINCT
?place
?place_eng
?admin_eng
?country_eng
WHERE {
  VALUES ?place { wd:Q3437 wd:Q1903 wd:Q5083 }
  ?place wdt:P131 ?admin.
  ?place wdt:P17 ?country.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,en".
    ?place rdfs:label ?place_eng.
    ?admin rdfs:label ?admin_eng.
    ?country rdfs:label ?country_eng.
  }
}

This yields the data:

place place_eng admin_eng country_eng
wd:Q1903 Catania Metropolitan City of Catania Italy
wd:Q3437 Perugia Province of Perugia Italy
wd:Q5083 Seattle King County United States of America

As you can see under admin_eng, this includes the first administrative area the city is in, through the P131 (located in the administrative territorial entity) property of the place.

Instead of it, I would like to obtain the highest level administrative area the place is in, like the state in the case of cities in the USA or the region in the case of Italian ones. For example:

place place_eng admin_eng country_eng
wd:Q1903 Catania Sicily Italy
wd:Q3437 Perugia Umbria Italy
wd:Q5083 Seattle Washington United States of America

I understand that I will need to go up the chain of P131 to do this, but I don't know how to tell the query to stop when the next level is the country itself, or how to do any of this at all, really. I'm almost a complete beginner at SPARQL and Wikidata.

I would also need the solution to be as efficient as possible, since I need to run this on many records, but a solution that works is a good start.


Solution

  • I've found a solution which doesn't require any backtracking:

    SELECT DISTINCT ?start ?admin
    WHERE {
      VALUES ?start { wd:Q3437 wd:Q1903 wd:Q5083 }
      ?start wdt:P131+ ?admin.
      ?admin wdt:P131 ?temp.
      ?temp wdt:P31 wd:Q6256.
    }
    

    This selects distinct results starting from the given values, recursively looking through the chain of administrations each belongs to (so, going all the way up to the country), then takes those administrations which have themselves an administration (?temp), which has to be an instance of country (so they must be a first-level administrative unit).

    The following also works, but traverses the chain backwards:

    SELECT DISTINCT ?start ?admin
    WHERE {
      VALUES ?start { wd:Q3437 wd:Q1903 wd:Q5083 }
      ?start wdt:P131+ ?admin.
      ?temp ^wdt:P131 ?admin;
             wdt:P31 wd:Q6256.
    }
    

    I'm not able to tell whether it would be slower, faster, or equivalent to the other version. If anyone knows, please comment.