
Wikidata sort cities in a given state by population


I'm looking to recreate this list of cities in Texas by population using Wikidata.

I can list states by population with this query:

SELECT DISTINCT ?state ?stateLabel ?population
{
  ?state wdt:P31 wd:Q35657 ;
           wdt:P1082 ?population .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?state ?population ?stateLabel
ORDER BY DESC(?population)

The ID for Texas is wd:Q1439.

So I have tried the following:

SELECT ?country ?countryLabel ?state ?stateLabel ?city ?cityLabel ?population
WHERE
    {
#    ?state wdt:P31 wd:Q35657 .    # Give me an american state
    ?state wdt:P31 wd:Q1439 .      # that state is Texas
    ?city wdt:P31 wd:Q515 .        # it is an instance of city
    ?city wdt:P17 ?country.        # get the country (for double-checking)
    ?city wdt:P361 ?state.         # get the state it belongs to
    ?city wdt:P1082 ?population .  # get the population of the city
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } 
    }
  ORDER BY DESC(?population) limit 68

This returns no matches. What's the right query for this?

Update

The following query returns data, but the results are incomplete: it misses cities such as Houston and San Antonio.

SELECT ?mun ?munLabel ?population WHERE {
  {
    SELECT distinct ?mun ?population WHERE {
     values ?habitation {
       wd:Q3957 
       wd:Q515 
       wd:Q15284
      } 
      ?mun (wdt:P31/(wdt:P279*)) ?habitation;
        wdt:P131 wd:Q1439;
#        wdt:P625 ?loc; 
        wdt:P1082 ?population .                                 
    }
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
  }
}
ORDER BY DESC(?population)

Solution

  • The issue is that Houston and San Antonio are listed as located in Harris County and Bexar County respectively, not directly in Texas; the counties, in turn, are located in Texas. This query should work:

    SELECT ?mun ?munLabel ?population WHERE {
      {
        SELECT distinct ?mun ?population WHERE {
         values ?habitation {
           wd:Q3957 
           wd:Q515 
           wd:Q15284
          } 
          ?mun (wdt:P31/(wdt:P279*)) ?habitation;
            wdt:P131+ wd:Q1439; #Add '+' here for transitivity
    #        wdt:P625 ?loc; 
            wdt:P1082 ?population .                                 
        }
      }
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
      }
    }
    ORDER BY DESC(?population)
    

    The trick is to add + after wdt:P131, which changes the pattern from "follow exactly one wdt:P131 edge" to "follow one or more wdt:P131 edges".

    This resolves the issue because Harris and Bexar counties are themselves listed as being located in Texas, so a city in either county reaches Texas in two steps.
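
    To see the chain for yourself, a small diagnostic query along these lines may help. It is only a sketch: it assumes wd:Q16555 is the item for Houston (worth confirming on Wikidata before relying on it) and lists every entity reachable from it through one or more wdt:P131 edges. If the explanation above holds, the results should include both the county and Texas.

    ```sparql
    # Sketch: list every administrative entity that contains the city,
    # directly or indirectly. wd:Q16555 is assumed to be Houston.
    SELECT ?parent ?parentLabel WHERE {
      wd:Q16555 wdt:P131+ ?parent .   # one or more "located in" steps
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
      }
    }
    ```

    Removing the + from this query shows only the direct parent, which is why the original pattern wdt:P131 wd:Q1439 did not match such cities.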