Tags: python-3.x, geolocation, wikipedia-api, wikidata, population

How to get the population of a queried location (country, state/province, and/or county) using the Wikipedia API?


I am trying to use the Covid-19 Dataset to build an SIR model. To build this model, I need the population of each location (country, province/state, and/or county) to calculate S (the susceptible population) in SIR. Since this dataset does not contain population data, I thought it would be good to get it from an API. I came across countryinfo, but its population estimates have not been updated since 2018 (according to the example and PyPI). Also, one must be careful when entering country names, as the ones accepted by countryinfo are not necessarily the same as those provided in the dataset.

from countryinfo import CountryInfo

country = CountryInfo('Singapore')
p = country.population()
print(p)
# 5469700

country = CountryInfo('United States')
# country = CountryInfo('US') # is not accepted
p = country.population()
print(p)
# 319259000

I can type generic queries (e.g., "US" or "United States") into Google to find the population of any location, but I am not sure how to do this programmatically in Python. Setting location to 'us' below will show the US population (via this solution).

location = 'us'
query = 'https://www.google.com/search?q=' + location + '+population'

I think the Wikipedia API can be used to the same effect, but I am not quite sure how to do this. Is there a better way? If not, how can I use Wikipedia to get the population of a queried location?


Solution

  • As smartse mentioned, this is certainly easier to solve with Wikidata than with Wikipedia. Wikipedia does not store information in a structured way, so you cannot write a query that returns the population directly. You would have to use an API call to load the article about the place and then parse the text with your own code to retrieve the population.

    To query Wikidata, you can use the Wikidata Query Service. The following query first searches for a keyword and then returns the population of the result:

    SELECT ?population WHERE {
      SERVICE wikibase:mwapi {
          bd:serviceParam mwapi:search "Singapore" .    
          bd:serviceParam mwapi:language "en" .    
          bd:serviceParam wikibase:api "EntitySearch" .
          bd:serviceParam wikibase:endpoint "www.wikidata.org" .
          bd:serviceParam wikibase:limit 1 .
          ?item wikibase:apiOutputItem mwapi:item .
      }
      ?item wdt:P1082 ?population
    }
    

    Be aware that the data in Wikidata is also sometimes outdated. But since population does not change dramatically from one year to the next, this should not be a problem for your application.
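
As a rough sketch, the query above could be sent from Python using only the standard library. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql with format=json, which returns standard SPARQL JSON results. The function names (build_query, parse_population, get_population), the __TERM__ placeholder, and the User-Agent string are made up for illustration and are not part of any Wikidata library.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above; __TERM__ is a hypothetical placeholder for the keyword.
QUERY_TEMPLATE = """
SELECT ?population WHERE {
  SERVICE wikibase:mwapi {
      bd:serviceParam mwapi:search "__TERM__" .
      bd:serviceParam mwapi:language "en" .
      bd:serviceParam wikibase:api "EntitySearch" .
      bd:serviceParam wikibase:endpoint "www.wikidata.org" .
      bd:serviceParam wikibase:limit 1 .
      ?item wikibase:apiOutputItem mwapi:item .
  }
  ?item wdt:P1082 ?population
}
"""


def build_query(term):
    """Insert the search term into the query, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.replace("__TERM__", escaped)


def parse_population(result):
    """Return the first population in a SPARQL JSON result, or None if empty."""
    bindings = result["results"]["bindings"]
    if not bindings:
        return None
    return int(float(bindings[0]["population"]["value"]))


def get_population(term, user_agent="population-lookup-sketch/0.1 (example)"):
    """Run the query against the Wikidata Query Service (needs network access)."""
    params = urllib.parse.urlencode({"query": build_query(term), "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_population(json.load(response))
```

Calling get_population("Singapore") or get_population("US") should then return an integer, or None when the search finds no item with a population statement. Because the search is limited to one result, an ambiguous keyword may resolve to a different entity than intended, so it is worth spot-checking a few locations from the dataset.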