python for-loop wikipedia wikipedia-api

Find coordinates in Wikipedia pages by iterating over a list


This is probably a simple question, but my experience with for loops is very limited.

I was trying to adapt the solution on this page, https://www.mediawiki.org/wiki/API:Geosearch, to some simple examples that I have, but the result is not what I expected.

For example:

I have this simple data frame:

df= pd.DataFrame({'City':['Sesimbra','Ciudad Juárez','31100 Treviso','Ramada Portugal','Olhão'],
              'Country':['Portugal','México','Itália','Portugal','Portugal']})

I created a list based on cities:

lista_cidades = list(df['City'])

and I would like to iterate over this list to get the coordinates (decimal, preferably).

So far I have tried this approach:

import requests

lng_dict = {}
lat_dict = {}

S = requests.Session()

URL = "https://en.wikipedia.org/w/api.php"

PARAMS = {
    "action": "query",
    "format": "json",
    "titles": [lista_cidades],
    "prop": "coordinates"
}

R = S.get(url=URL, params=PARAMS)
DATA = R.json()
PAGES = DATA['query']['pages']

for i in range(len(lista_cidades)):
    for k, v in PAGES.items():
    
        try:
            lat_dict[lista_cidades[i]] = str(v['coordinates'][0]['lat'])
            lng_dict[lista_cidades[i]] = str(v['coordinates'][0]['lon'])
    
        except:
            pass

but it looks like the code doesn't iterate over the list and always returns the same coordinate.

For example, when I call the dictionary with the longitude coordinates, this is what I get:

lng_dict

{'Sesimbra': '-7.84166667',
 'Ciudad Juárez': '-7.84166667',
 '31100 Treviso': '-7.84166667',
 'Ramada Portugal': '-7.84166667',
 'Olhão': '-7.84166667'}

What should I do to solve this?

Thanks in advance


Solution

  • I think the query returns only one result: it takes only the last city from your list (in your case, the "Olhão" coordinates). Your nested loop then stores that single result under every city name, which is why all the values are identical.

    You can check this by printing the contents of DATA.

    I do not know the Wikipedia API well, but either your call lacks a parameter (the documentation should tell you which) or you have to call the API once for each city, like this:

    import pandas as pd
    import requests
    
    df = pd.DataFrame({'City': ['Sesimbra', 'Ciudad Juárez', '31100 Treviso', 'Ramada Portugal', 'Olhão'],
                       'Country': ['Portugal', 'México', 'Itália', 'Portugal', 'Portugal']})
    lista_cidades = list(df['City'])
    
    lng_dict = {}
    lat_dict = {}
    
    S = requests.Session()
    
    URL = "https://en.wikipedia.org/w/api.php"
    
    for city in lista_cidades:
        PARAMS = {
            "action": "query",
            "format": "json",
            "titles": city,
            "prop": "coordinates"
        }
        R = S.get(url=URL, params=PARAMS)
        DATA = R.json()
        PAGES = DATA['query']['pages']
    
        for k, v in PAGES.items():
            try:
                lat_dict[city] = str(v['coordinates'][0]['lat'])
                lng_dict[city] = str(v['coordinates'][0]['lon'])
            except (KeyError, IndexError):
                # No coordinates for this page (missing page or no geodata): skip it
                pass
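
As a possible alternative to one request per city, the MediaWiki API accepts several page titles in a single `titles` value when they are separated by `|` (up to 50 per request for ordinary clients). The sketch below assumes that approach and keys the result by the page title the API reports, so each coordinate stays attached to its own page. The function names `build_params`, `extract_coordinates`, and `fetch_coordinates` are hypothetical helpers, not part of any library. `formatversion=2` is used so that `pages` comes back as a list, and `redirects` asks the API to follow redirects. The coordinates module also caps how many results it returns per request, so longer lists may need batching and continuation handling, which this sketch does not cover.

```python
URL = "https://en.wikipedia.org/w/api.php"


def build_params(titles):
    """Build the query parameters for one batch of page titles."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",        # 'pages' is returned as a list, not a dict keyed by page id
        "prop": "coordinates",
        "redirects": "1",            # ask the API to resolve redirects
        "titles": "|".join(titles),  # several titles in one value, pipe-separated
    }


def extract_coordinates(data):
    """Map each returned page title to (lat, lon); pages without coordinates are skipped."""
    coords = {}
    for page in data.get("query", {}).get("pages", []):
        found = page.get("coordinates")
        if found:
            coords[page["title"]] = (found[0]["lat"], found[0]["lon"])
    return coords


def fetch_coordinates(session, titles):
    """Send one request for all titles using an existing requests.Session-like object."""
    response = session.get(URL, params=build_params(titles), timeout=10)
    response.raise_for_status()
    return extract_coordinates(response.json())
```

With the session and list from the question, `fetch_coordinates(S, lista_cidades)` would return a dictionary keyed by the titles Wikipedia reports. Those can differ from the input strings (for example after normalization or a redirect), and an input such as '31100 Treviso' may not match any page at all, in which case it is simply absent from the result.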