Tags: python, python-3.x, data-cleaning, geopy

How to get latitude and longitude for an address column in a dataframe using geopy?


I am currently working on the Kaggle dataset "House Price Prediction".

It has errors in the latitude and longitude columns, so I decided to use geopy to get the correct values for those two columns.

It works fine when I use it on a single address, but it returns None when I apply it to the entire column.


from geopy.geocoders import Nominatim

city = []
lat = []
longi = []

for addr in train_df['address']:
    geolocator = Nominatim(user_agent="ram")
    # geocode() returns None when the address cannot be found
    location = geolocator.geocode(addr, timeout=100, language='en')
    city.append(location.address.split(',')[-4])
    lat.append(location.latitude)
    longi.append(location.longitude)

It returns None, maybe because I am accessing the service too many times in a row.

Please suggest another way to get latitude and longitude for the 'address' column of my dataframe (or another library meant for the same job).
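On the "too many accesses" guess: Nominatim's public server asks clients to send at most one request per second, and geopy ships `geopy.extra.rate_limiter.RateLimiter` for exactly this situation. The sketch below shows the same idea in plain Python so it can be read and run without network access: `throttled` is a hypothetical helper (not part of geopy) that enforces a minimum delay between calls to any function, such as `geolocator.geocode`.

```python
import time
from typing import Any, Callable


def throttled(func: Callable[..., Any], min_delay_seconds: float = 1.0) -> Callable[..., Any]:
    """Wrap func so that consecutive calls start at least min_delay_seconds apart.

    Hypothetical helper for illustration; geopy's RateLimiter does this
    (plus retries on errors) for real geocoders.
    """
    # time.monotonic() of the previous call; -inf means the first call never waits
    last_call = [float("-inf")]

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        wait = min_delay_seconds - (time.monotonic() - last_call[0])
        if wait > 0:
            time.sleep(wait)  # pause until the minimum gap has passed
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper
```

With geopy this might be used as `geocode = throttled(geolocator.geocode)` and then `geocode(addr, timeout=10)`; the ready-made equivalent is `RateLimiter(geolocator.geocode, min_delay_seconds=1)`. Throttling only helps if the failures really come from rate limiting, though; a misspelled address still comes back as None.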


Solution

  • The problem is in the addresses in the dataset. If you run the try/except code below, you can see that many of the addresses in the dataset are wrong.

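    from geopy.geocoders import Nominatim

    city, lat, longi = [], [], []
    geolocator = Nominatim(user_agent="ram")  # one geocoder is enough for the whole loop

    for addr in train_df['ADDRESS']:
        location = geolocator.geocode(addr, timeout=10000, language='en')
        try:
            city.append(location.address.split(',')[-4])
            lat.append(location.latitude)
            longi.append(location.longitude)
        except (AttributeError, IndexError):
            # location is None (address not found) or the result has too few comma-separated parts
            print(addr)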
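One thing to watch with the loop above: rows that fail are skipped, so `lat` and `longi` end up shorter than the dataframe and no longer line up with its rows. Below is a minimal sketch that keeps exactly one entry per row (None for a failure) and collects the failed addresses separately. It assumes `geocode` behaves like `Nominatim.geocode`, returning an object with `.latitude`/`.longitude`, or None when nothing is found. `geocode_column` and the `corrections` dict (manually fixed spellings keyed by the original address) are hypothetical names, not part of geopy.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def geocode_column(
    addresses: Iterable[str],
    geocode: Callable[[str], Any],
    corrections: Optional[Dict[str, str]] = None,
) -> Tuple[List[Optional[float]], List[Optional[float]], List[str]]:
    """Return (lats, lons, failed) with exactly one lat/lon entry per input address.

    Hypothetical helper: `geocode` is any callable returning an object with
    .latitude/.longitude, or None when the address cannot be found.
    """
    corrections = corrections or {}
    lats: List[Optional[float]] = []
    lons: List[Optional[float]] = []
    failed: List[str] = []
    for addr in addresses:
        query = corrections.get(addr, addr)  # use a manually fixed spelling if one exists
        location = geocode(query)
        if location is None:
            # keep a placeholder so the lists stay aligned with the dataframe rows
            lats.append(None)
            lons.append(None)
            failed.append(addr)
        else:
            lats.append(location.latitude)
            lons.append(location.longitude)
    return lats, lons, failed
```

With geopy and pandas this might be called as `lats, lons, failed = geocode_column(train_df['ADDRESS'], lambda a: geolocator.geocode(a, timeout=10, language='en'))`, after which `train_df['lat'] = lats` assigns one value per row and `failed` lists the addresses to inspect by hand.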
    

    For example, one of the addresses is "Garebhavipalya,Bangalore". If you search for it on Google, the correct spelling is "Garvebhavi Palya,Bangalore", but the dataset has "Garebhavipalya,Bangalore". If you geocode the corrected address with the code below, you get a Bangalore address whose latitude and longitude are close to the values in the dataset.

    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="ram")
    location = geolocator.geocode("Garvebhavi Palya,Bangalore", timeout=100, language='en')
    print(location.address)
    print(location.latitude)
    print(location.longitude)
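To put a number on "close", one option is the great-circle (haversine) distance between the geocoded point and the coordinates stored in the dataset. A minimal sketch, assuming a spherical Earth with a mean radius of 6371 km; `haversine_km` is a hypothetical helper name. geopy also provides `geopy.distance.geodesic`, which is more accurate because it uses an ellipsoidal Earth model.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km (spherical approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

For example, `haversine_km(location.latitude, location.longitude, 12.96991, 77.59796)` gives the distance between the geocoded result and a dataset row; a threshold of a few kilometres (an arbitrary choice that would need tuning) could flag rows whose coordinates disagree with their address.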
    

    To check which address the dataset's latitude and longitude actually point to, you can use reverse geocoding:

    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="ram")
    location = geolocator.reverse("12.96991,77.59796")  # "latitude,longitude"
    print(location.address)
    

    As I'm not familiar with Bangalore, I'm not sure whether the dataset's lat 12.96991 and lon 77.59796 correspond to "Garebhavipalya,Bangalore". Still, I think the problem lies in the dataset's "ADDRESS" column.

    BTW, I think GeoPy is a really good library; I only learned about it because of your question. Thanks for asking :)