python nlp stanford-nlp

Extracting country name from an address


I have a large dataset with an address column, and I would like to extract the country from each address. In many cases the address contains a state, a city, and a zip code, but not the country name. You can see samples of my data:

Text

I'm using Python. How can I extract the country name in all these cases?


Solution

  • There are two approaches I would try for finding the country name.

    1. Train an NER model to identify the city or state name. The model extracts the city/state from the full address, and you then pass that name to the Google Geocoding API, which returns the full details for that place, including the country.
    2. Write a heuristic to identify the city/state name, then use fuzzy matching to look it up in a database that maps known city/state names to their countries.

    You can refer to the Google Geocoding API here, and use the Python geocoder library to look up the details. (Install it via pip install geocoder.)

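    import geocoder

    # Geocode a city or state name (needs a Google API key, see the P.S.).
    results = geocoder.google("Delhi")
    # Best match for the query; its country code is exposed as results.country.
    print(results.current_result)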
    

    P.S. You might have to set your API key first.
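
The second approach (heuristic plus fuzzy lookup) could be sketched roughly as follows, using only the standard library. This is a minimal illustration and not a complete solution: the `PLACE_TO_COUNTRY` table, the `guess_country` helper, and the 0.75 cutoff are all assumptions made up for this example. A real lookup table would be loaded from a gazetteer or database of known places.

```python
import difflib
import re

# Hypothetical lookup table mapping known city/state names to a country.
# In practice this would come from a much larger database.
PLACE_TO_COUNTRY = {
    "delhi": "India",
    "new delhi": "India",
    "mumbai": "India",
    "california": "United States",
    "new york": "United States",
    "ontario": "Canada",
    "london": "United Kingdom",
}


def guess_country(address, cutoff=0.75):
    """Return the country for the first address part that fuzzily
    matches a known place name, or None if nothing matches."""
    # Heuristic: split on commas, drop digits (e.g. numeric zip codes),
    # and normalise case and surrounding whitespace.
    parts = [re.sub(r"\d+", "", p).strip().lower() for p in address.split(",")]
    # City and state usually appear near the end, so check the last parts first.
    for part in reversed(parts):
        if not part:
            continue
        matches = difflib.get_close_matches(
            part, list(PLACE_TO_COUNTRY), n=1, cutoff=cutoff
        )
        if matches:
            return PLACE_TO_COUNTRY[matches[0]]
    return None


print(guess_country("Sacramento, Californa 95814"))
print(guess_country("221B Baker Street, Londn"))
```

The fuzzy match tolerates small misspellings such as "Californa" or "Londn". The digit stripping only handles numeric zip codes, so alphanumeric postal codes and place names that are missing from the table would need extra handling.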