Search code examples
pythonpython-3.xpandascsvgeopy

How can I read a CSV, add two more columns, and insert data that is dependent on the current columns?


I have a csv that looks like this but is much longer:

ID Address
32 54298 Boca Chica Blvd Brownsville, TX
31 6640 Washington St, Yountville, CA 94599

I am attempting to loop through the csv, use geopy to get latitudes and longitudes for each address, and then insert them into a 3rd and 4th column that would look like this:

ID Address Latitude Longitude
32 54298 Boca Chica Blvd Brownsville, TX 34.5165131 90.1656516
31 6640 Washington St, Yountville, CA 94599 26.1231 51.516125

It doesn't really matter if it is in the same old csv (that's what I was trying), or into a new csv. The previous posts I have read either focus on appending whole rows instead of columns, or they just populate the new columns with data that is hand typed/hard coded into the python script.

So far I am able to loop through and get the latitudes and longitudes. The problem I am having is primarily on the writing part. Here is what I have.

import geopy
import pandas as pd
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="sample app")
df = pd.read_csv(r"C:\Users\Desktop\LatLongTest.csv")
for row in df.itertuples():
    try:
        data = geolocator.geocode(row[7])
        print(data.point.latitude)
    except AttributeError:
        print('error')

Any help would be greatly appreciated! I only have a bit of Java experience from back in the day. The documentation and published examples for Python are not as intuitive to me as Java's was, so getting started is a bit of a challenge.


Solution

  • All you need to to add columns and then save the df dataframe. See also Adding new column to existing DataFrame in Python pandas

    You can try something like this:

    import geopy
    import pandas as pd
    from geopy.geocoders import Nominatim
    geolocator = Nominatim(user_agent="sample app")
    
    df = pd.read_csv(r"C:\Users\Desktop\LatLongTest.csv")
    
    # add new columns
    df['Longitude'] = 0
    df['Latitude'] = 0
    
    # process all rows
    for row in df.itertuples():
        try:
            data = geolocator.geocode(row[7])
            df.at[row.Index, 'Longitude'] = data.point.longitude
            df.at[row.Index, 'Latitude'] = data.point.latitude
        except AttributeError:
            print('error')
    
    # write to output
    df.to_csv(r"C:\Users\Desktop\LatLongTest_2.csv", sep=";", index=False)