Search code examples
pythoncsvdata-extraction

pull coordinate data from .csv column that has address info


I have a series of .csv files with coordinate info in a specific column. Problem is, the coordinate info has address info in the same column, and not always in the same format.

15206 PLANK RD
BAKER, LA 70714
(30.586316235000027, -91.12488045299995)

or

LA
(31.312663324000027, -92.44567750499994)

So what I need to do is pull just the lat data into a new column, and the long data into a new column. I'm sure that this is a pretty simple straight forward process, I'm just not sure how to tackle this. I'd like to have a python script that could do this for me where I specify the column to search, and have it pull just the coordinate info into new columns in the same rows it pulls that data from.


Solution

  • Here's a little snippet which can extract long/lat fields from a desired column. It uses a regex expression to extract the two coordinates from the given column.

    output = []
    with open(DATA_CSV_NAME, 'rb') as csvfile:
        datareader = csv.reader(csvfile)
        for row in datareader:
            m = re.search('\((-?\d+\.\d+),\s(-?\d+\.\d+)\)', row[DESIRED_COLUMN])
            lat = m.group(1) if m else '0.0'
            long = m.group(2) if m else '0.0'
            row.append(lat)
            row.append(long)
            output.append(row)
    

    Here's an example of the regex at work:

    >>> m = re.search('\((-?\d+\.\d+),\s(-?\d+\.\d+)\)', '15206 PLANK RD\nBAKER, LA 70714\n(31.312663324000027, -92.44567750499994)')
    >>> m.group(0)
    '(31.312663324000027, -92.44567750499994)'
    >>> m.group(1)
    '31.312663324000027'
    >>> m.group(2)
    '-92.44567750499994'