I have a series of .csv files with coordinate info in a specific column. Problem is, the coordinate info has address info in the same column, and not always in the same format.
15206 PLANK RD
BAKER, LA 70714
(30.586316235000027, -91.12488045299995)
or
LA
(31.312663324000027, -92.44567750499994)
So what I need to do is pull just the lat data into a new column, and the long data into a new column. I'm sure that this is a pretty simple straight forward process, I'm just not sure how to tackle this. I'd like to have a python script that could do this for me where I specify the column to search, and have it pull just the coordinate info into new columns in the same rows it pulls that data from.
Here's a little snippet which can extract long/lat fields from a desired column. It uses a regex expression to extract the two coordinates from the given column.
output = []
with open(DATA_CSV_NAME, 'rb') as csvfile:
datareader = csv.reader(csvfile)
for row in datareader:
m = re.search('\((-?\d+\.\d+),\s(-?\d+\.\d+)\)', row[DESIRED_COLUMN])
lat = m.group(1) if m else '0.0'
long = m.group(2) if m else '0.0'
row.append(lat)
row.append(long)
output.append(row)
Here's an example of the regex at work:
>>> m = re.search('\((-?\d+\.\d+),\s(-?\d+\.\d+)\)', '15206 PLANK RD\nBAKER, LA 70714\n(31.312663324000027, -92.44567750499994)')
>>> m.group(0)
'(31.312663324000027, -92.44567750499994)'
>>> m.group(1)
'31.312663324000027'
>>> m.group(2)
'-92.44567750499994'