I have the following dataframe:
col_a col_b col_c lat lon polyline
0 2.2 3/27/2017 17:45 -34.92967678 -62.34831333 [{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
0 3.3 3/27/2017 17:45 -34.92967678 -62.34831333 [{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
I would like to convert it into a GeoPandas GeoDataFrame (with the geometry taken from the polyline column), but the polyline column is not in a standard format. How can I fix this?
IIUC, if the original dataframe is a pandas DataFrame, you can use Series.str.translate to remove all double quotes, use Series.str.findall to extract every lat/lng pair into a list of tuples, and then build a Polygon from each list (findall returns strings, so the values must be converted to float):
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

# Remove all double quotes, then capture every (lat, lng) pair as a tuple of strings
df['coords'] = (
    df['polyline']
    .str.translate(str.maketrans({'"': ''}))
    .str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
)

# shapely uses (x, y) = (longitude, latitude), so convert to float and swap each pair
geometry = [Polygon([(float(lng), float(lat)) for lat, lng in e]) for e in df['coords']]
gdf = gpd.GeoDataFrame(df.drop(['coords', 'polyline'], axis=1), geometry=geometry)
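To see what the cleaning and findall steps produce for a single cell, here is a stdlib-only sketch that applies the same str.translate call and regex to the first polyline string from the question (copied verbatim). It shows that each match is a (lat, lng) tuple of strings, which is why the values need float conversion and reordering to (lng, lat) before shapely sees them.

```python
import re

# First polyline cell from the question, copied verbatim (stray quotes included)
s = ('[{lat":-34.92967677667683 lng:-62.34831333160395} '
     '{"lat":-34.93002861969753 lng:-62.360866069793644} '
     '{"lat":-34.93526211379422 lng:-62.36063016609785} '
     '{"lat":-34.93571078689853 lng:-62.35996507775451} '
     '{"lat":-34.935798629937075 lng:-62.34816312789911} '
     '{"lat":-34.9333358703344 lng:-62.34824895858759} '
     '{"lat":-34.9320340961022 lng:-62.348334789276066}]"')

# Same cleanup as the pandas version: delete every double quote
cleaned = s.translate(str.maketrans({'"': ''}))

# Same regex: two capture groups, so findall returns (lat, lng) tuples of strings
pairs = re.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)', cleaned)
print(pairs[0])  # ('-34.92967677667683', '-62.34831333160395')

# shapely expects (x, y) = (longitude, latitude): convert and swap
coords = [(float(lng), float(lat)) for lat, lng in pairs]
print(len(coords))  # 7
```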
Edit: if the pandas.Series.str methods are not available, you can do the same with Python's re module. For example, assuming the original dataframe is a GeoDataFrame named gdf:
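import re
import geopandas as gpd
from shapely.geometry import Polygon

ptn = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
# For each string e: drop quotes, extract (lat, lng) pairs, convert to float, swap to (lng, lat) for shapely
geometry = [Polygon([(float(lng), float(lat)) for lat, lng in ptn.findall(e.replace('"', ''))]) for e in gdf['polyline']]
gdf_new = gpd.GeoDataFrame(gdf, geometry=geometry)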
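If some rows might hold a missing or too-short polyline (an assumption; every sample row above has seven vertices), calling .replace on a NaN cell fails, and shapely may raise an error when asked to build a polygon ring from only one or two points. A minimal sketch of a hypothetical helper, parse_ring, that returns (lng, lat) floats or None so such rows can be skipped:

```python
import re
from typing import List, Optional, Tuple

# Same pattern as in the answer; capture groups are (lat, lng)
PTN = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')


def parse_ring(cell) -> Optional[List[Tuple[float, float]]]:
    """Hypothetical helper: return (lng, lat) floats, or None if the cell is unusable."""
    if not isinstance(cell, str):  # e.g. NaN for a missing polyline
        return None
    coords = [(float(lng), float(lat)) for lat, lng in PTN.findall(cell.replace('"', ''))]
    # a polygon ring needs at least three vertices
    return coords if len(coords) >= 3 else None


print(parse_ring(float('nan')))             # None
print(parse_ring('[{"lat":1.5 lng:2.5}]'))  # None (only one point)
print(parse_ring('[{lat":1.0 lng:2.0} {"lat":1.5 lng:2.5} {"lat":2.0 lng:2.0}]"'))
# [(2.0, 1.0), (2.5, 1.5), (2.0, 2.0)]
```

With Polygon imported as in the answer, the geometry list could then be built as [Polygon(c) if c else None for c in map(parse_ring, gdf['polyline'])], leaving None for rows that could not be parsed.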