
How to convert one column in a CSV to two separate files, a shapefile and an OSM file, in Python?

I have a .csv file that contains taxi trips. One column, called trip_coordinates, stores each trip's coordinates as a string. For example, one trip's coordinates look like this (stored as a string!):

[[40.7457407, -73.9781134], [40.7464087, -73.9797169], [40.7457353, -73.9801966], [40.7463887, -73.9817513], [40.7508351, -73.9785736], [40.7509627, -73.9785244], [40.7521935, -73.9776193], [40.7546355, -73.9757004], [40.7539937, -73.9741902], [40.753367, -73.974648], [40.754351, -73.9769749], [40.7547351, -73.9778672], [40.7554134, -73.9794895], [40.7547828, -73.9799429], [40.7451552, -73.9826672], [40.7457757, -73.9822189], [40.7463887, -73.9817513], [40.7508351, -73.9785736], [40.7509627, -73.9785244], [40.7521935, -73.9776193], [40.7546355, -73.9757004], [40.7552761, -73.9752669], [40.755903, -73.9748081], [40.756526, -73.974356], [40.7565994, -73.9745281], [40.7572359, -73.9760484], [40.7578582, -73.975593], [40.7584878, -73.9751336], [40.7591136, -73.9746825], [40.7597325, -73.974231], [40.7603711, -73.9737664], [40.7609986, -73.9733102]]
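Because the string is a JSON-style nested list, it can be parsed in a single call instead of splitting on commas and stripping brackets by hand. A minimal sketch, assuming every value in the column is formatted like the example above (`parse_route` is a hypothetical helper name, not from the question):

```python
import json


def parse_route(route_text):
    """Parse a string such as '[[40.74, -73.97], ...]' into (lat, lon) float tuples."""
    pairs = json.loads(route_text)
    # Each inner list is [latitude, longitude]; this keeps that order unchanged.
    return [(float(lat), float(lon)) for lat, lon in pairs]


example = "[[40.7457407, -73.9781134], [40.7464087, -73.9797169]]"
print(parse_route(example))
# [(40.7457407, -73.9781134), (40.7464087, -73.9797169)]
```

`ast.literal_eval` from the standard library would also parse this format; `json.loads` is used here because the values contain only numbers and brackets.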

Using those coordinates, I was able to create a LINESTRING and store it back in the original .csv file, in a column called route_linestring, by doing the following:

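import pandas as pd
from shapely.geometry import Point, LineString

def convert_to_lineString(csv_path='batch.csv'):
    batch_trips = pd.read_csv(csv_path)

    for index, row in batch_trips.iterrows():
        if row['selected_distance'] != -100:
            temp = row['trip_route'].split(',')
            pnts_array = []
            # temp alternates x, y, x, y ... so step through it two at a time
            for item in range(0, len(temp), 2):
                # string manipulation to extract points
                x = temp[item].replace('[', '')
                y = temp[item + 1].replace(']', '')
                pnts_array.append(Point(float(x), float(y)))
            line = LineString(pnts_array)
            print('line:', line)
            # store the line as WKT text in the route_linestring column
            batch_trips.loc[index, 'route_linestring'] = line.wkt

    # write the new column back to the original csv
    batch_trips.to_csv(csv_path, index=False)


convert_to_lineString('batch.csv')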

The coordinate array above now looks like this:

LINESTRING (40.7457407 -73.9781134, 40.7464087 -73.9797169, 40.7457353 -73.9801966, 40.7463887 -73.9817513, 40.7508351 -73.9785736, 40.7509627 -73.9785244, 40.7521935 -73.9776193, 40.7546355 -73.9757004, 40.7539937 -73.9741902, 40.753367 -73.974648, 40.754351 -73.9769749, 40.7547351 -73.9778672, 40.7554134 -73.9794895, 40.7547828 -73.9799429, 40.7451552 -73.9826672, 40.7457757 -73.9822189, 40.7463887 -73.9817513, 40.7508351 -73.9785736, 40.7509627 -73.9785244, 40.7521935 -73.9776193, 40.7546355 -73.9757004, 40.7552761 -73.9752669, 40.755903 -73.9748081, 40.756526 -73.974356, 40.7565994 -73.9745281, 40.7572359 -73.9760484, 40.7578582 -73.975593, 40.7584878 -73.9751336, 40.7591136 -73.9746825, 40.7597325 -73.974231, 40.7603711 -73.9737664, 40.7609986 -73.9733102)
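Note that this WKT lists latitude first. WKT, shapefiles and most GIS tools read the first number of each pair as x (longitude) and the second as y (latitude), so this line would be placed with its axes swapped. A stdlib-only sketch of building the WKT in lon/lat order (`to_wkt_linestring` is a hypothetical helper; with shapely the equivalent geometry is `LineString([(lon, lat) for lat, lon in pairs])`):

```python
def to_wkt_linestring(latlon_pairs):
    """Build a WKT LINESTRING from (lat, lon) pairs, writing x=lon before y=lat."""
    if len(latlon_pairs) < 2:
        raise ValueError("a LineString needs at least two points")
    # repr() keeps the full precision of each float in the output text.
    coords = ", ".join(f"{lon!r} {lat!r}" for lat, lon in latlon_pairs)
    return f"LINESTRING ({coords})"


pairs = [(40.7457407, -73.9781134), (40.7464087, -73.9797169)]
print(to_wkt_linestring(pairs))
# LINESTRING (-73.9781134 40.7457407, -73.9797169 40.7464087)
```

Swapping at the point where the geometry is built means every later export (shapefile, .osm) inherits the correct axis order.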

How can I save the route_linestring column to a separate shapefile, and also to a separate .osm file?


  • I would approach this problem by loading the csv into a geopandas.GeoDataFrame and using json.loads together with shapely's LineString to convert the coordinate strings into geometries. Then use .to_file to save the GeoDataFrame as a shapefile. Finally, I would use ogr2osm to create the .osm file from the newly created shapefile.


    feature 1,"[[40.7457407, -73.9781134], [40.7464087, -73.9797169], [40.7457353, -73.9801966], [40.7463887, -73.9817513], [40.7508351, -73.9785736], [40.7509627, -73.9785244], [40.7521935, -73.9776193], [40.7546355, -73.9757004], [40.7539937, -73.9741902], [40.753367, -73.974648], [40.754351, -73.9769749], [40.7547351, -73.9778672], [40.7554134, -73.9794895], [40.7547828, -73.9799429], [40.7451552, -73.9826672], [40.7457757, -73.9822189], [40.7463887, -73.9817513], [40.7508351, -73.9785736], [40.7509627, -73.9785244], [40.7521935, -73.9776193], [40.7546355, -73.9757004], [40.7552761, -73.9752669], [40.755903, -73.9748081], [40.756526, -73.974356], [40.7565994, -73.9745281], [40.7572359, -73.9760484], [40.7578582, -73.975593], [40.7584878, -73.9751336], [40.7591136, -73.9746825], [40.7597325, -73.974231], [40.7603711, -73.9737664], [40.7609986, -73.9733102]]"


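    import json
    import pandas as pd
    import geopandas as gpd
    import ogr2osm
    from shapely.geometry import LineString
    # Load the csv; the coordinates are still plain strings
    df = pd.read_csv('example.csv')
    # Convert each coordinate string to a LineString. The pairs are [lat, lon],
    # but GIS formats expect (x=lon, y=lat), so swap them.
    geometry = df.trip_route.apply(
        lambda s: LineString([(lon, lat) for lat, lon in json.loads(s)]))
    gdf = gpd.GeoDataFrame(df.drop(columns='trip_route'), geometry=geometry, crs='EPSG:4326')
    # Export to shapefile (the raw string column is dropped because
    # shapefile text fields are limited to 254 characters)
    gdf.to_file('example.shp')
    # Use ogr2osm to convert the shapefile to an osm file
    translation_object = ogr2osm.TranslationBase()
    datasource = ogr2osm.OgrDatasource(translation_object)
    datasource.open_datasource('example.shp')
    osmdata = ogr2osm.OsmData(translation_object)
    osmdata.process(datasource)
    datawriter = ogr2osm.OsmDataWriter('example.osm')
    osmdata.output(datawriter)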
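If installing ogr2osm is not an option, a single route can also be written as OSM XML directly: each coordinate becomes a `<node>` and the route becomes a `<way>` that references those nodes in order. This is a swapped-in alternative using only the standard library, not part of ogr2osm. It is a sketch that assumes (lat, lon) pairs as in the question, gives new objects negative ids (the usual convention for data not yet uploaded to OpenStreetMap), and reuses one node when a coordinate repeats. `route_to_osm_xml` and the `trip_id` tag are hypothetical names chosen for illustration:

```python
import xml.etree.ElementTree as ET


def route_to_osm_xml(latlon_pairs, tags=None):
    """Return OSM XML text for one way built from (lat, lon) pairs."""
    root = ET.Element("osm", version="0.6", generator="route_to_osm_xml")
    node_ids = {}  # (lat, lon) -> negative node id
    way_refs = []  # node ids in route order, repeats allowed
    for lat, lon in latlon_pairs:
        key = (lat, lon)
        if key not in node_ids:
            node_ids[key] = -(len(node_ids) + 1)
            # Nodes are emitted before the way, as OSM XML expects.
            ET.SubElement(root, "node", id=str(node_ids[key]),
                          lat=repr(lat), lon=repr(lon))
        way_refs.append(node_ids[key])
    way = ET.SubElement(root, "way", id=str(-(len(node_ids) + 1)))
    for ref in way_refs:
        ET.SubElement(way, "nd", ref=str(ref))
    for k, v in (tags or {}).items():
        ET.SubElement(way, "tag", k=k, v=str(v))
    return ET.tostring(root, encoding="unicode")


pairs = [(40.7457407, -73.9781134), (40.7464087, -73.9797169), (40.7457407, -73.9781134)]
print(route_to_osm_xml(pairs, tags={"trip_id": "1"}))  # hypothetical tag
```

To save it, prepend `<?xml version="1.0" encoding="UTF-8"?>` and write the text to a `.osm` file. For many trips, ogr2osm remains the more complete tool, since it also handles reprojection and attribute translation.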