pyspark geospatial geopandas pyproj pandas-udf

Geopandas convert crs


I have created a GeoPandas dataframe with 50 million records that contain Latitude and Longitude in CRS 3857, and I want to convert them to 4326. Because the dataset is so large, GeoPandas is unable to convert it. How can I execute this in a distributed manner?

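    import geopandas as gpd
    from shapely.geometry import Point

    def convert_crs(sdf):
        # Collect the whole Spark DataFrame onto the driver as a pandas DataFrame
        df = sdf.toPandas()
        # Build one Point per row and label the result with a CRS
        gdf = gpd.GeoDataFrame(
            df.drop(['Longitude', 'Latitude'], axis=1),
            crs="EPSG:4326",
            geometry=[Point(xy) for xy in zip(df.Longitude, df.Latitude)])
        return gdf

    result_gdf = convert_crs(grid_df)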

Solution

  • See: https://github.com/geopandas/geopandas/issues/1400

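    Transforming the coordinate columns directly with pyproj is very fast and memory efficient, because it works on whole arrays and never builds a geometry object per row. Note that this snippet converts from EPSG:4326 to EPSG:3857; swap the two CRS arguments to go from 3857 to 4326: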

    from pyproj import Transformer
    
    trans = Transformer.from_crs(
        "EPSG:4326",
        "EPSG:3857",
        always_xy=True,
    )
    xx, yy = trans.transform(df["Longitude"].values, df["Latitude"].values)
    df["X"] = xx
    df["Y"] = yy
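
Since the question and the snippet above run in opposite directions, a quick way to sanity-check which direction a conversion is going may help. The following is a minimal sketch, assuming only the closed-form spherical Mercator formulas that EPSG:3857 is defined by; the function names and the chunk size are hypothetical and not part of pyproj or GeoPandas. It also shows that every row is converted independently, so the work can be split into chunks of any size.

```python
import math
from itertools import islice

# Sphere radius used by EPSG:3857 (equal to the WGS84 semi-major axis), in metres
EARTH_RADIUS_M = 6378137.0


def mercator_to_lonlat(x, y):
    """Convert one EPSG:3857 point (metres) to EPSG:4326 (degrees)."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


def lonlat_to_mercator(lon, lat):
    """Convert one EPSG:4326 point (degrees) to EPSG:3857 (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


def convert_in_chunks(points, chunk_size=100_000):
    """Yield lists of converted (lon, lat) pairs, one list per chunk of input.

    `points` is any iterable of (x, y) pairs in EPSG:3857. Only one chunk is
    held in memory at a time, and chunks do not depend on each other.
    """
    it = iter(points)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield [mercator_to_lonlat(x, y) for x, y in chunk]
```

EPSG:3857 values are metres (magnitudes up to roughly 2e7), whereas EPSG:4326 values are degrees, so a column named Longitude holding numbers in the millions is almost certainly still in 3857. For real workloads pyproj remains the general tool; this sketch covers only this one CRS pair. Because rows are independent, the same per-chunk logic could in principle be run per partition in Spark instead of calling `toPandas()` on all 50 million rows.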