
How to parse a shape file to use in Foundry's map application?


I am ingesting data in the form of a shapefile, for example ice data from https://usicecenter.gov/Products.

How do I use these files in Foundry, in particular displaying on a map?


Solution

  • Easy! This is outlined in the Foundry documentation on using vector data in transforms.

    Clean geospatial data in Foundry is:

    1. Tabular, so the data can be used in Spark transforms
    2. Formatted as either valid GeoJSON or a geohash, so the geospatial data can be used in the Foundry Ontology
    3. Projected using the EPSG:4326 CRS, so that both sides of spatial joins use the same projection and Foundry maps will render features correctly.
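    To make points 2 and 3 concrete, here is a minimal stdlib sketch of what such column values look like. It is not part of geospatial-tools: polygon_to_geojson and geohash_encode are hypothetical helpers written for illustration. GeoJSON (RFC 7946) stores positions as [longitude, latitude] in WGS84 degrees, which is why coordinates still in metres are rejected; the geohash encoder is the standard bit-interleaving algorithm.

```python
import json

# GeoJSON (RFC 7946) stores positions as [longitude, latitude] in WGS84 degrees,
# so geometries must already be in EPSG:4326 before being serialized this way.


def polygon_to_geojson(ring_lon_lat):
    """Serialize one exterior ring of (lon, lat) pairs as a GeoJSON Polygon string."""
    ring = [[float(lon), float(lat)] for lon, lat in ring_lon_lat]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON linear rings must be closed
    if len(ring) < 4:
        raise ValueError("a closed ring needs at least 4 positions")
    for lon, lat in ring:
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"({lon}, {lat}) is not lon/lat; reproject to EPSG:4326 first")
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a point as a geohash by interleaving lon/lat bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, n_bits, use_lon = [], 0, 0, True  # the first bit refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        n_bits += 1
        use_lon = not use_lon
        if n_bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)
```

    For example, geohash_encode(42.6, -5.6, precision=5) gives "ezs42", and polygon_to_geojson([(-100, 60), (-90, 60), (-90, 65)]) returns a closed four-position Polygon ring.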

    Foundry provides a geospatial-tools PySpark library that simplifies cleaning and conversion. Further details are in the documentation on data parsing and cleaning; for this specific example, we need to convert the shapefile into a dataframe and then project it to EPSG:4326.
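    Before projecting, a quick way to confirm that a shapefile is not already in degrees is to look at the bounding box stored in its fixed 100-byte .shp header (layout per the ESRI Shapefile Technical Description). The sketch below uses only the standard library; read_shp_header and looks_like_lon_lat are hypothetical helpers, and the range test is a heuristic only, since a small projected extent could still fall inside ±180/±90. The .prj file remains the authoritative source of the CRS.

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
               11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
               21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
               31: "MultiPatch"}


def read_shp_header(header: bytes) -> dict:
    """Parse the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])  # big-endian, always 9994
    if file_code != 9994:
        raise ValueError(f"not a shapefile (file code {file_code})")
    version, shape_type = struct.unpack("<ii", header[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])  # dataset bbox
    return {"version": version,
            "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
            "bbox": (xmin, ymin, xmax, ymax)}


def looks_like_lon_lat(bbox) -> bool:
    """Heuristic: a bbox outside +/-180, +/-90 cannot be EPSG:4326 degrees."""
    xmin, ymin, xmax, ymax = bbox
    return -180 <= xmin <= xmax <= 180 and -90 <= ymin <= ymax <= 90


# Usage (placeholder path):
# with open("path/to/ice_shapefile.shp", "rb") as f:
#     info = read_shp_header(f.read(100))
#     print(info["shape_type"], looks_like_lon_lat(info["bbox"]))
```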

    The source CRS can be determined from the shapefile's .prj file, using the method outlined here. For the example of the ice shapefiles, using GDAL's osr module:

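    from osgeo import osr

    shapeprj_path = "path/to/ice_shapefile.prj"  # placeholder: the .prj shipped with the .shp

    # Read the ESRI-flavoured WKT and convert it to a proj4 string
    with open(shapeprj_path, "r") as f:
        prj_txt = f.read()
    srs = osr.SpatialReference()
    srs.ImportFromESRI([prj_txt])  # takes a list of WKT strings
    print(srs.ExportToProj4())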
    

    The output is: +proj=lcc +lat_0=40 +lon_0=-100 +lat_1=49 +lat_2=77 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs

    This proj4 string is then passed to normalize_projection as the input_crs:

    from transforms.api import transform, Input, Output
    from geospatial_tools import geospatial
    from geospatial_tools.parsers import shapefile_to_dataframe
    from geospatial_tools.geom_transformations import normalize_projection

    # CRS of the source shapefile, as reported by its .prj file
    ICE_CRS = "+proj=lcc +lat_0=40 +lon_0=-100 +lat_1=49 +lat_2=77 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"


    @geospatial()
    @transform(
        output=Output("path/to/ice_data_parsed"),
        raw=Input("path/to/ice_data_raw"),
    )
    def compute(raw, output):
        # Parse the raw shapefile dataset into a dataframe with a geometry column
        gdf = shapefile_to_dataframe(raw)
        # Reproject the geometries from the shapefile's CRS to EPSG:4326
        gdf = normalize_projection(
            input_df=gdf,
            geometry_column="geometry",
            input_crs=ICE_CRS,
        )
        # Must be inside compute so it runs as part of the transform
        output.write_dataframe(gdf)
    

    The output dataset can then be synced to the Ontology and used in Foundry's map applications.
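    For intuition about what reprojecting from that input_crs involves, here is a hand-written sketch of the ellipsoidal Lambert Conformal Conic (two standard parallels) formulas, following the EPSG/IOGP guidance on coordinate conversions. It is not geospatial_tools code and makes simplifying assumptions: parse_proj4 and LambertConformalConic are hypothetical names, only the WGS84 ellipsoid is handled, and the proj4 string is assumed to look like the one above. In a real transform, rely on normalize_projection (or a projection library such as pyproj/GDAL).

```python
import math

# CRS reported by the .prj file of the ice shapefiles
ICE_CRS = ("+proj=lcc +lat_0=40 +lon_0=-100 +lat_1=49 +lat_2=77 "
           "+x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs")

# WGS84 ellipsoid (assumption: this sketch supports no other datum)
A = 6378137.0                 # semi-major axis in metres
F = 1 / 298.257223563         # flattening
E = math.sqrt(2 * F - F * F)  # first eccentricity


def parse_proj4(proj4):
    """Turn '+key=value' tokens into a dict; bare flags such as +no_defs map to True."""
    params = {}
    for token in proj4.split():
        key, _, value = token.lstrip("+").partition("=")
        params[key] = value if value else True
    return params


def _m(phi):
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi):
    esin = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - esin) / (1 + esin)) ** (E / 2)


class LambertConformalConic:
    """Ellipsoidal LCC with two standard parallels, WGS84 only."""

    def __init__(self, proj4):
        p = parse_proj4(proj4)
        if p.get("proj") != "lcc" or p.get("datum", "WGS84") != "WGS84":
            raise ValueError("this sketch only handles +proj=lcc on WGS84")
        self.lon0 = math.radians(float(p["lon_0"]))
        lat0, lat1, lat2 = (math.radians(float(p[k])) for k in ("lat_0", "lat_1", "lat_2"))
        self.x0 = float(p.get("x_0", 0))
        self.y0 = float(p.get("y_0", 0))
        t1, t2 = _t(lat1), _t(lat2)
        self.n = (math.log(_m(lat1)) - math.log(_m(lat2))) / (math.log(t1) - math.log(t2))
        self.af = A * _m(lat1) / (self.n * t1 ** self.n)  # a * F in EPSG notation
        self.r0 = self.af * _t(lat0) ** self.n  # cone radius at the latitude of origin

    def forward(self, lon, lat):
        """(lon, lat) in degrees -> (x, y) in metres."""
        r = self.af * _t(math.radians(lat)) ** self.n
        theta = self.n * (math.radians(lon) - self.lon0)
        return self.x0 + r * math.sin(theta), self.y0 + self.r0 - r * math.cos(theta)

    def inverse(self, x, y):
        """(x, y) in metres -> (lon, lat) in degrees, the EPSG:4326 values maps expect."""
        s = 1.0 if self.n > 0 else -1.0
        dx, dy = x - self.x0, self.r0 - (y - self.y0)
        r = s * math.hypot(dx, dy)
        theta = math.atan2(s * dx, s * dy)
        t = (r / self.af) ** (1 / self.n)
        phi = math.pi / 2 - 2 * math.atan(t)  # spherical first guess
        for _ in range(10):  # fixed-point iteration; converges fast for small E
            esin = E * math.sin(phi)
            phi = math.pi / 2 - 2 * math.atan(t * ((1 - esin) / (1 + esin)) ** (E / 2))
        return math.degrees(theta / self.n + self.lon0), math.degrees(phi)
```

    A quick sanity check: LambertConformalConic(ICE_CRS).inverse(0, 0) returns approximately (-100, 40), the +lon_0/+lat_0 origin, which confirms the parameters were read correctly.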