
Reading shapefile data with Spark


I have just started using the ESRI API to process GIS data (shapefiles). I am also using this code (https://github.com/mraad/spark-shp) to read the data with Spark (running Scala code).

My question is: how can we extract polygons from a shapefile? I can't find this anywhere in the documentation. When I read the shapefile, I get only an RDD with points, not polygons.


Solution

  • It's probably worth raising an issue on the GitHub project in question with your query. The maintainers are best placed to answer your question, and it can help them improve their documentation.

    From what I can see, you should be getting a DataFrame rather than an RDD, so perhaps you're using a lower-level method than is expected. The extension method on DataFrameReader suggests to me that you should be using the shp method to load your files and get back a DataFrame. Within the same file there is also another shp extension method, which appears in that project's tests, showing the loading of a file like so:

    val results = sparkSession
      .sqlContext
      .shp(path)

    You'll need to import the package/implicits to access these methods.
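
To illustrate why that import matters, here is a minimal, self-contained sketch of how a Scala extension method works. It does not use Spark or spark-shp at all: FakeReader, FakeShpImplicits and ShpReader are hypothetical stand-ins made up for this example, and the real package name and method signatures should be checked in the project's source.

```scala
// Hypothetical stand-in for a reader class such as Spark's DataFrameReader.
// It is NOT part of Spark or spark-shp.
final class FakeReader {
  def load(format: String, path: String): String = s"$format:$path"
}

// Hypothetical object playing the role of the library's package/implicits.
object FakeShpImplicits {
  // The implicit class adds a `shp` method to FakeReader,
  // but only where this object's members have been imported.
  implicit class ShpReader(val reader: FakeReader) extends AnyVal {
    def shp(path: String): String = reader.load("shp", path)
  }
}

object Demo {
  // Without this import, `new FakeReader().shp(...)` does not compile.
  import FakeShpImplicits._

  def run(): String = new FakeReader().shp("/data/example.shp")

  def main(args: Array[String]): Unit = println(run())
}
```

If the shp method is reported as "not a member" of the reader or SQL context, a missing import of this kind is the first thing to check.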