python google-cloud-storage google-cloud-dataflow apache-beam shapefile

Read Shapefile from Google Cloud Storage using Dataflow + Beam + Python


How can one read a Shapefile from Google Cloud Storage using Dataflow + Beam + Python?
I've found only beam.io.ReadFromText, but the Python shapefile reader requires either file-like objects, as in shp.Reader(shp=shp_file, dbf=dbf_file), or the path to a shapefile.
I'm using Python 2.7.


Solution

  • Open each component file through GcsIO, which returns file-like objects that shp.Reader accepts:

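    from apache_beam.io.gcp import gcsio
    import shapefile as shp  # pyshp, which provides shp.Reader
    from osgeo import osr    # GDAL Python bindings

    # filenamePRJ, filenameSHP and filenameDBF are gs:// paths defined elsewhere.
    # read_buffer_size is given in bytes (1677721600 = 1600 MiB).
    gcs = gcsio.GcsIO()

    prj_file = gcs.open(
        filenamePRJ,
        mode='r',
        read_buffer_size=1677721600,
        mime_type='application/octet-stream'
    )

    shp_file = gcs.open(
        filenameSHP,
        mode='r',
        read_buffer_size=1677721600,
        mime_type='application/octet-stream'
    )

    dbf_file = gcs.open(
        filenameDBF,
        mode='r',
        read_buffer_size=1677721600,
        mime_type='application/octet-stream'
    )

    # Pass the file-like objects straight to the shapefile reader.
    sf = shp.Reader(shp=shp_file, dbf=dbf_file)

    # Source spatial reference, read as WKT from the shapefile's .prj file.
    euref = osr.SpatialReference()
    euref.ImportFromWkt(str(prj_file.read()))

    # Target spatial reference: WGS 84 (EPSG:4326).
    wgs84 = osr.SpatialReference()
    wgs84.ImportFromEPSG(4326)

    # Transformation from the source reference system to WGS 84.
    transformation = osr.CoordinateTransformation(euref, wgs84)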
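
The snippet above uses filenamePRJ, filenameSHP and filenameDBF without showing where they come from. One possible way to build them, as a minimal sketch: derive all three from a single .shp path, assuming the component files sit side by side under the same base name with lowercase extensions. The helper name shapefile_component_paths and the bucket path are hypothetical and not part of the original answer.

```python
import posixpath


def shapefile_component_paths(shp_path):
    """Return the .shp, .dbf and .prj paths that share shp_path's base name.

    Assumption: the three files are stored next to each other with lowercase
    extensions. GCS object names are case-sensitive, so other spellings
    (for example .SHP) are rejected rather than guessed at.
    """
    base, ext = posixpath.splitext(shp_path)
    if ext != '.shp':
        raise ValueError('expected a path ending in .shp, got %r' % shp_path)
    return {
        'shp': shp_path,
        'dbf': base + '.dbf',
        'prj': base + '.prj',
    }


# Hypothetical bucket and object name, for illustration only.
paths = shapefile_component_paths('gs://my-bucket/data/regions.shp')
filenameSHP = paths['shp']
filenameDBF = paths['dbf']
filenamePRJ = paths['prj']
```

The resulting strings can then be passed to the GcsIO open calls shown earlier. The helper only manipulates strings, so it does not check that the objects actually exist in the bucket.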