Search code examples
pythonstatisticsrastergdal

How to do zonal statistics on raster file based on polyon in Python?


I have a raster file and a polygon shapefile. I like to get the mean of the raster file of the area the polygon covers. I like to do this in a python standalone script. So QGIS and Starspan do not work. Also Arcpy is not available. I like to use GDAL. What Python package? ways to do this can you recommend?


Solution

  • There is a gdal.RasterizeLayer function which let you rasterize a vector layer. It has some downsides, you need to have an output dataset to which you rasterize. In addition, if you have overlapping geometries, you want to first isolate each geometry on a seperate vector layer, meaning you have to loop over all geometries.

    With gdal you can create in-memory files by using the MEM driver, this makes it a bit easier, but there is still a lot of dataset-creation overhead.

    For each geometry, the steps would be more or less like:

    driver = gdal.GetDriverByName('MEM')
    outds = driver.Create('', pixelxsize, pixelysize, 1, GDT_Byte)
    outds.SetProjection(target_proj)
    outds.SetGeoTransform(target_gt)
    
    gdal.RasterizeLayer(outds, [1], vectorlayer, burn_values=[1])
    

    Now the outds contains a mask of the geometry, using it with for example np.masked_where you can isolate the pixels within the geometry.

    Its not as convenient as it good be, but once you have a masked array of the polygon, its very easy to get statistics by using numpy/scipy.

    edit: See this script for some more detailed examples: http://svn.osgeo.org/gdal/trunk/autotest/alg/rasterize.py