Search code examples
statisticsclassificationpolygonrasterqgis

Calculate raster landscape proportion (percentage) within multiple overlaping polygon (shapefiles)?


I think the easiest way is to extract the raster values within each polygon and calculate the proportion. Is it possible to do so without reading the entire grid as an array?


I have 23 yearly global classified raster (resolution = 0.00277778 degree) from 1992 - 2015 and a polygon vector with 354 shapes (which overlap at some parts). Because of the overlap (Self-intersection) it is not easy to work with them as raster. Both projected in "+proj=longlat +datum=WGS84 +no_defs".

The raster consists of classes from 10 - 220 The polygon has ABC_ID from 1 - 449

For one Year it looks like: classification and shape example


I need to create a table like:

example table


I already tried to achieve this with:

  1. Zonal Statistics
  2. Pk tools (extract vector sample from raster)
  3. LecoS (Overlay raster metrics)
  4. Cross-Classification and Tabulation" of SAGA GIS (problems with extent)
  5. FRAGSTATS (i was not able to load in the shp file)
  6. Raster --> Extraction --> Clipper dose not work (Ring Self-intersection)

I have heard that Tabulate Area from ArcMap can do this but it would be nice if there is an open source solution to this.


Solution

  • I have managed to do it with Python "rasterio" and "geopandas"

    It now creates a table like: example result

    since i did not found something similar like the extract comand in R "raster" it took more than only 2 lines but instead of calculating half the night it now takes only 2 min for one year. The results are the same. It is based on the ideas of "gene" from "https://gis.stackexchange.com/questions/260304/extract-raster-values-within-shapefile-with-pygeoprocessing-or-gdal/260380"

    import rasterio
    from rasterio.mask import mask
    import geopandas as gpd
    import pandas as pd
    
    print('1. Read shapefile')
    
    shape_fn = "D:/path/path/multypoly.shp"
    raster_fn = "D:/path/path/class_1992.tif"
    
    # set max and min class
    raster_min = 10
    raster_max = 230
    
    output_dir = 'C:/Temp/'
    
    write_zero_frequencies = True
    show_plot = False
    
    shapefile = gpd.read_file(shape_fn)
    
    # extract the geometries in GeoJSON format
    geoms = shapefile.geometry.values # list of shapely geometries
    records = shapefile.values
    
    with rasterio.open(raster_fn) as src:
    
         print('nodata value:', src.nodata)
    
         idx_area = 0
         # for upslope_area in geoms:
         for index, row in shapefile.iterrows():
    
              upslope_area = row['geometry']
              lake_id = row['ABC_ID']
              print('\n', idx_area, lake_id, '\n')
    
              # transform to GeJSON format
              from shapely.geometry import mapping
              mapped_geom = [mapping(upslope_area)]
    
              print('2. Cropping raster values')
              # extract the raster values values within the polygon
              out_image, out_transform = mask(src, mapped_geom, crop=True)
    
              # no data values of the original raster
              no_data=src.nodata
    
              # extract the values of the masked array
              data = out_image.data[0]
    
              # extract the row, columns of the valid values
              import numpy as np
              # row, col = np.where(data != no_data)
              clas = np.extract(data != no_data, data)
    
              # from rasterio import Affine # or from affine import Affine
              # T1 = out_transform * Affine.translation(0.5, 0.5) # reference the pixel centre
              # rc2xy = lambda r, c: (c, r) * T1
    
              # d = gpd.GeoDataFrame({'col':col,'row':row,'clas':clas})
    
              range_min = raster_min    # min(clas)
              range_max = raster_max    # max(clas)
    
              classes = range(range_min, range_max + 2)
    
              frequencies, class_limits = np.histogram(clas,
                                                       bins=classes,
                                                       range=[range_min, range_max])
    
              if idx_area == 0:
                   # data_frame = gpd.GeoDataFrame({'freq_' + str(lake_id):frequencies})
                   data_frame = pd.DataFrame({'freq_' + str(lake_id): frequencies})
                   data_frame.index = class_limits[:-1]
              else:
                   data_frame['freq_' + str(lake_id)] = frequencies
          idx_area += 1
    
     print(data_frame)
     data_frame.to_csv(output_dir + 'upslope_area_1992.csv', sep='\t')