statistics classification polygon raster qgis

Calculate raster landscape proportion (percentage) within multiple overlaping polygon (shapefiles)?

I think the easiest way is to extract the raster values within each polygon and calculate the proportion. Is it possible to do so without reading the entire grid as an array?

I have 23 yearly global classified raster (resolution = 0.00277778 degree) from 1992 - 2015 and a polygon vector with 354 shapes (which overlap at some parts). Because of the overlap (Self-intersection) it is not easy to work with them as raster. Both projected in "+proj=longlat +datum=WGS84 +no_defs".

The raster consists of classes from 10 - 220 The polygon has ABC_ID from 1 - 449

For one Year it looks like: classification and shape example

I need to create a table like:

example table

I already tried to achieve this with:

Zonal Statistics
Pk tools (extract vector sample from raster)
LecoS (Overlay raster metrics)
Cross-Classification and Tabulation" of SAGA GIS (problems with extent)
FRAGSTATS (i was not able to load in the shp file)
Raster --> Extraction --> Clipper dose not work (Ring Self-intersection)

I have heard that Tabulate Area from ArcMap can do this but it would be nice if there is an open source solution to this.

Solution

I have managed to do it with Python "rasterio" and "geopandas"

It now creates a table like: example result

since i did not found something similar like the extract comand in R "raster" it took more than only 2 lines but instead of calculating half the night it now takes only 2 min for one year. The results are the same. It is based on the ideas of "gene" from "https://gis.stackexchange.com/questions/260304/extract-raster-values-within-shapefile-with-pygeoprocessing-or-gdal/260380"

import rasterio
from rasterio.mask import mask
import geopandas as gpd
import pandas as pd

print('1. Read shapefile')

shape_fn = "D:/path/path/multypoly.shp"
raster_fn = "D:/path/path/class_1992.tif"

# set max and min class
raster_min = 10
raster_max = 230

output_dir = 'C:/Temp/'

write_zero_frequencies = True
show_plot = False

shapefile = gpd.read_file(shape_fn)

# extract the geometries in GeoJSON format
geoms = shapefile.geometry.values # list of shapely geometries
records = shapefile.values

with rasterio.open(raster_fn) as src:

     print('nodata value:', src.nodata)

     idx_area = 0
     # for upslope_area in geoms:
     for index, row in shapefile.iterrows():

          upslope_area = row['geometry']
          lake_id = row['ABC_ID']
          print('\n', idx_area, lake_id, '\n')

          # transform to GeJSON format
          from shapely.geometry import mapping
          mapped_geom = [mapping(upslope_area)]

          print('2. Cropping raster values')
          # extract the raster values values within the polygon
          out_image, out_transform = mask(src, mapped_geom, crop=True)

          # no data values of the original raster
          no_data=src.nodata

          # extract the values of the masked array
          data = out_image.data[0]

          # extract the row, columns of the valid values
          import numpy as np
          # row, col = np.where(data != no_data)
          clas = np.extract(data != no_data, data)

          # from rasterio import Affine # or from affine import Affine
          # T1 = out_transform * Affine.translation(0.5, 0.5) # reference the pixel centre
          # rc2xy = lambda r, c: (c, r) * T1

          # d = gpd.GeoDataFrame({'col':col,'row':row,'clas':clas})

          range_min = raster_min    # min(clas)
          range_max = raster_max    # max(clas)

          classes = range(range_min, range_max + 2)

          frequencies, class_limits = np.histogram(clas,
                                                   bins=classes,
                                                   range=[range_min, range_max])

          if idx_area == 0:
               # data_frame = gpd.GeoDataFrame({'freq_' + str(lake_id):frequencies})
               data_frame = pd.DataFrame({'freq_' + str(lake_id): frequencies})
               data_frame.index = class_limits[:-1]
          else:
               data_frame['freq_' + str(lake_id)] = frequencies
      idx_area += 1

 print(data_frame)
 data_frame.to_csv(output_dir + 'upslope_area_1992.csv', sep='\t')