I think the easiest way is to extract the raster values within each polygon and calculate the proportion. Is it possible to do so without reading the entire grid as an array?
I have 23 yearly global classified raster (resolution = 0.00277778 degree) from 1992 - 2015 and a polygon vector with 354 shapes (which overlap at some parts). Because of the overlap (Self-intersection) it is not easy to work with them as raster. Both projected in "+proj=longlat +datum=WGS84 +no_defs".
The raster consists of classes from 10 - 220 The polygon has ABC_ID from 1 - 449
For one Year it looks like:
I need to create a table like:
I already tried to achieve this with:
I have heard that Tabulate Area from ArcMap can do this but it would be nice if there is an open source solution to this.
I have managed to do it with Python "rasterio" and "geopandas"
It now creates a table like: example result
since i did not found something similar like the extract comand in R "raster" it took more than only 2 lines but instead of calculating half the night it now takes only 2 min for one year. The results are the same. It is based on the ideas of "gene" from "https://gis.stackexchange.com/questions/260304/extract-raster-values-within-shapefile-with-pygeoprocessing-or-gdal/260380"
import rasterio
from rasterio.mask import mask
import geopandas as gpd
import pandas as pd
print('1. Read shapefile')
shape_fn = "D:/path/path/multypoly.shp"
raster_fn = "D:/path/path/class_1992.tif"
# set max and min class
raster_min = 10
raster_max = 230
output_dir = 'C:/Temp/'
write_zero_frequencies = True
show_plot = False
shapefile = gpd.read_file(shape_fn)
# extract the geometries in GeoJSON format
geoms = shapefile.geometry.values # list of shapely geometries
records = shapefile.values
with rasterio.open(raster_fn) as src:
print('nodata value:', src.nodata)
idx_area = 0
# for upslope_area in geoms:
for index, row in shapefile.iterrows():
upslope_area = row['geometry']
lake_id = row['ABC_ID']
print('\n', idx_area, lake_id, '\n')
# transform to GeJSON format
from shapely.geometry import mapping
mapped_geom = [mapping(upslope_area)]
print('2. Cropping raster values')
# extract the raster values values within the polygon
out_image, out_transform = mask(src, mapped_geom, crop=True)
# no data values of the original raster
no_data=src.nodata
# extract the values of the masked array
data = out_image.data[0]
# extract the row, columns of the valid values
import numpy as np
# row, col = np.where(data != no_data)
clas = np.extract(data != no_data, data)
# from rasterio import Affine # or from affine import Affine
# T1 = out_transform * Affine.translation(0.5, 0.5) # reference the pixel centre
# rc2xy = lambda r, c: (c, r) * T1
# d = gpd.GeoDataFrame({'col':col,'row':row,'clas':clas})
range_min = raster_min # min(clas)
range_max = raster_max # max(clas)
classes = range(range_min, range_max + 2)
frequencies, class_limits = np.histogram(clas,
bins=classes,
range=[range_min, range_max])
if idx_area == 0:
# data_frame = gpd.GeoDataFrame({'freq_' + str(lake_id):frequencies})
data_frame = pd.DataFrame({'freq_' + str(lake_id): frequencies})
data_frame.index = class_limits[:-1]
else:
data_frame['freq_' + str(lake_id)] = frequencies
idx_area += 1
print(data_frame)
data_frame.to_csv(output_dir + 'upslope_area_1992.csv', sep='\t')