Apologies for cross-posting something similar on the GIS Stack Exchange.
I am looking for a more efficient way to create a frequency table based on a large raster in R.
Currently, I have a few dozen rasters, ~150 million cells each, and I need to create a frequency table for each one. These rasters are derived by masking a base raster with a few hundred small sampling locations*. As a result, the rasters I am building the tables from are ~99% NA values.
My current working approach is this:
library(raster)
sampling_site_raster <- raster("FILE")  # sampling sites: 1 = site, NA elsewhere
base_raster <- raster("FILE")           # values to tabulate
# keep base raster values only where a sampling site exists
sample_raster <- mask(base_raster, sampling_site_raster)
# frequency table of the remaining non-NA cell values
DF <- as.data.frame(freq(sample_raster, useNA='no', progress='text'))
### run time for the freq() process ###
   user  system elapsed
 162.60    4.85  168.40
This uses the freq() function from the raster package. The useNA='no' argument drops the NA values from the table.
My questions are:
1) Is there a more efficient way to create a frequency table from a large raster that is ~99% NA values?
or
2) Is there a more efficient way to derive the values from the base raster than by using mask()? (The Mask geoprocessing tool in ArcGIS is very fast, but it still leaves the NA values and adds an extra step.)
*Additional info: the sample areas represented by sampling_site_raster are irregular shapes of various sizes spread randomly across the study area. In sampling_site_raster the sampling sites are encoded as 1 and the non-sampling areas as NA.
Thank you!
If you mask a raster with another raster, you will always get another huge raster, so I don't think that approach will make things faster.
What I would do instead is extract the cell values with a polygon layer of the sampling sites, using extract():

res <- extract(raster, polygons)  # list of cell values, one element per polygon
Then you will have all the cell values for each polygon and can build the frequency tables from them.
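A minimal sketch of that workflow, assuming the sampling sites are available as a polygon layer (the file name sampling_sites.shp is hypothetical) and using table() on the extracted vectors, since freq() expects a Raster* object rather than a plain numeric vector:

library(raster)

base_raster <- raster("FILE")
# hypothetical polygon layer of the sampling sites; any SpatialPolygons* object works
sampling_sites <- shapefile("sampling_sites.shp")

# list with one vector of cell values per polygon; only cells under the
# polygons are read, so the ~99% NA cells are never scanned
vals <- extract(base_raster, sampling_sites)

# overall frequency table across all sampling sites
overall_freq <- as.data.frame(table(unlist(vals), useNA = "no"))

# or one frequency table per sampling site
per_site_freq <- lapply(vals, table)

Whether this is faster will depend on how many polygons there are and how large they are, but it avoids creating and scanning another ~150-million-cell raster per sample set.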