Search code examples
rperformancerasterr-sf

Avoid a for loop in raster::extract(rst,shp)


I am working on R to extract the mean and maximum value of a raster within a 3 meter buffer of some buildings.

For this, I have created a for loop that iterates iterates through each building to extract this two values. My current code looks as follows:

for (b in c(1:nrow(buildings_shp))){
    
    building <- buildings_shp[b,]
    
    buffered <- st_buffer(building, 3)
    
    raster_cropped <- crop(raster, extent(buffered))

    mean <- extract(depths_cropped, buffered, fun = mean, na.rm = TRUE)
    max <- extract(depths_cropped, buffered, fun = max, na.rm = TRUE)
    
    buildings_shp[b,"mean"] <- mean
    buildings_shp[b,"max"] <- max
    
  }

This loop, however, takes a considerable amount of time (~17 minutes for 1500 buildings), and the step that seems to take most time is the two extract lines. I would like to know if there are ways to speed up this process by:

a) avoiding the use of a loop - the reason for this loop is that I fear that if I use st_buffer on the entire dataset, then when buildings are closer than 3 meters I would generate overlapping geometries, which may cause an error.

b) parallelizing the for loop (i have tried the raster clustering feature, but it did not speed up the process, probably because it did not parallelize the loop itself but the extract function)

c) using other function than raster::extract. I have seen some posts recommending the velox package, but it seems like this package has been removed from CRAN.

Some dummy data (copied from the referenced question above)

library(raster)
library(sf)

raster <- raster(ncol=1000, nrow=1000, xmn=2001476, xmx=11519096, ymn=9087279, ymx=17080719)
raster []=rtruncnorm(n=ncell(raster ),a=0, b=10, mean=5, sd=2)
crs(raster ) <- "+proj=utm +zone=51 ellps=WGS84"
    
x1 <- runif(100,2001476,11519096)
y1 <- runif(100, 9087279,17080719)

buildings_shp <- st_buffer(st_sfc(st_point(c(x1[1],y1[1]), dim="XY"),crs=32651),200000)

Solution

  • You do not need a loop. Example data from ?raster::extract

    library(raster)
    r <- raster(ncol=36, nrow=18, vals=1:(18*36))
    cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
    cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
    buildings <- spPolygons(cds1, cds2)
    

    Get the buffers and extract. As you want to compute two statistics, it is easier to not use a summarizing function in this case.

    b <- buffer(buildings, width=3, dissolve=FALSE)
    e <- extract(r, b)
    

    And now compute the statistics

    sapply(e, mean, na.rm=TRUE)
    #[1] 379.4167 330.0741
    sapply(e, max, na.rm=TRUE)
    #[1] 507 498
    

    This should be faster with terra

    library(terra)
    v <- vect(b)
    x <- rast(r)
    ee <- extract(x, v)