Search code examples
rrasterr-raster

raster calc function to find max index (i.e. most recent layer) meeting criterion


For each cell of a rasterStack, I want to find the most recent layer where the value exceeds a fixed threshold. Layers are stacked in chronological order, so this corresponds to the maximum index. Ultimately I want to know 1) the year of that layer (taken from the layer name), and 2) the value in that year.

I wrote a function to do this and was getting the wrong result. I modified the function in a way that I did not think would change it; I now get the correct result. My question is why these functions produce something different.

Set up the example:

library(raster)
library(rasterVis)

###   Example raster stack: 
set.seed(123123)
r1 <- raster(nrows = 10, ncols = 10)
r2 <- r3 <- r4 <- r1
r1[] <- rbinom(ncell(r1), 1, prob = .1)
r2[] <- rbinom(ncell(r1), 1, prob = .1)
r3[] <- rbinom(ncell(r1), 1, prob = .1)
r4[] <- rbinom(ncell(r1), 1, prob = .1)
rs <- stack(r1, r2, r3, r4)
names(rs) <- paste0("yr", 1:4)

These are the functions I would have thought would be the same...is there a reason not to have a vector as an intermediate step?

###  Function to find index of last event:
# v1; this is wrong
findLast <- function(x) {
  fire.ind <- ifelse(any(x > 0), which(x > 0), NA)
  max.ind <- max(fire.ind)
}

# v2; this one gives correct answer
findLast2 <- function(x) {
  max.ind <- ifelse(any(x > 0), max(which(x > 0)), NA)
}

testFind <- calc(rs, findLast)
freq(testFind)

testFind2 <- calc(rs, findLast2)
all.equal(testFind, testFind2)

Show the example inputs and the different results:

# plot:
s2 = stack(rs, testFind, testFind2)
levelplot(s2, pretty = TRUE)

inputs and difference in outputs between functions

Code to get the final layers I want:

###  Most recent year:
nameFromInd <- function(x) {
  yr <- as.integer(gsub(".*(\\d.*).*", "\\1", names(rs)[x]))
}
testYr <- calc(testFind2, nameFromInd)

###  Value in most recent year:
testYrValue <- stackSelect(rs, testFind2)

Any insight into what I'm not seeing here? I haven't played around with alternatives that enhance speed, but any suggestions are welcome as I'll be doing this on a very large dataset.

sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Solution

  • ifelse can be a bit surprising in that it returns a value with the same "shape" as the first argument, which has length one. So it returns the first value only of the result of which(x > 0). Always check the functions you use before using them with calc.

    x <- c(-1:2, 1:-1)
    ifelse(any(x > 0), which(x > 0), NA)
    #[1] 3
    ifelse(any(x > 0), max(which(x > 0)), NA)
    #[1] 5
    

    ifelse is a pretty involved function, and I think you can avoid it by doing:

    r <- calc(rs, function(x) max(which(x > 0)))
    y <- stackSelect(rs, r)
    

    (and ignore the warnings). Or suppress them:

    options(warn=-1)
    r <- calc(rs, function(x) max(which(x > 0)))
    options(warn=0)
    testYrValue <- stackSelect(rs, r)
    

    You can also combine what you do with calc and stackSelect like this

    f1 <- function(x) {
        if (any(x>0)) {
            i <- max(which(x > 0))
            cbind(i, x[i])
        } else {
            cbind(NA, NA)
        }
    }
    
    rr1 <- calc(rs, f1)
    

    Or the shortcut variant:

    f2 <- function(x) {
        i <- max(which(x > 0))
        cbind(i, x[i])
    }
    
    rr2 <- calc(rs, f2)
    

    Or

    f3 <- function(x) {
        i <- max(which(x > 0))
        z <- cbind(i, x[i])
        z[!is.finite(z)] <- NA
        z
    }
    
    rr3 <- calc(rs, f3)