raster calc function to find max index (i.e. most recent layer) meeting criterion

For each cell of a rasterStack, I want to find the most recent layer where the value exceeds a fixed threshold. Layers are stacked in chronological order, so this corresponds to the maximum index. Ultimately I want to know 1) the year of that layer (taken from the layer name), and 2) the value in that year.

I wrote a function to do this and was getting the wrong result. I modified the function in a way that I did not think would change it; I now get the correct result. My question is why these functions produce something different.

Set up the example:

library(raster)
library(rasterVis)

###   Example raster stack: 
set.seed(123123)
r1 <- raster(nrows = 10, ncols = 10)
r2 <- r3 <- r4 <- r1
r1[] <- rbinom(ncell(r1), 1, prob = .1)
r2[] <- rbinom(ncell(r1), 1, prob = .1)
r3[] <- rbinom(ncell(r1), 1, prob = .1)
r4[] <- rbinom(ncell(r1), 1, prob = .1)
rs <- stack(r1, r2, r3, r4)
names(rs) <- paste0("yr", 1:4)

These are the functions I would have thought would be the same...is there a reason not to have a vector as an intermediate step?

###  Function to find index of last event:
# v1; this is wrong
findLast <- function(x) {
  fire.ind <- ifelse(any(x > 0), which(x > 0), NA)
  max.ind <- max(fire.ind)
}

# v2; this one gives correct answer
findLast2 <- function(x) {
  max.ind <- ifelse(any(x > 0), max(which(x > 0)), NA)
}

testFind <- calc(rs, findLast)
freq(testFind)

testFind2 <- calc(rs, findLast2)
all.equal(testFind, testFind2)

Show the example inputs and the different results:

# plot:
s2 = stack(rs, testFind, testFind2)
levelplot(s2, pretty = TRUE)

inputs and difference in outputs between functions

Code to get the final layers I want:

###  Most recent year:
nameFromInd <- function(x) {
  yr <- as.integer(gsub(".*(\\d.*).*", "\\1", names(rs)[x]))
}
testYr <- calc(testFind2, nameFromInd)

###  Value in most recent year:
testYrValue <- stackSelect(rs, testFind2)

Any insight into what I'm not seeing here? I haven't played around with alternatives that enhance speed, but any suggestions are welcome as I'll be doing this on a very large dataset.

sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Solution

ifelse can be a bit surprising in that it returns a value with the same "shape" as the first argument, which has length one. So it returns the first value only of the result of which(x > 0). Always check the functions you use before using them with calc.

x <- c(-1:2, 1:-1)
ifelse(any(x > 0), which(x > 0), NA)
#[1] 3
ifelse(any(x > 0), max(which(x > 0)), NA)
#[1] 5

ifelse is a pretty involved function, and I think you can avoid it by doing:

r <- calc(rs, function(x) max(which(x > 0)))
y <- stackSelect(rs, r)

(and ignore the warnings). Or suppress them:

options(warn=-1)
r <- calc(rs, function(x) max(which(x > 0)))
options(warn=0)
testYrValue <- stackSelect(rs, r)

You can also combine what you do with calc and stackSelect like this

f1 <- function(x) {
    if (any(x>0)) {
        i <- max(which(x > 0))
        cbind(i, x[i])
    } else {
        cbind(NA, NA)
    }
}

rr1 <- calc(rs, f1)

Or the shortcut variant:

f2 <- function(x) {
    i <- max(which(x > 0))
    cbind(i, x[i])
}

rr2 <- calc(rs, f2)

f3 <- function(x) {
    i <- max(which(x > 0))
    z <- cbind(i, x[i])
    z[!is.finite(z)] <- NA
    z
}

rr3 <- calc(rs, f3)