Optimizing a raster::calc function - function 1 vs 2 - R

I am working on calculating a new raster (output ras) based on 2 rasters (input ras) and a 'stratum' raster. The Stratum raster values (1 to 4) refer to the rows in the bias and weight dataframes. Strata value '4' was used to fill any 'NA' in the Strata raster, otherwise the function would crash. The following input is required.

# load library
library(raster)

# reproducing the bias and weight data.frames
bias <- data.frame(
 ras_1 = c(56,-7,-30,0),
 ras_2 = c(29,18,-52,0),
 ras_3 = c(44,4,-15,0)
)
rownames(bias) <- c("Strat 1","Strat 2","Strat 3","Strat 4") 

weight <- data.frame(
 ras_1 = c(0.56,0.66,0.23,0.33),
 ras_2 = c(0.03,0.18,0.5,0.33),
 ras_3 = c(0.41,0.16,0.22,0.34)
)
rownames(weight) <- c("Strat 1","Strat 2","Strat 3","Strat 4")

The following function (fusion) allows me to add a 'bias' value to the input rasters. After the bias has been added, the two corrected input raster cell values will be multiplied by a weight value, depending in which stratum they belong.

The result of the input 2 raster values will be summed and returned using 'calc'.

## Create raster data for input

# create 2 rasters
r1 <- raster(ncol=10,nrow=10)
r2 <- raster(ncol=10,nrow=10)
r1[] <- sample(seq(from = 1, to = 500, by = 1), size = 100, replace = TRUE)
r2[] <- sample(seq(from = 1, to = 500, by = 1), size = 100, replace = TRUE)
r2[1:2] <- NA # include NA in input maps for example purpose

# Create strata raster (4 strata's)
r3 <- raster(ncol=10,nrow=10)
r3[] <- sample(seq(from = 1, to = 4, by = 1), size = 100, replace = TRUE)

Strata.n <- 4 # number of strata values in this example

fusion <- function(x) {
    result <- matrix(NA, dim(x)[1], 1)
    for (n in 1:Strata.n) {
    ok <- !is.na(x[,3]) &  x[,3] == n 
    a <- x[ok,1] + bias[n,1] # add bias to first input raster value             
    b <- x[ok,2] + bias[n,2] # add bias to second input raster value
    result[ok] <- a * weight[n,1] + b * weight[n,2] # Multiply values by weight
  }
  return(result)
}

s <- stack(r1,r2,r3)
Fused.map <- calc(s, fun = fusion, progress = 'text')

The problem with the above function is that:

It is only suited for 2 rasters

If one raster has NA, then the result will be NA for that cell

is.na(Fused.map@data@values) # check for NA in the fused map

What I would like to have is:

A function that takes any number of input rasters
It can work with NA values (ignores NA values in the rasters)
Re-adjusts the 'weight' if a raster has a NA value, so that the remaining weight values add up to 1

EDIT

The following function does what I need, but is significantly slower than the function above on large rasters. Fusion does it in 10 seconds, fusion2 function below needs 8 hours on large rasters...

fusion2 <- function(x) {
  m <- matrix(x, nrow= 1, ncol=3) # Create matrix per stack of cells
  n <- m[,3] # get the stratum
  g <- m[1:(Strata.n-1)] + as.matrix(bias[n,]) # add bias to raster values
  g[g < 0] <- 0 # set values below 0 to 0
  w <- weight[n,1:(Strata.n-1)] # get correct strata weight values
  w[is.na(g)]<- NA # set weight to NA if (g) raster values are NA
  p <- sum(w, na.rm = T) # calculate sum of weight values
  pp <- w/p # divide weight values by sum to get the proportion to == 1
  pp <- as.numeric(pp)
  result <- as.integer(round(sum(pp*g, na.rm = T))) # return raster value
  return(result)
}

Fused.map <- calc(s, fun = fusion2, progress = 'text')

Any way to optimize the fusion2 function to a similar method as fusion1?

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Thank you for your time!

Solution

There seems to be a lot of unnecessary format conversions going on, and using the simplest data structures available is the fastest. calc parameter is a numeric vector, so you can use numeric vectors everywhere. Also, rounding and casting into an integer is redundant.

fusion3 <- function(x) {
  n <- x[3] # get the stratum
  g <- x[1:(Strata.n-1)] + as.numeric(bias[n,]) # add bias to raster values
  g[g < 0] <- 0 # set values below 0 to 0
  w <- as.numeric(weight[n,1:(Strata.n-1)]) # get correct strata weight values
  w[is.na(g)]<- NA # set weight to NA if (g) raster values are NA
  p <- sum(w, na.rm = T) # calculate sum of weight values
  pp <- w/p # divide weight values by sum to get the proportion to == 1
  result <- as.integer(sum(pp*g, na.rm = T)+0.5) # return raster value
  return(result)
}

On a 100x100 raster, your original functions take:

system.time(Fused.map <- calc(s, fun = fusion, progress = 'text'))
   user  system elapsed 
  0.015   0.000   0.015 
system.time(Fused.map <- calc(s, fun = fusion2, progress = 'text'))
   user  system elapsed 
  8.270   0.078   8.312

The modified function is already 5 times faster:

system.time(Fused.map3 <- calc(s, fun = fusion3, progress = 'text'))
   user  system elapsed 
  1.970   0.026   1.987

Next, precompute matrices from the data frames so you don't need to do that for each pixel:

bias_matrix = as.matrix(bias)
weight_matrix = as.matrix(weight)

fusion3 <- function(x) {
  n <- x[3] # get the stratum
  g <- x[1:(Strata.n-1)] + bias_matrix[n,] # add bias to raster values
  g[g < 0] <- 0 # set values below 0 to 0
  w <- weight_matrix[n,1:(Strata.n-1)] # get correct strata weight values
  w[is.na(g)]<- NA # set weight to NA if (g) raster values are NA
  p <- sum(w, na.rm = T) # calculate sum of weight values
  pp <- w/p # divide weight values by sum to get the proportion to == 1
  result <- as.integer(sum(pp*g, na.rm = T)+0.5) # return raster value
  return(result)
}

We get:

system.time(Fused.map3 <- calc(s, fun = fusion3, progress = 'text'))
  user  system elapsed 
 0.312   0.008   0.318

And finally, also precompute 1:(Strata.n-1):

bias_matrix = as.matrix(bias)
weight_matrix = as.matrix(weight)
Strata.minus1 = 1:(Strata.n-1)

fusion3 <- function(x) {
  n <- x[3] # get the stratum
  g <- x[Strata.minus1] + bias_matrix[n,] # add bias to raster values
  g[g < 0] <- 0 # set values below 0 to 0
  w <- weight_matrix[n,Strata.minus1] # get correct strata weight values
  w[is.na(g)]<- NA # set weight to NA if (g) raster values are NA
  p <- sum(w, na.rm = T) # calculate sum of weight values
  pp <- w/p # divide weight values by sum to get the proportion to == 1
  result <- as.integer(sum(pp*g, na.rm = T)+0.5) # return raster value
  return(result)
}

We get:

system.time(Fused.map3 <- calc(s, fun = fusion3, progress = 'text'))
   user  system elapsed 
  0.252   0.011   0.262

That's not quite 0.015 yet, but you also have to take into consideration that your original function does not output integers, nor does it set values below 0 to 0, nor does it make the proportions sum to 1, nor as you mentioned deal with NAs.

Mind you, this function still only works with only two rasters, because you hardcode stratum as layer 3. You should instead use raster::overlay with two parameters, the stratum raster and the layers themselves (or use calc with the stratum raster as layer 1, but that's not what calc is designed for).