r regression linear-regression raster lm

Linear regression on raster images - lm complains about NAs

I'm sure this can be fixed with few bytes, but I've spent hours on this simple thing and can't get out of it. I don't use R often.

I have 5 asciigrid files that represent 5 raster images. Some pixels do have values, other do have NAs. For example, the first image might be something like:

NA  NA  NA  NA  NA
NA  NA  2   3   NA
NA  0.2 0.3 1   NA
NA  NA  4   NA  NA

and the second might be:

NA  NA  NA  NA  NA
NA  NA  5   1   NA
NA  0.1 12  12  NA
NA  NA  6   NA  NA

As you can see, NA position is always the same and I'm 100% sure about that. What I'm willing to do:

read the files with read.asciigrid();
get their values in a long array, using values() from the raster package;
create a matrix with 5 rows, each of them holding the values of the corresponding map;
linear fit each column and get the coefficients. Each column will represent a pixel, and will have 5 values corresponding to the 5 maps.
create two new raster images with the coefficient values.

I'm stuck at lm. Specifically, it says: Error in lm.fit(...): 0 (non-NA) cases. However, from what I know about the imput maps, there should be either columns with all NAs or columns with no NA at all, like as follows:

NA   NA   NA   NA   0.2  2    NA  ... (lots of other columns)
NA   NA   NA   NA   2    2.1  NA
NA   NA   NA   NA   3    0.5  NA
NA   NA   NA   NA   12   6    NA
NA   NA   NA   NA   0.4  2    NA

I'd expect the output to be:

NA   NA   NA   NA   ..   ..   NA

so I can create a new raster image with the coefficients and keep the NA position. Where am I wrong? Pasting my code below. Thank you.

library(sp)
library(raster)
library(fields)
names = c('...','...','...','...','...')
x = c(10,20,30,40,50)
x = log(x)
y = vector('list',length=length(x))
rasters = vector('list',length=length(x))
for (name in names) {
  ind = which(name == names)
  rasters[ind] = read.asciigrid(name)
  rasters[ind] = raster(rasters[[ind]])
  y[[ind]] = values(rasters[[ind]])
}

y = t(simplify2array(y))
lModel = lm(y ~ x) // Error here!

This is the output of str(y):

num [1:5, 1:1260630] NA NA NA NA NA NA NA NA NA NA ... (at some point there will be numbers here)

Edit

Thanks to @RobertH I learned about raster::stack and raster::calc. I have tried:

x <- log(c(10,20,30,40,50))
fun <- function(y) { lm(y ~ x)$coefficients }
r <- calc(s, fun)

Getting an obscure Cannot use this function on a .calcTest call. I looked at raster:::.calcTest with no success. I have tried managing the case where all y values are NA, like so:

fun = function(y) { 
  if (any(!is.na(y))) { 
    lm(y ~ x)$coefficients
  } else { 
    NA
  }
}
r <- calc(s,fun)

And now It works for some minutes, but then I'm getting Error in setValues(out, x) : values must be numeric, integer, logical or factor. However it is common to set NA to raster values! I can't see what's wrong here.

Solution

This is how you can get the raster data

library(raster)
names = c('...','...','...','...','...')
s <- stack(names)
y <- values(s)

You could now do something like this.

x <- log(c(10,20,30,40,50))
# need to exclude the rows that are all NA
i <- rowSums(is.na(y)) < ncol(y)
coef <- apply(y[i, ], 1, function(y) lm(y ~ x)$coefficients)
aa <- matrix(NA, ncol=2, nrow=length(i))
aa[i, ] <- coef
b <- brick(s, nl=2)
values(b) <- aa

But you do not need to do that. To do regression like this, I would do

fun <- function(y) { lm(y ~ x)$coefficients }
r <- calc(s, fun)

But because you have cells with only NA values (across the layers) this will fail (like in the apply above). You need to write a function to catch these cases:

funa <- function(y) { 
    if(all(is.na(y))) {
        c(NA, NA)
    } else {
        lm(y ~ x)$coefficients 
    }
}
r <- calc(s, funa)

Or for a much faster approach

X <- cbind(1, y)
invXtX <- solve(t(X) %*% X) %*% t(X)
quickfun <- function(i) (invXtX %*% i)
m <- calc(s, quickfun) 
names(m) <- c('intercept', 'slope')

See ?raster::calc