Search code examples
rregressionlinear-regressionrasterlm

Linear regression on raster images - lm complains about NAs


I'm sure this can be fixed with few bytes, but I've spent hours on this simple thing and can't get out of it. I don't use R often.

I have 5 asciigrid files that represent 5 raster images. Some pixels do have values, other do have NAs. For example, the first image might be something like:

NA  NA  NA  NA  NA
NA  NA  2   3   NA
NA  0.2 0.3 1   NA
NA  NA  4   NA  NA

and the second might be:

NA  NA  NA  NA  NA
NA  NA  5   1   NA
NA  0.1 12  12  NA
NA  NA  6   NA  NA

As you can see, NA position is always the same and I'm 100% sure about that. What I'm willing to do:

  • read the files with read.asciigrid();
  • get their values in a long array, using values() from the raster package;
  • create a matrix with 5 rows, each of them holding the values of the corresponding map;
  • linear fit each column and get the coefficients. Each column will represent a pixel, and will have 5 values corresponding to the 5 maps.
  • create two new raster images with the coefficient values.

I'm stuck at lm. Specifically, it says: Error in lm.fit(...): 0 (non-NA) cases. However, from what I know about the imput maps, there should be either columns with all NAs or columns with no NA at all, like as follows:

NA   NA   NA   NA   0.2  2    NA  ... (lots of other columns)
NA   NA   NA   NA   2    2.1  NA
NA   NA   NA   NA   3    0.5  NA
NA   NA   NA   NA   12   6    NA
NA   NA   NA   NA   0.4  2    NA

I'd expect the output to be:

NA   NA   NA   NA   ..   ..   NA

so I can create a new raster image with the coefficients and keep the NA position. Where am I wrong? Pasting my code below. Thank you.

library(sp)
library(raster)
library(fields)
names = c('...','...','...','...','...')
x = c(10,20,30,40,50)
x = log(x)
y = vector('list',length=length(x))
rasters = vector('list',length=length(x))
for (name in names) {
  ind = which(name == names)
  rasters[ind] = read.asciigrid(name)
  rasters[ind] = raster(rasters[[ind]])
  y[[ind]] = values(rasters[[ind]])
}

y = t(simplify2array(y))
lModel = lm(y ~ x) // Error here!

This is the output of str(y):

num [1:5, 1:1260630] NA NA NA NA NA NA NA NA NA NA ... (at some point there will be numbers here)

Edit

Thanks to @RobertH I learned about raster::stack and raster::calc. I have tried:

x <- log(c(10,20,30,40,50))
fun <- function(y) { lm(y ~ x)$coefficients }
r <- calc(s, fun)

Getting an obscure Cannot use this function on a .calcTest call. I looked at raster:::.calcTest with no success. I have tried managing the case where all y values are NA, like so:

fun = function(y) { 
  if (any(!is.na(y))) { 
    lm(y ~ x)$coefficients
  } else { 
    NA
  }
}
r <- calc(s,fun)

And now It works for some minutes, but then I'm getting Error in setValues(out, x) : values must be numeric, integer, logical or factor. However it is common to set NA to raster values! I can't see what's wrong here.


Solution

  • This is how you can get the raster data

    library(raster)
    names = c('...','...','...','...','...')
    s <- stack(names)
    y <- values(s)
    

    You could now do something like this.

    x <- log(c(10,20,30,40,50))
    # need to exclude the rows that are all NA
    i <- rowSums(is.na(y)) < ncol(y)
    coef <- apply(y[i, ], 1, function(y) lm(y ~ x)$coefficients)
    aa <- matrix(NA, ncol=2, nrow=length(i))
    aa[i, ] <- coef
    b <- brick(s, nl=2)
    values(b) <- aa
    

    But you do not need to do that. To do regression like this, I would do

    fun <- function(y) { lm(y ~ x)$coefficients }
    r <- calc(s, fun)
    

    But because you have cells with only NA values (across the layers) this will fail (like in the apply above). You need to write a function to catch these cases:

    funa <- function(y) { 
        if(all(is.na(y))) {
            c(NA, NA)
        } else {
            lm(y ~ x)$coefficients 
        }
    }
    r <- calc(s, funa)
    

    Or for a much faster approach

    X <- cbind(1, y)
    invXtX <- solve(t(X) %*% X) %*% t(X)
    quickfun <- function(i) (invXtX %*% i)
    m <- calc(s, quickfun) 
    names(m) <- c('intercept', 'slope')
    

    See ?raster::calc