Search code examples
rrasterpearson

R: Calculating Pearson correlation coefficient in each cell along time line


I have two sets of rasters, both with same x,y,z extent. I've made two stacks: stacka and stackb. I want to calculate the Pearson correlation coefficient (PCC) in each grid cell between two stacks along the time line. I've made a simpler example (forgive me with the dumb way of creating rasters)

a1<-c(1,1,1,1,1,1,1,1,NA)
a2<-c(2,2,2,2,1,2,2,NA,2)
a3<-c(3,3,3,3,3,2,NA,3,3)
b1<-c(2,2,2,2,2,2,2,2,2)
b2<-c(3,3,3,3,3,3,3,3,3)
b3<-c(4,4,4,4,4,4,4,4,4)
matrixa1<-matrix(a1,3,3)
matrixa2<-matrix(a2,3,3)
matrixa3<-matrix(a3,3,3)
matrixb1<-matrix(b1,3,3)
matrixb2<-matrix(b2,3,3)
matrixb3<-matrix(b3,3,3)
rastera1<-raster(matrixa1)
rastera2<-raster(matrixa2)
rastera3<-raster(matrixa3)
rasterb1<-raster(matrixb1)
rasterb2<-raster(matrixb2)
rasterb3<-raster(matrixb3)
stacka<-stack(rastera1,rastera2,rastera3)
stackb<-stack(rasterb1,rasterb2,rasterb3)

a_bar<-calc(stacka,mean,na.rm=TRUE)
b_bar<-calc(stackb,mean,na.rm=TRUE)
numerator<-setValues(rastera1,0)
denominator1<-numerator
denominator2<-numerator
for(i in 1:noflayers){
  numerator<-numerator+(stacka[[i]]-a_bar)*(stackb[[i]]-b_bar)
  denominator1<-denominator1+(stacka[[i]]-a_bar)^2
  denominator2<-denominator2+(stackb[[i]]-b_bar)^2
}
pearsoncoeff<-numerator/sqrt(denominator1*denominator2)

In the end I have a raster with each grid cell filled with PCC. The problem is, data a is intermittent, some grids are NA in some layers. So the end product has some blanks. My algorithm spits out "NA" when it encounters NA. I'd need some option like na.rm=TRUE in the calculation, so the output would calculate whatever months have values.

The method I can think of is to use is.na(stacka[[nlayers]][nrows,ncols]==FALSE and find corresponding pair in stackb, but that's on cell basis,which'd take enormous amount of computer time.


Solution

  • I edited Paulo's recommended approach to deal with NAs in the computation and it seems to work fast on a bunch of tests, including the dataset above:

    stack.correlation <- function(stack1, stack2, cor.method){
      # output template
      cor.map <- raster(stack1)
      # combine stacks
      T12 <- stack(stack1,stack2)
      rnlayers=nlayers(T12)
      # the function takes a vector, partitions it in half, then correlates
      # the two sections, returning the correlation coefficient. 
      stack.sequence.cor <- function(myvec,na.rm=T){
        myvecT1<-myvec[1:(length(myvec)/2)]
        myvecT2<-myvec[(length(myvec)/2+1):length(myvec)]
        return(cor(myvecT1,myvecT2, method =  cor.method, use="complete.obs"))
      }
      # apply the function above to each cell and write the correlation
      # coefficient to the output template. 
      cor.map <- stackApply(T12, indices = rep(1, rnlayers), 
                            fun = stack.sequence.cor, na.rm = FALSE)
    
      return(cor.map)
    }
    cor_r=stack.correlation(stacka, stackb, "pearson")