Search code examples
rinterpolationrasterr-raster

ApproxNA producing different and incorrect results in stacks of large rasters


I’m trying to confirm if there is a bug or at least unexplained change in behavior in the function approxNA in the most recent version (2.5-8) of the R Raster package. Later note: the correct answer is seen as a comment in the answer string below that advises a package update that has now been released

I have a small reproducible example below. The original symptoms were that most the output of the approxNA command on a year-long daily eMODIS brick appeared to be the same as the input – it had lots and lots of NA values that had not been interpolated… but when I switched back to an older version of the package (2.3-12) and an older version of R (3.3.1, Bug-in-your-hair) it returns the correct interpolated results. I have tried it with raster datatype set to INT2S and with no particular raster datatype set.

I checked the documentation in case there's some reason to expect the results to differ but there are no changes in the documentation between the old package version that worked correctly and the new package version that does not work correctly.

In my reproducible example there are fewer errors but they are obvious to see just from the plotting view (included in the example). In these plots you can see that the northern parts of several layers have NA values remaining that should have been interpolated.

I’m running 64-bit Windows 7 and 64-bit R 3.3.2 with raster 2.5-8 with an R-Studio interface, but I've tested also on R-console with the same results and on a different computer that has the same configuration, and also tested in 32-bit R with the same results.

A trigger seems to be the presence of NA values in the first or last layer, but the errors occur in cells that do not have NA values in the first or last layer: for example the top left cells are only missing in layer 2 in the original stack, so it should be a very simple interpolation for those cells from layer 1 to layer 3.

I did follow the R instructions for submitting a bug report a few months ago but I’m not sure if that contact info is correct, or if it might have gone into a spam folder.

I'd also be happy with answers suggesting a direction towards a workaround, so long as it is not too much slower and uses common packages or packages that are regularly maintained.

I can't seem to label them correctly but way below are

plot of the example input brick

plot of the good output brick

plot of the bad output brick

Thanks for any suggestions.

##### start  Reproducible example

library(raster)
# make a large raster, it must be quite big to have a problem
r <- raster(ncols=8142, nrows=3285)
# make six rasters to stack, use different ranges to better see the interpolation results
r1 <- setValues(r, round(10 * runif(ncell(r)),0))
r2 <- setValues(r, NA)
r3 <- setValues(r, round(100 * runif(ncell(r)),0))
r4 <- setValues(r, round(1000 * runif(ncell(r)),0))
r5 <- setValues(r, round(50 * runif(ncell(r)),0))
r6 <- setValues(r, round(100 * runif(ncell(r)),0))

# insert some NA values
r1[100:600,] <- NA
r3[,1500:1950] <- NA
r5[,400:800] <- NA
r6[2500:2900,] <- NA

# insert some constant values to make it easier to see if interpolation is working
r4[300:500,] <- 750
r6[300:400,] <- 20
r6[400:500,] <- 100
# make a stack
s <- stack(r1,r2,r3,r4,r5,r6)

# see what the pre-interpolation stack looks like
#plot(s)
plot(s, col = colorRampPalette(c("royalblue","springgreen","yellow","red"))(255))

# interpolate, this takes a while
x2 <- approxNA(s, method = "linear", rule=2)

# see what the post interpolation looks like, there should be filled in values for all
# parts of all layers but instead it’s retaining many of the NA values
#plot(x2)
plot(x2, col = colorRampPalette(c("royalblue","springgreen","yellow","red"))(255))


## End reproducible example

Plot of the input to the approxNa command Plot of the good output to the old approxNa command - sorry for the small labels. Plot of the bad output to the new approxNa command


Solution

  • I cannot reproduce the problem. I have simplified the example a bit

    library(raster)
    r <- raster(ncols=81, nrows=32)
    r1 <- setValues(r, 1)
    r2 <- setValues(r, NA)
    r3 <- setValues(r, 3)
    r4 <- setValues(r, 4)
    r5 <- setValues(r, 5)
    r6 <- setValues(r, 6)
    
    # insert some NA values
    r1[1:6,] <- NA
    r3[,15:19] <- NA
    r5[,4:8] <- NA
    r6[25:29,] <- NA
    
    # make a stack
    s <- stack(r1,r2,r3,r4,r5,r6)
    
    # see what the pre-interpolation stack looks like
    spplot(s)
    
    # in memory
    x1 <- approxNA(s, method = "linear", rule=2)
    spplot(x1)
    
    # to disk (that is, simulate a large raster
    rasterOptions(todisk=TRUE)
    x2 <- approxNA(s, method = "linear", rule=2)
    rasterOptions(todisk=FALSE)
    spplot(x2)
    

    I do not see a difference. x1 and x2 seem identical

    x2 - x1
    #class       : RasterBrick 
    #dimensions  : 32, 81, 2592, 6  (nrow, ncol, ncell, nlayers)
    #resolution  : 4.444444, 5.625  (x, y)
    #extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
    #data source : in memory01_223851_3440_81523.grd 
    #names       : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6 
    #min values  :       0,       0,       0,       0,       0,       0 
    #max values  :       0,       0,       0,       0,       0,       0