I’m trying to confirm if there is a bug or at least unexplained change in behavior in the function approxNA in the most recent version (2.5-8) of the R Raster package. Later note: the correct answer is seen as a comment in the answer string below that advises a package update that has now been released
I have a small reproducible example below. The original symptoms were that most the output of the approxNA command on a year-long daily eMODIS brick appeared to be the same as the input – it had lots and lots of NA values that had not been interpolated… but when I switched back to an older version of the package (2.3-12) and an older version of R (3.3.1, Bug-in-your-hair) it returns the correct interpolated results. I have tried it with raster datatype set to INT2S and with no particular raster datatype set.
I checked the documentation in case there's some reason to expect the results to differ but there are no changes in the documentation between the old package version that worked correctly and the new package version that does not work correctly.
In my reproducible example there are fewer errors but they are obvious to see just from the plotting view (included in the example). In these plots you can see that the northern parts of several layers have NA values remaining that should have been interpolated.
I’m running 64-bit Windows 7 and 64-bit R 3.3.2 with raster 2.5-8 with an R-Studio interface, but I've tested also on R-console with the same results and on a different computer that has the same configuration, and also tested in 32-bit R with the same results.
A trigger seems to be the presence of NA values in the first or last layer, but the errors occur in cells that do not have NA values in the first or last layer: for example the top left cells are only missing in layer 2 in the original stack, so it should be a very simple interpolation for those cells from layer 1 to layer 3.
I did follow the R instructions for submitting a bug report a few months ago but I’m not sure if that contact info is correct, or if it might have gone into a spam folder.
I'd also be happy with answers suggesting a direction towards a workaround, so long as it is not too much slower and uses common packages or packages that are regularly maintained.
I can't seem to label them correctly but way below are
plot of the example input brick
plot of the good output brick
plot of the bad output brick
Thanks for any suggestions.
##### start Reproducible example
library(raster)
# make a large raster, it must be quite big to have a problem
r <- raster(ncols=8142, nrows=3285)
# make six rasters to stack, use different ranges to better see the interpolation results
r1 <- setValues(r, round(10 * runif(ncell(r)),0))
r2 <- setValues(r, NA)
r3 <- setValues(r, round(100 * runif(ncell(r)),0))
r4 <- setValues(r, round(1000 * runif(ncell(r)),0))
r5 <- setValues(r, round(50 * runif(ncell(r)),0))
r6 <- setValues(r, round(100 * runif(ncell(r)),0))
# insert some NA values
r1[100:600,] <- NA
r3[,1500:1950] <- NA
r5[,400:800] <- NA
r6[2500:2900,] <- NA
# insert some constant values to make it easier to see if interpolation is working
r4[300:500,] <- 750
r6[300:400,] <- 20
r6[400:500,] <- 100
# make a stack
s <- stack(r1,r2,r3,r4,r5,r6)
# see what the pre-interpolation stack looks like
#plot(s)
plot(s, col = colorRampPalette(c("royalblue","springgreen","yellow","red"))(255))
# interpolate, this takes a while
x2 <- approxNA(s, method = "linear", rule=2)
# see what the post interpolation looks like, there should be filled in values for all
# parts of all layers but instead it’s retaining many of the NA values
#plot(x2)
plot(x2, col = colorRampPalette(c("royalblue","springgreen","yellow","red"))(255))
## End reproducible example
I cannot reproduce the problem. I have simplified the example a bit
library(raster)
r <- raster(ncols=81, nrows=32)
r1 <- setValues(r, 1)
r2 <- setValues(r, NA)
r3 <- setValues(r, 3)
r4 <- setValues(r, 4)
r5 <- setValues(r, 5)
r6 <- setValues(r, 6)
# insert some NA values
r1[1:6,] <- NA
r3[,15:19] <- NA
r5[,4:8] <- NA
r6[25:29,] <- NA
# make a stack
s <- stack(r1,r2,r3,r4,r5,r6)
# see what the pre-interpolation stack looks like
spplot(s)
# in memory
x1 <- approxNA(s, method = "linear", rule=2)
spplot(x1)
# to disk (that is, simulate a large raster
rasterOptions(todisk=TRUE)
x2 <- approxNA(s, method = "linear", rule=2)
rasterOptions(todisk=FALSE)
spplot(x2)
I do not see a difference. x1
and x2
seem identical
x2 - x1
#class : RasterBrick
#dimensions : 32, 81, 2592, 6 (nrow, ncol, ncell, nlayers)
#resolution : 4.444444, 5.625 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
#data source : in memory01_223851_3440_81523.grd
#names : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6
#min values : 0, 0, 0, 0, 0, 0
#max values : 0, 0, 0, 0, 0, 0