I took a stratified random sample from a raster layer using the sampleStratified function from R's raster package:
library(raster)
r<-raster(nrows=5, ncols=5)
r[]<-c(1,0,0,0,1,1,0,1,0,0,1,0,1,0,1,1,1,0,1,0,0,0,1,1,1)
#Stratified random sample size
sampleStratified(r, size=5)
cell layer
[1,] 3 0
[2,] 22 0
[3,] 7 0
[4,] 21 0
[5,] 12 0
[6,] 13 1
[7,] 17 1
[8,] 11 1
[9,] 8 1
[10,] 23 1
What I would like to do now is order the sample by the first column, expand the first column so that it covers every cell number of the raster (1 to 25 here), and fill the second column with NA wherever a cell was not sampled, so that it looks like this:
[,1] [,2]
[1,] 1 NA
[2,] 2 NA
[3,] 3 0
[4,] 4 NA
[5,] 5 NA
[6,] 6 NA
[7,] 7 0
[8,] 8 1
[9,] 9 NA
[10,] 10 NA
[11,] 11 1
[12,] 12 0
[13,] 13 1
[14,] 14 NA
[15,] 15 NA
[16,] 16 NA
[17,] 17 1
[18,] 18 NA
[19,] 19 NA
[20,] 20 NA
[21,] 21 0
[22,] 22 0
[23,] 23 1
[24,] 24 NA
[25,] 25 NA
I tried something with the approxTime function from the simecol package but could not get the NA filling to work. I have 10 raster layers with around 500,000 values each, so a fast approach would be really appreciated.
I'd think about it the opposite way. Instead of interpolating, which could be expensive, note that you already know which cells you want to change: those that are not in the random sample. So use the cell numbers in your random sample as the set of cells to keep, and apply the [<- replacement method to every cell index that does not appear in the stratified sample. This relies on the raster method for [<- together with %in% and seq_len. Forgive the slightly long-winded example; it is better to show the steps. It should be quite fast, and I don't envisage any problems with rasters of 500,000 cells.
# For reproducible example
set.seed(1)
# Get stratified random sample
ids <- sampleStratified(r, size=5)
# Copy of original raster (to visualise difference)
r2 <- r
# Get set of cell indices
cell_no <- seq_len(ncell(r2))
# Those indices to replace are those not in the random sample
repl <- cell_no[ ! cell_no %in% ids[,1] ]
# Replace cells not in sample with NA
r2[ repl ] <- NA
# Plot to show what is going on
par( mfrow = c(1,2))
plot(r)
plot(r2)
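If you want the two-column matrix from the question rather than a raster, the same indexing idea can be sketched as below. This is a minimal sketch, not a tested solution for your real data: it assumes the first column of the sampleStratified result holds the cell numbers and the second holds the cell values (as in the output shown in the question), and the names vals and out are only illustrative.

```r
library(raster)

# Rebuild the example raster from the question
r <- raster(nrows = 5, ncols = 5)
r[] <- c(1,0,0,0,1,1,0,1,0,0,1,0,1,0,1,1,1,0,1,0,0,0,1,1,1)

# For a reproducible example
set.seed(1)
ids <- sampleStratified(r, size = 5)

# Start with NA for every cell, then fill in only the sampled cells
# (assumption: column 1 = cell number, column 2 = cell value)
vals <- rep(NA_real_, ncell(r))
vals[ids[, 1]] <- ids[, 2]

# Two-column matrix: every cell number, value or NA
out <- cbind(cell = seq_len(ncell(r)), value = vals)
out
```

Because the rows are built from seq_len(ncell(r)), the result is already ordered by cell number, so no sorting or interpolation step is needed.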