Tags: r, raster, na, sample, terra

spatSample() choosing NA cells even when na.rm = TRUE


I assume this is the cause of my error. I have a stack of 18 rasters which underwent the following preprocessing steps:

  • Cropping to a specific region using a shapefile
  • Trimming to the extent of the region
  • Masking to remove unwanted values within the region

I'll provide an MRE below, but the original rasters look like this (all have the same shape, extent, CRS and position of NA cells):

[Image: plot of the original rasters]

In order to perform PCA with prcomp(), I first reduced the dataset to 300 cells using spatSample():

sampled_stack <- spatSample(
  stack,
  size = 300,                      # Reduce stack to 300 samples
  method = "random",               # Select samples randomly
  as.raster = TRUE,                # Get new SpatRaster w/ same extent but fewer cells
  na.rm = TRUE                     # Ignore NA cells when sampling
)

I need as.raster to be TRUE. The procedure runs, but plotting sampled_stack already shows that na.rm = TRUE had no effect, as there are still NA cells. For prcomp():

pca_stack <- prcomp(
  sampled_stack,
  center = TRUE,                   # Shift data to zero centering
  scale = TRUE                     # Scale data to have unit variance
)

Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'

The error is self-explanatory and dozens of questions here address it. I am relatively sure that in my case it is caused by the NA cells in my sampled stack. However, adding na.rm = TRUE, as suggested in the solution to this post, didn't work.

How else can I create a randomly reduced sample that ignores my NA cells to then perform prcomp()?

I am using terra 1.7-65 (cannot update package due to constraints in my workplace)

Minimal reproducible example w/ randomly generated raster, then random addition of sufficient NA cells:

library(terra)

set.seed(42)
r <- rast(nrows = 100, ncols = 100)
values <- sample(c(NA, runif(45)), size = ncell(r), replace = TRUE)
values[sample(ncell(r), size = 2500)] <- NA   # Add a further batch of NA cells
values(r) <- values
stacked_rasters <- c(r, r * 2, r / 2)

Solution

  • This is not unexpected. The documentation states:

    na.rm. logical. If TRUE, NAs are removed. Only used with random sampling of cell values. That is with method="random", as.raster=FALSE, cells=FALSE

    You do not explain why you need as.raster = TRUE. Without it, you can do:

    pca_stack <- prcomp(spatSample(stacked_rasters, 100, na.rm=TRUE))
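
    To check that this route really yields NA-free input for prcomp(), here is a minimal sketch built on the example raster from the question. The layer names, the sample size of 300 and the object name sampled_df are illustrative assumptions, not part of the original code.

```r
library(terra)

# Rebuild the example raster from the question (same seed and NA pattern).
set.seed(42)
r <- rast(nrows = 100, ncols = 100)
vals <- sample(c(NA, runif(45)), size = ncell(r), replace = TRUE)
vals[sample(ncell(r), size = 2500)] <- NA
values(r) <- vals
stacked_rasters <- c(r, r * 2, r / 2)
names(stacked_rasters) <- c("r", "r_double", "r_half")  # illustrative names

# Sample cell values rather than a raster: with the defaults
# as.raster = FALSE and cells = FALSE, na.rm = TRUE is honoured
# for random sampling and the result is a data.frame.
sampled_df <- spatSample(stacked_rasters, size = 300, method = "random",
                         na.rm = TRUE)

# Confirm that no NA values remain before running the PCA.
stopifnot(!anyNA(sampled_df))

pca_stack <- prcomp(sampled_df, center = TRUE, scale. = TRUE)
summary(pca_stack)
```

    If a SpatRaster produced with as.raster = TRUE is still needed for other steps, one possible workaround is to convert it with as.data.frame(sampled_stack, na.rm = TRUE) and pass that data.frame to prcomp(), since rows containing NA are dropped in the conversion.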