Search code examples
rrandompolygonintersectionraster

R: Is it possible to sample random points from only the non-NA region of a raster layer?


I've been attempting to randomly sample a user-defined number of points from a raster layer (e.g. any of the 24 climate variable layers, in Lambert Conformal Conic projection, from https://sites.ualberta.ca/~ahamann/data/climatewna.html) using spsample, within a region whose extent is defined by a polygon I generated by forming circles around a set of points and aggregating their boundaries, but cannot figure out how to only sample those portions of the raster within that region which are defined. The image below shows the raster layer and polygon I am working with:enter image description here

I understand that it is possible to sample within the region and then use the raster layer as a mask to remove from the sampled points those which fall within the NA region of the layer, but, in doing this, the number of background points left over is less than that which was user-specified within the spsample function. This is the code I've written to complete this operation:

circles <- circles(coordinate_dat_train, 150000)
circle_polygon <- polygons(circles)
proj4string(circle_polygon) <- CRS("+proj=lcc +lat_1=49 +lat_2=77 +lat_0=0 +lon_0=-95 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")

sample_mask <- raster("C:/...path.../cv_NORM_6190_AHM.tif")
r_sample <- spsample(circle_polygon, 1000, type='random')
r_sample_cells <- as.data.frame(extract(sample_mask, r_sample, cellnumbers = TRUE))
r_sample_cells <- as.data.frame(r_sample_cells[!is.na(r_sample_cells$cv_NORM_6190_AHM),])
r_sample_cells <- r_sample_cells[,-2]
r_sample_coords <- xyFromCell(sample_mask, r_sample_cells, spatial = TRUE)

I've tried also converting the raster into a polygon using the rasterToPolygon function, finding the intersection of this polygon and the original, and sampling within only this intersection, but with the raster layers that I am working with being so large, the time required to complete this process is not feasible.

Are there other ways of completing this operation, which are reasonably fast in terms of computational time? Thanks in advance!


Solution

  • One way of selecting such a set of points from a raster bounded by a polygon is to first mask the raster with the polygon, and then generate a new raster by selecting from the masked one only those cells containing non-NA values. This can be done, in this case, like so:

    sample_mask <- raster("C:/...path.../cv_NORM_6190_AHM.tif")
    # This command masks the raster so that its extent equals that of the circle polygon.
    mask <- mask(sample_raster, circle_polygon)
    # This command creates a new raster layer consisting of non-NA values from the masked layer.
    non_na_raster <- !is.na(mask)
    

    Then, using the sampleRandom function (which does not select more than one point from a cell, unless the number of points to be selected exceeds the number of cells within the raster layer in question), one can define explicitly the number of points they wish to be selected from the terrestrial region of the masked raster, like so:

    # This command counts the number of cells within the non-NA raster layer, giving an accurate measure of the
    # terrestrial area from which background points are drawn, so that sampling density may be determined.
    cellStats(non_na_raster, 'sum') 
    r_sample <- as.data.frame(sampleRandom(x=non_na_raster, size = 1000, na.rm = TRUE, ext = circle_polygon, xy = TRUE))
    

    It would be possible to use some other random point selection function, like spsample, but sampleRandom is one which is specific to the raster package and avoids multiple-sampling of the same raster cell, which is why I used it in this case.