Search code examples
rgisresolutionraster

Resample raster


I am trying to resample a forest cover raster with high resolution (25 meters) and categorical data (1 to 13) to a new RasterLayer with a lower resolution (~ 1 km). My idea is to combine the forest cover data with other lower-resolution raster data :

  1. I tried raster::resample(), but since the data is categorical I lost a lot of information:

    summary(as.factor(df$loss_year_mosaic_30m))
      0       1   2   3  4   5   6   7  8   9   10  11   12  13
    3777691  65  101 50 151 145 159 295 291 134 102 126 104  91 
    

    As you can see, the new raster has the desired resolution but have lots of zeros as well. I suppose that is normal since I used the ´ngb´ option in resample.

  2. The second strategy was using raster::aggregate() but I find difficult to define a factor integer since the change of resolution is not straightforward (like the double of the resolution or alike).

    My high-resolution raster has the following resolution, and I want it to aggregate it to a 0.008333333, 0.008333333 (x, y) resolution to the same extent.

    loss_year
    class       : RasterLayer 
    dimensions  : 70503, 59566, 4199581698  (nrow, ncol, ncell)
    resolution  : 0.00025, 0.00025  (x, y)
    extent      : -81.73875, -66.84725, -4.2285, 13.39725  (xmin, xmax, ymin, ymax)
    coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
    data source : /Volumes/LaCie/Deforestacion/Hansen/loss_year_mosaic_30m.tif 
    names       : loss_year_mosaic_30m 
    values      : 0, 13  (min, max)
    

    I have tried a factor of ~33.33 following the description of the aggregate help: "The number of cells is the number of cells of x divided by fact*fact (when fact is a single number)." Nonetheless, the resulting raster data do not seem to have the same number of rows and columns as my other low-resolution rasters.

I have never used this high-resolution data, and I am also computationally limited (some of this commands can be parallelized using clusterR, but sometimes they took the same time than the non-parallelized commands, especially since they do not work for nearest neighboor calculations).

I am short of ideas; maybe I can try layerize to obtain a count raster, but I have to ´aggregate´ and the factor problem arises. Since this processes are taking me days to process, I do want to know the most efficient way to create a lower resolution raster without losing much information

A reproducible example could be the following:

r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6) #Low resolution raster

First strategy: loss of information

r <- resample(r_hr, r_lr, method = "ngb") #The raster data is categorical

Second strategy: difficult to define an aggregate factor

r <- aggregate(r_hr, factor) #How to define a factor to get exactly the same number of cells of h_lr?

Another option: layerize

r_brick <- layerize(r_hr)
aggregate(r_brick, factor) #How to define factor to coincide with the r_lr dimensions? 

Thanks for your help!


Solution

  • r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
    set.seed(0)
    r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
    r_lr <- raster(nrow=6, ncol=6)
    
    r_hr
    #class       : RasterLayer 
    #dimensions  : 70, 70, 4900  (nrow, ncol, ncell)
    #resolution  : 5.142857, 2.571429  (x, y)
    #extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
    #data source : in memory
    #names       : layer 
    #values      : 1, 5  (min, max)
    
    r_lr
    #class       : RasterLayer 
    #dimensions  : 6, 6, 36  (nrow, ncol, ncell)
    #resolution  : 60, 30  (x, y)
    #extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
    

    Direct aggregate is not possible, because 70/6 is not an integer.

    dim(r_hr)[1:2] / dim(r_lr)[1:2]
    #[1] 11.66667 11.66667
    

    Nearest neighbor resampling is not a good idea either as the results would be arbitrary.

    Here is a by layer approach that you suggested and dww also showed already.

    b <- layerize(r_hr)
    fact <- round(dim(r_hr)[1:2] / dim(r_lr)[1:2])
    a <- aggregate(b, fact)
    x <- resample(a, r_lr)
    

    Now you have proportions. If you want a single class you could do

    y <- which.max(x)
    

    In that case, another approach would be to aggregate the classes

    ag <- aggregate(r_hr, fact, modal) 
    agx <- resample(ag, r_lr, method='ngb')
    

    Note that agx and y are the same. But they can both be problematic as you might have cells with 5 classes with each about 20%, making it rather unreasonable to pick one winner.