Resample raster

I am trying to resample a forest cover raster with high resolution (25 meters) and categorical data (1 to 13) to a new RasterLayer with a lower resolution (~ 1 km). My idea is to combine the forest cover data with other lower-resolution raster data :

I tried raster::resample(), but since the data is categorical I lost a lot of information:
```
summary(as.factor(df$loss_year_mosaic_30m))
  0       1   2   3  4   5   6   7  8   9   10  11   12  13
3777691  65  101 50 151 145 159 295 291 134 102 126 104  91 
```
As you can see, the new raster has the desired resolution but have lots of zeros as well. I suppose that is normal since I used the ´ngb´ option in resample.
The second strategy was using raster::aggregate() but I find difficult to define a factor integer since the change of resolution is not straightforward (like the double of the resolution or alike).

My high-resolution raster has the following resolution, and I want it to aggregate it to a 0.008333333, 0.008333333 (x, y) resolution to the same extent.
```
loss_year
class       : RasterLayer 
dimensions  : 70503, 59566, 4199581698  (nrow, ncol, ncell)
resolution  : 0.00025, 0.00025  (x, y)
extent      : -81.73875, -66.84725, -4.2285, 13.39725  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : /Volumes/LaCie/Deforestacion/Hansen/loss_year_mosaic_30m.tif 
names       : loss_year_mosaic_30m 
values      : 0, 13  (min, max)
```
I have tried a factor of ~33.33 following the description of the aggregate help: "The number of cells is the number of cells of x divided by fact*fact (when fact is a single number)." Nonetheless, the resulting raster data do not seem to have the same number of rows and columns as my other low-resolution rasters.

I have never used this high-resolution data, and I am also computationally limited (some of this commands can be parallelized using clusterR, but sometimes they took the same time than the non-parallelized commands, especially since they do not work for nearest neighboor calculations).

I am short of ideas; maybe I can try layerize to obtain a count raster, but I have to ´aggregate´ and the factor problem arises. Since this processes are taking me days to process, I do want to know the most efficient way to create a lower resolution raster without losing much information

A reproducible example could be the following:

r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6) #Low resolution raster

First strategy: loss of information

r <- resample(r_hr, r_lr, method = "ngb") #The raster data is categorical

Second strategy: difficult to define an aggregate factor

r <- aggregate(r_hr, factor) #How to define a factor to get exactly the same number of cells of h_lr?

Another option: layerize

r_brick <- layerize(r_hr)
aggregate(r_brick, factor) #How to define factor to coincide with the r_lr dimensions?

Thanks for your help!

Solution

r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6)

r_hr
#class       : RasterLayer 
#dimensions  : 70, 70, 4900  (nrow, ncol, ncell)
#resolution  : 5.142857, 2.571429  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
#data source : in memory
#names       : layer 
#values      : 1, 5  (min, max)

r_lr
#class       : RasterLayer 
#dimensions  : 6, 6, 36  (nrow, ncol, ncell)
#resolution  : 60, 30  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

Direct aggregate is not possible, because 70/6 is not an integer.

dim(r_hr)[1:2] / dim(r_lr)[1:2]
#[1] 11.66667 11.66667

Nearest neighbor resampling is not a good idea either as the results would be arbitrary.

Here is a by layer approach that you suggested and dww also showed already.

b <- layerize(r_hr)
fact <- round(dim(r_hr)[1:2] / dim(r_lr)[1:2])
a <- aggregate(b, fact)
x <- resample(a, r_lr)

Now you have proportions. If you want a single class you could do

y <- which.max(x)

In that case, another approach would be to aggregate the classes

ag <- aggregate(r_hr, fact, modal) 
agx <- resample(ag, r_lr, method='ngb')

Note that agx and y are the same. But they can both be problematic as you might have cells with 5 classes with each about 20%, making it rather unreasonable to pick one winner.