Search code examples
rtrimmoving-average

Aggregating by fixed date range R


Given a simplification of my dataset like:

df <- data.frame("ID"= c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2),
                  "ForestType" = c("oak","oak","oak","oak","oak","oak","oak","oak","oak","oak","oak","oak",
                                   "pine","pine","pine","pine","pine","pine","pine","pine","pine","pine","pine","pine"),
                  "Date"= c("1987.01.01","1987.06.01","1987.10.01","1987.11.01",
                     "1988.01.01","1988.03.01","1988.04.01","1988.06.01",
                     "1989.03.01","1989.05.01","1989.07.01","1989.08.01",
                     "1987.01.01","1987.06.01","1987.10.01","1987.11.01",
                     "1988.01.01","1988.03.01","1988.04.01","1988.06.01",
                     "1989.03.01","1989.05.01","1989.07.01","1989.08.01"),
                    "NDVI"= c(0.1,0.2,0.3,0.55,0.31,0.26,0.34,0.52,0.41,0.45,0.50,0.7,
                     0.2,0.3,0.4,0.53,0.52,0.54,0.78,0.73,0.72,0.71,0.76,0.9),
                      check.names = FALSE, stringsAsFactors = FALSE) 

I would like to obtain the means of NDVI values by a certain period of time, in this case by year. Take into account that in my real dataset I would need it for seasons, so it should be adaptable.

These means should consider:

  • Trimming outliers: for example 25% of the highest values and 25% of the lowest values.
  • They should be by class, in this case by the ID field.

So the output should look something like:

> desired_df
  ID ForestType Date meanNDVI
1  1        oak 1987    0.250
2  1        oak 1988    0.325
3  1        oak 1989    0.430
4  2       pine 1987    0.350
5  2       pine 1988    0.635
6  2       pine 1989    0.740

In this case, for example, 0.250 corresponds to mean NDVI on 1987 of ID=1 and it is the mean of the 4 values of that year taking the lowest and the highest out.

Thanks a lot!


Solution

  • library(tidyverse)
    library(lubridate)
    
    df %>%
      mutate(Date = as.Date(Date, format = "%Y.%m.%d")) %>%
      group_by(ID, ForestType, Year = year(Date)) %>%
      filter(NDVI > quantile(NDVI, .25) & NDVI < quantile(NDVI, .75)) %>%
      summarise(meanNDVI = mean(NDVI))
    

    Output

    # A tibble: 6 x 4
    # Groups:   ID, ForestType [2]
         ID ForestType  Year meanNDVI
      <dbl> <chr>      <dbl>    <dbl>
    1     1 oak         1987    0.25 
    2     1 oak         1988    0.325
    3     1 oak         1989    0.475
    4     2 pine        1987    0.35 
    5     2 pine        1988    0.635
    6     2 pine        1989    0.74