Search code examples
rdatedata.tabledata-manipulationlubridate

R datatable create the average value over the five previous years


I have data with a variable for which I want to get the difference between the current level and the average over the same variable for the same month for the 5 previous years.

library(tidyverse)
library(data.table)
library(lubridate)
MWE <- as.data.table(ggplot2::economics) %>%
  .[,c("pce","psavert","uempmed","unemploy"):=NULL]

> MWE
           date      pop
  1: 1967-07-01 198712.0
  2: 1967-08-01 198911.0
  3: 1967-09-01 199113.0
  4: 1967-10-01 199311.0
  5: 1967-11-01 199498.0
 ---                    
570: 2014-12-01 319746.2
571: 2015-01-01 319928.6
572: 2015-02-01 320074.5
573: 2015-03-01 320230.8
574: 2015-04-01 320402.3

I can do it by month, but I have trouble incororating the reference to the current line to do something like year(date) < year(currentline) & year(date) >= year(currentline)-6

MWE_2 <- MWE[,MeanPastYears:=mean(pop),by=month(date)]

My desired output would be

       date      pop      avg_5yrs
  1: 1967-07-01 198712.0     NA
  2: 1967-08-01 198911.0     NA
  3: 1967-09-01 199113.0     NA
  4: 1967-10-01 199311.0     NA
  5: 1967-11-01 199498.0     NA
 ---                    
570: 2014-12-01 319746.2   313013.8
571: 2015-01-01 319928.6   313192.1
572: 2015-02-01 320074.5   313350.7
573: 2015-03-01 320230.8   313511.2
574: 2015-04-01 320402.3   313640.3

Solution

  • the columns inside [ can be indexed as vectors so we first create a vector for each row year(date) < year(date[..I]) & year(date) >= year(date[..I]) - 6 that has true when the date is in the interval, and then get the mean of pop by month:

    df[,
        year:=year(date)
    ][, 
        avg_5yrs := sapply(1:.N, function(..I) mean(pop[year < year[..I] & year >= year[..I] -6])), by=month(date)
    ][, year:=NULL][]
                date      pop avg_5yrs
      1: 1967-07-01 198712.0      NaN
      2: 1967-08-01 198911.0      NaN
      3: 1967-09-01 199113.0      NaN
      4: 1967-10-01 199311.0      NaN
      5: 1967-11-01 199498.0      NaN
     ---                             
    570: 2014-12-01 319746.2 311845.5
    571: 2015-01-01 319928.6 312028.1
    572: 2015-02-01 320074.5 312192.6
    573: 2015-03-01 320230.8 312357.4
    574: 2015-04-01 320402.3 312498.1