Search code examples
rdplyrlag

Lag of every nth element


I have data frame as :

df <- data.frame( date =seq(from = as.Date("2000-01-01"), 
                              to =  as.Date("2005-01-01"),'month'))


df <-  df %>% mutate(cumsum = seq(1, length.out = length(date)))

I want to create a new column, which is the sum of the value in cumsum and every 12th value (one year back).

EDIT: I like both your answers! Actually I just found a problem for the solution for me (sorry my explanation was not quite clear.) Your approach gives me the sum of the value now and one year befor. But I do have seveal years and would need the cumsum of all overervation in previous years (so sum(x, lag(x,12), lag(x,24), lag (x,36)). I tried smth. like (rep(lag(cumsu, 12), nrow(df)/12). May you can help. Thanks!


Solution

  • The literal approach is to use lag, and if you are assured of perfectly-spaced data, then @Jamie's answer is the most direct and simplest approach.

    However, if there is a chance that you don't have all intermediate months, this could lag incorrectly. One way to guard against this is to self-join with the previous date.

    df2 <- df[-20,] # just to impose some missingness
    library(lubridate) # %m+%
    df2 %>%
      mutate(
        # this is the more direct route, but with missingness it glitches
        rolling_12 = cumsum + lag(cumsum, n = 12),
        lastyear = date %m+% years(-1)
      ) %>%
      left_join(df2, by = c("lastyear" = "date"), suffix = c("", "_12")) %>%
      mutate(cumsum_12 = cumsum + cumsum_12) %>%
      select(-lastyear)
    #          date cumsum rolling_12 cumsum_12
    # 1  2000-01-01      1         NA        NA
    # 2  2000-02-01      2         NA        NA
    # 3  2000-03-01      3         NA        NA
    # 4  2000-04-01      4         NA        NA
    # 5  2000-05-01      5         NA        NA
    # 6  2000-06-01      6         NA        NA
    # 7  2000-07-01      7         NA        NA
    # 8  2000-08-01      8         NA        NA
    # 9  2000-09-01      9         NA        NA
    # 10 2000-10-01     10         NA        NA
    # 11 2000-11-01     11         NA        NA
    # 12 2000-12-01     12         NA        NA
    # 13 2001-01-01     13         14        14
    # 14 2001-02-01     14         16        16
    # 15 2001-03-01     15         18        18
    # 16 2001-04-01     16         20        20
    # 17 2001-05-01     17         22        22
    # 18 2001-06-01     18         24        24
    # 19 2001-07-01     19         26        26
    # 20 2001-09-01     21         29        30  <-- this is where rolling_12 goes wrong
    # 21 2001-10-01     22         31        32
    # 22 2001-11-01     23         33        34
    # 23 2001-12-01     24         35        36
    # 24 2002-01-01     25         37        38
    # 25 2002-02-01     26         39        40
    # 26 2002-03-01     27         41        42
    # 27 2002-04-01     28         43        44
    # 28 2002-05-01     29         45        46
    # 29 2002-06-01     30         47        48
    # 30 2002-07-01     31         49        50
    # 31 2002-08-01     32         51        NA
    # 32 2002-09-01     33         54        54
    # 33 2002-10-01     34         56        56
    # 34 2002-11-01     35         58        58
    # 35 2002-12-01     36         60        60
    # 36 2003-01-01     37         62        62
    # 37 2003-02-01     38         64        64
    # 38 2003-03-01     39         66        66
    # 39 2003-04-01     40         68        68
    # 40 2003-05-01     41         70        70
    # 41 2003-06-01     42         72        72
    # 42 2003-07-01     43         74        74
    # 43 2003-08-01     44         76        76
    # 44 2003-09-01     45         78        78
    # 45 2003-10-01     46         80        80
    # 46 2003-11-01     47         82        82
    # 47 2003-12-01     48         84        84
    # 48 2004-01-01     49         86        86
    # 49 2004-02-01     50         88        88
    # 50 2004-03-01     51         90        90
    # 51 2004-04-01     52         92        92
    # 52 2004-05-01     53         94        94
    # 53 2004-06-01     54         96        96
    # 54 2004-07-01     55         98        98
    # 55 2004-08-01     56        100       100
    # 56 2004-09-01     57        102       102
    # 57 2004-10-01     58        104       104
    # 58 2004-11-01     59        106       106
    # 59 2004-12-01     60        108       108
    # 60 2005-01-01     61        110       110