Search code examples
rdataframegroup-bytidyversecumsum

find yearly cumsum of a variable in R?


In my code below, i would like to find the cumsum for each year. Right now, Variable A is being summed for the entire duration. Any help would be appreciated.

library(dplyr)
library(lubridate)
set.seed(50)
DF <- data.frame(date = seq(as.Date("2001-01-01"), to= as.Date("2003-12-31"), by="day"),
                       A = runif(1095, 0,10))
DF1 <- DF %>% 
  mutate(Year = year(date), Month = month(date), JDay = yday(date)) %>%
  filter(between(Month,5,10)) %>%
  group_by(Year, JDay) %>% 
  mutate(Precipitation = cumsum(A))

Solution

  • It seems the issue here is with your grouping clause. Specifically, as there are as many distinct combinations of Year and JDay in your data as there are rows in DF, the subsequent cumsum operation inside mutate will simply return the same value as the input column, A. I believe the following should give you what you're after

    library(dplyr)
    library(lubridate)
    
    set.seed(50)
    DF <- data.frame(date = seq(as.Date("2001-01-01"), to= as.Date("2003-12-31"), by="day"),
                     A = runif(1095, 0,10))
    DF1 <- DF %>% 
      mutate(Year = year(date), Month = month(date), JDay = yday(date)) %>%
      filter(between(Month,5,10)) %>%
      arrange(Year, JDay) %>% 
      group_by(Year) %>% 
      mutate(Precipitation =  cumsum(A)) %>% 
      ungroup()
    
    # illustrate that Precipitation does indeed give the cumulative value of A for
    # each year by printing the first 5 observations for each year in DF1
    DF1 %>% 
      group_by(Year) %>% 
      slice(1:5)
    #> # A tibble: 15 x 6
    #> # Groups:   Year [3]
    #>    date           A  Year Month  JDay Precipitation
    #>    <date>     <dbl> <dbl> <dbl> <dbl>         <dbl>
    #>  1 2001-05-01 6.25   2001     5   121          6.25
    #>  2 2001-05-02 0.188  2001     5   122          6.43
    #>  3 2001-05-03 5.37   2001     5   123         11.8 
    #>  4 2001-05-04 5.55   2001     5   124         17.4 
    #>  5 2001-05-05 5.15   2001     5   125         22.5 
    #>  6 2002-05-01 2.95   2002     5   121          2.95
    #>  7 2002-05-02 6.75   2002     5   122          9.71
    #>  8 2002-05-03 7.77   2002     5   123         17.5 
    #>  9 2002-05-04 8.13   2002     5   124         25.6 
    #> 10 2002-05-05 5.58   2002     5   125         31.2 
    #> 11 2003-05-01 9.98   2003     5   121          9.98
    #> 12 2003-05-02 8.24   2003     5   122         18.2 
    #> 13 2003-05-03 6.13   2003     5   123         24.4 
    #> 14 2003-05-04 5.22   2003     5   124         29.6 
    #> 15 2003-05-05 9.81   2003     5   125         39.4