Search code examples
rdplyrcumsum

Cumulative sum for each row of data for the same ID


I have this data frame:

df=data.frame(id=c(1,1,2,2,2,5,NA),var=c("a","a","b","b","b","e","f"),value=c(1,1,0,1,0,0,1),cs=c(2,2,3,3,3,3,NA))

I want to calculate the sum of value for each group (id, var) and then the cumulative sum but I would like to have the cumulative sum to be displayed for each row of data, i.e., I don't want to summarized view of the data. I have included what my output should look like. This is what I have tried so far:

df%>%arrange(id,var)%>%group_by(id,var)%>%mutate(cs=cumsum(value)) 

Any suggestions?


Solution

  • Here is an approach that I think meets your expectations.

    Would group by id and calculate the sum of value for each id via summarise.

    You can then add your cumulative sum column with mutate. Based on your comments, I included an ifelse so that if id was NA, it would not provide a cumulative sum, but instead be given NA.

    Finally, to combine your cumulative sum data with your original dataset, you would need to join the two tables.

    library(tidyverse)
    
    df %>%
      arrange(id) %>%
      group_by(id) %>%
      summarise(sum = sum(value)) %>%
      mutate(cs=ifelse(is.na(id), NA, cumsum(sum))) %>%
      left_join(df)
    

    Output

    # A tibble: 7 x 5
         id   sum    cs var   value
      <dbl> <dbl> <dbl> <fct> <dbl>
    1     1     2     2 a         1
    2     1     2     2 a         1
    3     2     1     3 b         0
    4     2     1     3 b         1
    5     2     1     3 b         0
    6     5     0     3 e         0
    7    NA     1    NA f         1