Search code examples
rcumsum

How to replace some values with max values from cumsum within groups


I am trying to calculate a cumulative sum for students that have been exposed to a support network during their schooling.

A sample dataframe would be (ID=student, Term=semester of interest, Support=Exposure to support network):

df <- data.frame(ID=c(123451, 123451, 123451, 123451, 123452, 123452, 123452, 123452, 
                      123452, 123452, 123452, 123453, 123453, 123453, 123453, 123453, 123453, 123453, 123453),
                 Term= c(1141, 1148, 1158, 1141, 1158, 1161, 1148, 1151, 1158, 1138, 
                         1141, 1138, 1141, 1141, 1148, 1138, 1148, 1158, 1161), 
                 Support = c(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1))

Since I am looking for cumulative exposure (starting from their earliest semester) I first ordered the data by ID and by Term: df <- df[order(df[,1], df[,2]),]

I then calculated a cumulative sum for the Support variable into a separate column

df$Dosage <- ave(df[3], df[1], FUN=cumsum)

Output:

       ID Term Support Dosage
1  123451 1141       1       1
4  123451 1141       1       2
2  123451 1148       0       2
3  123451 1158       1       3
10 123452 1138       0       0
11 123452 1141       0       0
7  123452 1148       1       1
8  123452 1151       1       2
5  123452 1158       1       3
9  123452 1158       1       4
6  123452 1161       0       4
12 123453 1138       1       1
16 123453 1138       0       1
13 123453 1141       0       1
14 123453 1141       1       2
15 123453 1148       0       2
17 123453 1148       1       3
18 123453 1158       1       4
19 123453 1161       1       5

While this is useful, in the event that a student has two rows for the same semester, I would like the value for Dosage to reflect the maximum value.

So for Student=123451 and Term=1141 I would like both Dosage values to equal 2.

For Student=123452 and Term=1158, I would like both Dosage values to equal 4.

For Student=123453 and Term=1148, I would like both Dosage values to equal 3.

Expected Output:

       ID Term Support Dosage
1  123451 1141       1       2
4  123451 1141       1       2
2  123451 1148       0       2
3  123451 1158       1       3
10 123452 1138       0       0
11 123452 1141       0       0
7  123452 1148       1       1
8  123452 1151       1       2
5  123452 1158       1       4
9  123452 1158       1       4
6  123452 1161       0       4
12 123453 1138       1       1
16 123453 1138       0       1
13 123453 1141       0       2
14 123453 1141       1       2
15 123453 1148       0       3
17 123453 1148       1       3
18 123453 1158       1       4
19 123453 1161       1       5

Solution

  • Not quite understand what your problem is, but maybe you can try this:

    
    library(dplyr)
    
    
    df <- data.frame(ID=c(123451, 123451, 123451, 123451, 123452, 123452, 123452, 123452, 
                          123452, 123452, 123452, 123453, 123453, 123453, 123453, 123453, 123453, 123453, 123453),
                     Term= c(1141, 1148, 1158, 1141, 1158, 1161, 1148, 1151, 1158, 1138, 
                             1141, 1138, 1141, 1141, 1148, 1138, 1148, 1158, 1161), 
                     Support = c(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1))
    
    
    df %>% 
      arrange(ID, Term) %>%
      group_by(ID) %>%
      mutate(Dosage = cumsum(Support)) %>%
      ungroup() %>%
      group_by(ID, Term) %>%
      mutate(Dosage = max(Dosage)) %>%
      ungroup()
    
           ID  Term Support Dosage
     1 123451  1141       1      2
     2 123451  1141       1      2
     3 123451  1148       0      2
     4 123451  1158       1      3
     5 123452  1138       0      0
     6 123452  1141       0      0
     7 123452  1148       1      1
     8 123452  1151       1      2
     9 123452  1158       1      4
    10 123452  1158       1      4
    11 123452  1161       0      4
    12 123453  1138       1      1
    13 123453  1138       0      1
    14 123453  1141       0      2
    15 123453  1141       1      2
    16 123453  1148       0      3
    17 123453  1148       1      3
    18 123453  1158       1      4
    19 123453  1161       1      5