
R: incrementing a variable within groups in dplyr


I have the following data frame, with three rows per student:

library(dplyr)

# Create a sample dataframe
df <- data.frame(
  student = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  grade = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  age = c(NA, 6, 6, 7, 7, 7, NA, NA, 9)
)

I want to update each student's age so that it is one plus their age in the previous year. The first recorded (non-NA) age for each student should stay unchanged, and any NA values before it should stay NA. For example, student A's ages should be NA, 6, 7; student B's should be 7, 8, 9; and student C's should be NA, NA, 9.


Solution

  • How about this:

    library(dplyr)
    df <- data.frame(
      student = c("A", "A", "A","B","B", "B", "C", "C","C"),
      grade = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
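      age = c(NA, 6, 6, 7, 7, 7, NA, NA, 9)
    )
    # Within each student, cumsum(!is.na(age)) counts the non-missing ages seen
    # so far (1 at the first recorded age), so adding it and subtracting 1 raises
    # the age by one per year. Rows with NA age stay NA.
    df %>%
      group_by(student) %>%
      mutate(age = age + cumsum(!is.na(age)) - 1)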
    #> # A tibble: 9 × 3
    #> # Groups:   student [3]
    #>   student grade   age
    #>   <chr>   <dbl> <dbl>
    #> 1 A           1    NA
    #> 2 A           2     6
    #> 3 A           3     7
    #> 4 B           1     7
    #> 5 B           2     8
    #> 6 B           3     9
    #> 7 C           1    NA
    #> 8 C           2    NA
    #> 9 C           3     9
    

    Created on 2022-12-30 by the reprex package (v2.0.1)
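
    The one-liner above gives the right result here because every recorded age within a student repeats that student's first recorded age. If the recorded ages could differ within a student, a variant that anchors on the first non-missing age may be safer. The following is only a minimal sketch, not part of the original answer: it assumes the rows are already ordered by grade with one row per year, and the helper column `first_idx` and the object `result` are illustrative names.

```r
library(dplyr)

df <- data.frame(
  student = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  grade = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  age = c(NA, 6, 6, 7, 7, 7, NA, NA, 9)
)

result <- df %>%
  group_by(student) %>%
  mutate(
    # Row position of the first recorded (non-NA) age within this student;
    # NA if the student has no recorded age at all (illustrative helper column)
    first_idx = which(!is.na(age))[1],
    # Rows before the first recorded age stay NA; from that row onward the
    # age counts up by one per row, starting at the first recorded age
    age = if_else(
      row_number() < first_idx,
      NA_real_,
      age[first_idx] + (row_number() - first_idx)
    )
  ) %>%
  ungroup() %>%
  select(-first_idx)

result$age
#> [1] NA  6  7  7  8  9 NA NA  9
```

    On the sample data this matches the output above. Unlike the `cumsum()` version, it ignores every recorded age after the first one, so whether it is the better choice depends on which recorded values you trust.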