r dplyr tidyverse

Create new rows based on a condition of previous rows in R


I have a dataset as follows:

structure(list(id = c(1, 2, 2, 2), enrollment = c(2014, 2011, 
2012, 2013), deregister = c(2016, 9999, 9999, 9999)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))

I need to convert that dataset to the following one:

structure(list(id = c(1, 1, 1, 2, 2, 2), enrollment = c(2014, 
2015, 2016, 2011, 2012, 2013), deregister = c(9999, 9999, 2016, 
9999, 9999, 9999)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L))

The idea is: if deregister is not 9999, add new rows to the dataset by incrementing enrollment by 1 until enrollment = deregister. Set deregister to 9999 on these rows until enrollment = deregister.

Since I have a lot of observations, I want to create the dataset without loops.

Thanks.


Solution

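  • You can use mapply with the `:` operator to build the sequence from enrollment to deregister for each row that has a real deregister year, then unnest the result into rows: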

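    library(dplyr)
    library(tidyr)
    # df is the tibble from the question
    df %>% 
      # Rows with a real deregister year get the sequence enrollment:deregister,
      # which turns enrollment into a list column
      mutate(enrollment = ifelse(deregister != 9999, mapply(`:`, enrollment, deregister), enrollment)) %>% 
      # One row per year in each sequence
      unnest_longer(enrollment) %>% 
      # Keep deregister only on the row where enrollment reaches it
      mutate(deregister = replace(deregister, enrollment != deregister, 9999))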
    
    #   id enrollment deregister
    # 1  1       2014       9999
    # 2  1       2015       9999
    # 3  1       2016       2016
    # 4  2       2011       9999
    # 5  2       2012       9999
    # 6  2       2013       9999
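
The mapply call above also builds a sequence up to 9999 for every row that is later discarded by ifelse, which may be wasteful on a large dataset. A possible alternative, offered only as a sketch and not part of the original answer, is to count how many rows each record needs and duplicate it with tidyr::uncount. The column names n_rows and step are helpers introduced here for illustration, and the sketch assumes deregister is never smaller than enrollment when it is not 9999.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- tibble(
  id = c(1, 2, 2, 2),
  enrollment = c(2014, 2011, 2012, 2013),
  deregister = c(2016, 9999, 9999, 9999)
)

result <- df %>%
  # n_rows (hypothetical helper): rows needed per record,
  # 1 when deregister is the 9999 placeholder
  mutate(n_rows = if_else(deregister != 9999, deregister - enrollment + 1, 1)) %>%
  # Duplicate each record n_rows times; step numbers the copies 1, 2, ...
  uncount(n_rows, .id = "step") %>%
  mutate(
    # Shift enrollment forward by one year per copy
    enrollment = enrollment + step - 1,
    # Keep deregister only on the row where enrollment reaches it
    deregister = if_else(enrollment == deregister, deregister, 9999)
  ) %>%
  select(-step)
```

On the example data, result should contain the same six rows as the target dataset in the question. Because no list column is created, enrollment stays a plain numeric column throughout.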