Search code examples
rdplyrdata-cleaning

Copy a row if there is only one record for that group


I have some pre-post data (in R) for which some individuals only have a value for time 1, like episode "2" here:

episode <- c('1','1','2','3','3')
score <- c('44','12','37','40','9')

df <- data.frame(episode,score)

For all the records where there is only "pre" data (1 score per episode), I would like to use R (dplyr preferred) to copy that record and then indicate for all records which is the pre and which is the post. So the final should look something like:

episode score time
1 44 1
1 12 2
2 37 1
2 37 2
3 40 1
3 9 2

Thanks!


Solution

  • Here is one option - create a column of frequency count by 'episode' and if the value is 1, then add 1 on the logical (n == 1) and replicate the rows with uncount

    library(dplyr)
    library(tidyr)
    df %>% 
     add_count(episode) %>% 
     mutate(n = (n == 1) + 1) %>%
     uncount(n) %>%
     group_by(episode) %>%
     mutate(time = row_number()) %>%
     ungroup
    

    -output

    # A tibble: 6 × 3
      episode score  time
      <chr>   <chr> <int>
    1 1       44        1
    2 1       12        2
    3 2       37        1
    4 2       37        2
    5 3       40        1
    6 3       9         2
    

    Or create the 'time' column first and then use complete with fill

    df %>% 
      group_by(episode) %>%
      mutate(time = row_number()) %>% 
      ungroup %>% 
      complete(episode, time) %>%
      fill(score)