Search code examples
rdata-manipulationdata-cleaning

How to duplicate responses in rows with specific matching variables in R


My dataset looks something like this:

ID     block     question     latency     response          rating
425      1          1          3452      my response           3
425      1          2          6427                            1
425      1          3          4630                            5
425      1          4          5319                            2
425      1          5          2501                            2
425      2          1          4205    another response        4

Not all participants completed the same number of blocks/questions.

I just want to copy the responses down where there are empty 'response' cells and where both the ID and block number match those for the existing response, like this:

ID     block     question     latency     response          rating
425      1          1          3452      my response           3
425      1          2          6427      my response           1
425      1          3          4630      my response           5
425      1          4          5319      my response           2
425      1          5          2501      my response           2
425      2          1          4205    another response        4

Solution

  • You can try something like this. It first replaces the missing strings with NAs to then fill the values with fill.

    library(dplyr)
    library(tidyr)
    
    df %>% 
       group_by(ID,block) %>% 
       mutate(response=ifelse(response=="",NA,response)) %>% 
       fill(response, .direction="down") %>% 
       ungroup()
    # A tibble: 6 x 6
         ID block question latency response         rating
      <dbl> <dbl>    <dbl>   <dbl> <chr>             <int>
    1   425     1        1    3452 my response           3
    2   425     1        2    6427 my response           1
    3   425     1        3    4630 my response           5
    4   425     1        4    5319 my response           2
    5   425     1        5    2501 my response           2
    6   425     2        1    4205 another response      4
    

    Data

    df <- structure(list(ID = c(425, 425, 425, 425, 425, 425), block = c(1, 
    1, 1, 1, 1, 2), question = c(1, 2, 3, 4, 5, 1), latency = c(3452, 
    6427, 4630, 5319, 2501, 4205), response = c("my response", "", 
    "", "", "", "another response"), rating = c(3L, 1L, 5L, 2L, 2L, 
    4L)), row.names = c(NA, -6L), class = "data.frame")