Search code examples
rdplyrtidyrdata-manipulation

Fill empty rows with values from other rows


I have a dataset with a number of cases. Every case has two observations. The first observation for case number 1 has value 3 and the second observation has value 7. The two observations for case number 2 have missing values. I need to write code to fill the empty cells with the same values from case number 1 so that the first row for case 2 will have the same value as case 1 for obs = 1 and the second row will have the same value for obs = 2. Of course, this is a very short version of a much bigger dataset so I need something that is flexible enough to accommodate for a couple of hundred cases and where the values to use as fillers change for every subjects.

Here is a toy data set:

# toy dataset
df <- data.frame(
  case = c(1, 1, 2, 2),
  obs = c(1, 2, NA, NA),
  value = c(3, 7, NA, NA)
)

 #   case obs  value
 # 1    1   1      3
 # 2    1   2      7
 # 3    2   NA    NA
 # 4    2   NA    NA

#Desired output:

  case obs value
1    1   1     3
2    1   2     7
3    2   1     3
4    2   2     7

Solution

  • We may use fill with grouping on the row sequence (rowid) of case

    library(dplyr)
    library(data.table)
    library(tidyr)
    df %>% 
      group_by(grp = rowid(case)) %>%
      fill(obs, value) %>%
      ungroup %>%
      select(-grp)
    

    -output

    # A tibble: 4 × 3
       case   obs value
      <dbl> <dbl> <dbl>
    1     1     1     3
    2     1     2     7
    3     2     1     3
    4     2     2     7