Search code examples
rdataframedplyrdata.tabledata-cleaning

adding rows by group to get same number of observations by group


I have what seems like a pretty simple question, but I haven't been able to successfully adapt solutions to similar ones to my situation, including this one: Add row for each group with missing value

I have some data that looks like this:

# A tibble: 265 x 4
   anon_ID assistance_date Benefit_1 nth_assistance_interaction
     <int> <chr>           <chr>                          <dbl>
 1       8 2020-04-10      Medical                            5
 2       8 2020-04-13      Medical                           10
 3       8 2020-04-15      Medical                           15
 4       8 2020-04-21      Medical                           20
 5      11 2020-06-17      Housing                            5
 6      11 2020-06-25      Financial                         10
 7      11 2021-01-27      Financial                         15
 8      26 2020-05-18      Legal                              5
 9      26 2021-06-01      Food                              10
10      26 2021-08-02      Utilities                         15
# ... with 255 more rows

I want to modify it so that each anon_ID has four observations, one for each unique value of nth_assistance_interaction. The values of assistance_date and Benefit_1 should be NA when real values for these variables don't exist.

e.g., for anon_ID = 11, these two variables would have NA values when nth_assistance_interaction = 20.

# A tibble: 265 x 4
   anon_ID assistance_date Benefit_1 nth_assistance_interaction
     <int> <chr>           <chr>                          <dbl>
 1       8 2020-04-10      Medical                            5
 2       8 2020-04-13      Medical                           10
 3       8 2020-04-15      Medical                           15
 4       8 2020-04-21      Medical                           20
 5      11 2020-06-17      Housing                            5
 6      11 2020-06-25      Financial                         10
 7      11 2021-01-27      Financial                         15
 8      11 NA              NA                                20
 9      26 2020-05-18      Legal                              5
10      26 2021-06-01      Food                              10
11      26 2021-08-02      Utilities                         15
# ... with 255 more rows

This is just one example of what I'm trying to accomplish. It could also be the case that anon_ID = 27 only has one observation for nth_assistance_interaction, and so I would need to add three rows for them.

How can I go about making this happen? Thanks in advance.


Solution

  • We may group by 'anon_ID' and use complete to expand the data

    library(dplyr)
    library(tidyr)
    df1 %> 
      group_by(anon_ID) %>% 
      complete(nth_assistance_interaction = c(5, 10, 15, 20)) %>% 
      ungroup