Search code examples
radditionrowslme4longitudinal

Adding rows to make a full long dataset for longitudinal data analysis


I am working with a long-format longitudinal dataset where each person has 1, 2 or 3 time points. In order to perform certain analyses I need to make sure that each person has the same number of rows even if it consists of NAs because they did not complete the certain time point.

Here is a sample of the data before adding the rows:

structure(list(Values = c(23, 24, 45, 12, 34, 23), P_ID = c(1, 
1, 2, 2, 2, 3), Event_code = c(1, 2, 1, 2, 3, 1), Site_code = c(1, 
1, 3, 3, 3, 1)), class = "data.frame", row.names = c(NA, -6L))


This is the data I aim to get after adding the relevant rows:


structure(list(Values = c(23, 24, NA, 45, 12, 34, 23, NA, NA), 
P_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3), Event_code = c(1, 2, 
3, 1, 2, 3, 1, 2, 3), Site_code = c(1, 1, 1, 3, 3, 3, 1, 
1, 1)), class = "data.frame", row.names = c(NA, -9L))

I want to come up with code that would automatically add rows to the dataset conditionally on whether the participant has had 1, 2 or 3 visits. Ideally it would make rest of data all NAs while copying Participant_ID and site_code but if not possible I would be satisfied just with creating the right number of rows.


Solution

  • We could use fill after doing a complete

    library(dplyr)
    library(tidyr)
    ExpandedDataset %>% 
          complete(P_ID, Event_code) %>%
          fill(Site_code)