r, data-cleaning

cleaning time series based on previous timepoints


In my clinical dataset, each row is uniquely identified by patient ID and time, followed by the variable of interest. It looks like this:

patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
V1 <- c(1,1,NA,2,1,NA,1,3,NA,NA,1,NA)

Data <- data.frame(patientid=patientid, time=time, V1=V1)

Timepoint 3 is blank for each patient. I want to fill in timepoint 3 for each patient based on the following criteria. If the variable is coded as 2 or 3 at either timepoint 1 or 2, then timepoint 3 should be coded as 2. If the variable is coded as 1 at both timepoints 1 and 2, then timepoint 3 should be coded as 1. If data are missing at timepoint 1 or 2, then timepoint 3 should be missing. So for the toy example it should look like this:

patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
V1 <- c(1,1,1,2,1,2,1,3,2,NA,1,NA)

Data <- data.frame(patientid=patientid, time=time, V1=V1)

Solution

  • This should do it!

    library(tidyverse)
    
    patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
    time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
    V1 <- c(1,1,NA,2,1,NA,1,3,NA,NA,1,NA)
    
    Data <- data.frame(patientid=patientid, time=time, V1=V1)
    
    Data <- Data %>% pivot_wider(names_from = "time", values_from = "V1", 
                                 names_prefix = "timepoint_")
    
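    # Derive timepoint 3 from the values at timepoints 1 and 2
    timepoint_impute <- function(x, y) {
      if (is.na(x) || is.na(y)) {
        return(NA_real_)  # missing at either timepoint -> missing
      } else if (any(c(x, y) %in% c(2, 3))) {
        return(2)         # a 2 or 3 at either timepoint -> 2
      } else if (x == 1 && y == 1) {
        return(1)         # 1 at both timepoints -> 1
      }
      NA_real_            # assumption: any other combination stays missing
    }
    
    # map2_dbl() returns a numeric vector; plain map2() would create a list column
    Data$timepoint_3 <- map2_dbl(.x = Data$timepoint_1, .y = Data$timepoint_2,
                                 .f = timepoint_impute)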
    

    This approach uses a custom function to encode the logic you need. You end up with the data in wide format; if you need it in long format, you can convert it back with tidyr::pivot_longer.
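
As a rough sketch of that last step, the wide result could be reshaped back to long format as shown here. The `wide` data frame is typed in by hand for illustration, using the expected values from the question, and the names `wide` and `long` are placeholders rather than anything from the code above.

```r
library(tidyr)

# Wide result, one row per patient (values copied from the expected
# output in the question; typed in by hand for illustration)
wide <- data.frame(
  patientid   = c(100, 101, 102, 104),
  timepoint_1 = c(1, 2, 1, NA),
  timepoint_2 = c(1, 1, 3, 1),
  timepoint_3 = c(1, 2, 2, NA)
)

# Back to one row per patient and timepoint
long <- pivot_longer(
  wide,
  cols = starts_with("timepoint_"),
  names_to = "time",
  names_prefix = "timepoint_",               # strip the prefix added by pivot_wider
  names_transform = list(time = as.numeric), # "1" -> 1, to match the original column
  values_to = "V1"
)
```

Note that `pivot_longer` returns a tibble; wrap the result in `as.data.frame()` if a plain data frame is needed.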