Search code examples
rdplyrspreadlongitudinal

Convert data on pre-post repeated measures from long to wide by filtering data to get time point as value


I have a 14K row table of 370 liver transplant patients with transplant date and various repeated lab tests done before and after the procedure. I want to get pre-transplant, immediate post-transplant, and 3/6/12/18/24/36 month lab results.

ID Transp Date Lab Units Lab Type Tme Lab Val
0000001 2011-01-11 VCA IgG Index 0 6487.0
0000001 2011-01-11 VCA IgM Index 0 11230.0
0000002 2011-01-03 Copies/mL CMV Quant PCR 3 100.0
0000002 2011-01-03 Copies/mL EBV Quant PCR 3 683.0.

I did round(datediff) of transplant date and lab test date to get the month timepoint (Tme). My client wants the final table to have one record and all data values per row. Headers something like this:

ID|TrnsplDate|LabType1|Units1|PreVal|Val0|Val3|Val6|Val12|Val18|Val24|Val36|LabType2|Units2|PreVal|Val0|Val3|Val6|Val12|Val18|Val24|Val36|LabType3|Units3|PreVal|Val0|Val3|Val6|Val12|Val18|Val24|Val36|LabType4|Units4|PreVal|Val0|Val3|Val6|Val12|Val18|Val24|Val36

Can anyone knowledgeable in R guide me on where to start? I use Rstudio. Thanks in advance.


Solution

  • Try this, which will put everything for the same ID on one line, then you can adjust column names and order as needed using colnames(df) and indexing (i.e., something like colorder <- c(2,3,5,1,7,12,...); df[, colorder].

    ### Set up data
    library(lubridate)
    df <- data.frame(ID = rep(sprintf("SID%s",seq(1:2)),2),
                     transdate = seq(mdy("01/01/2000"), mdy("01/4/2000"),1),
                     labunits = c(NA, NA, rep("Copies/mL",2)),
                     labtype = c(rep("VCA IgG Index",2),"CMV Quant PCR", "EBV Quant PCR"),
                     time = c(0,0,2,2),
                     labval = sample(100:2000, 4))
    # Transform
    df2 <- tidyr::pivot_wider(df, names_from = labtype, values_from = -ID)
    
    # ----------------------------
    # Edit: separate by lab type
    df_bylab <- split(df, df$labtype)
    
    # output each lab type to CSV 
    for(i in 1:length(df)){
      write.csv(df[i], paste0(names(df)[i], ".csv"))
    }
    
    

    Based on the limited data provided, I am not sure if some columns could be collapsed (for instance, it seems like you only have one date per SID, so all the date columns could be collapsed to one column.

    I would also like to point out this is not an ideal structure for the data, so perhaps your best bet is to try to convince your client otherwise!