I have a 14K row table of 370 liver transplant patients with transplant date and various repeated lab tests done before and after the procedure. I want to get pre-transplant, immediate post-transplant, and 3/6/12/18/24/36 month lab results.
ID | Transp Date | Lab Units | Lab Type | Tme | Lab Val |
---|---|---|---|---|---|
0000001 | 2011-01-11 | VCA IgG Index | 0 | 6487.0 | |
0000001 | 2011-01-11 | VCA IgM Index | 0 | 11230.0 | |
0000002 | 2011-01-03 | Copies/mL | CMV Quant PCR | 3 | 100.0 |
0000002 | 2011-01-03 | Copies/mL | EBV Quant PCR | 3 | 683.0. |
I did round(datediff) of transplant date and lab test date to get the month timepoint (Tme). My client wants the final table to have one record and all data values per row. Headers something like this:
ID|TrnsplDate|LabType1|Units1|PreVal|Val0|Val3|Val6|Val12|Val18|Val24|Val36|LabType2|Units2|PreVal|Val0|Val3|Val6|Val12|Val18|Val24|Val36|LabType3|Units3|PreVal|Val0|Val3|Val6|Val12|Val18|Val24|Val36|LabType4|Units4|PreVal|Val0|Val3|Val6|Val12|Val18|Val24|Val36
Can anyone knowledgeable in R guide me on where to start? I use Rstudio. Thanks in advance.
Try this, which will put everything for the same ID on one line, then you can adjust column names and order as needed using colnames(df)
and indexing (i.e., something like colorder <- c(2,3,5,1,7,12,...); df[, colorder]
.
### Set up data
library(lubridate)
df <- data.frame(ID = rep(sprintf("SID%s",seq(1:2)),2),
transdate = seq(mdy("01/01/2000"), mdy("01/4/2000"),1),
labunits = c(NA, NA, rep("Copies/mL",2)),
labtype = c(rep("VCA IgG Index",2),"CMV Quant PCR", "EBV Quant PCR"),
time = c(0,0,2,2),
labval = sample(100:2000, 4))
# Transform
df2 <- tidyr::pivot_wider(df, names_from = labtype, values_from = -ID)
# ----------------------------
# Edit: separate by lab type
df_bylab <- split(df, df$labtype)
# output each lab type to CSV
for(i in 1:length(df)){
write.csv(df[i], paste0(names(df)[i], ".csv"))
}
Based on the limited data provided, I am not sure if some columns could be collapsed (for instance, it seems like you only have one date per SID, so all the date columns could be collapsed to one column.
I would also like to point out this is not an ideal structure for the data, so perhaps your best bet is to try to convince your client otherwise!