Search code examples
rreshape

Transform multi-columns data in wide to long format


I have this data

dw <- structure(list(ccssid = c(1000023L, 1000043L), base.age = c("22", 
"27"), fu1.age = c("30", "35"), fu2.age = c("33", "37"), fu3.age = c("35", 
"40"), fu7.age = c("38", "42"), fu5.age = c("44", "49"), fu6.age = c("48", 
"52"), base.bmi = c("25.1421", "21.6333"), fu2.bmi = c("25.7959", 
"23.5078"), fu7.bmi = c("25.105", "24.961"), fu5.bmi = c("24.366", 
"24.961"), fu2.MET = c("150", "0"), fu2.CDC = c("Yes", "No"), 
    fu5.MET = c("360", "120"), fu5.CDC = c("Yes", "No"), fu6.MET = c(NA_character_, 
    NA_character_), fu6.CDC = c(NA_character_, NA_character_), 
    base.smk = c(NA, "1"), fu2.smk = c("2", "1"), fu7.smk = c("2", 
    "1"), fu5.smk = c("2", "1"), base.riskydrk = c(NA, "No"), 
    fu7.riskydrk = c("Yes", "Yes"), fu5.riskydrk = c("No", "No"
    ), base.MET = c(NA, NA), base.CDC = c(NA, NA), fu1.bmi = c(NA, 
    NA), fu1.MET = c(NA, NA), fu1.CDC = c(NA, NA), fu1.smk = c(NA, 
    NA), fu1.riskydrk = c(NA, NA), fu2.riskydrk = c(NA, NA), 
    fu3.bmi = c(NA, NA), fu3.MET = c(NA, NA), fu3.CDC = c(NA, 
    NA), fu3.smk = c(NA, NA), fu3.riskydrk = c(NA, NA), fu7.MET = c(NA, 
    NA), fu7.CDC = c(NA, NA), fu6.bmi = c(NA, NA), fu6.smk = c(NA, 
    NA), fu6.riskydrk = c(NA, NA)), row.names = 1:2, class = "data.frame")

I tried this code below to transform this data, but I am not sure why the values of riskydrk column are switched with smk column in the output. The smk column should have values 1,2, but somehow, it is switched with the values of riskydrk column. Can someone please help me figure out the issue? Thanks!

reshape(dw, direction='long', 
        varying=c('base.age', 'base.bmi', "base.MET", 'base.CDC', "base.smk", "base.riskydrk",
                  'fu1.age', 'fu1.bmi', "fu1.MET", 'fu1.CDC', "fu1.smk", "fu1.riskydrk",
                  'fu2.age', 'fu2.bmi', "fu2.MET", 'fu2.CDC', "fu2.smk", "fu2.riskydrk",
                  'fu3.age', 'fu3.bmi', "fu3.MET", 'fu3.CDC', "fu3.smk", "fu3.riskydrk",
                  'fu7.age', 'fu7.bmi', "fu7.MET", 'fu7.CDC', "fu7.smk", "fu7.riskydrk",
                  'fu5.age', 'fu5.bmi', "fu5.MET", 'fu5.CDC', "fu5.smk", "fu5.riskydrk",
                  'fu6.age', 'fu6.bmi', "fu6.MET", 'fu6.CDC', "fu6.smk", "fu6.riskydrk"),
        timevar='var',
        times=c('base', 'fu1', 'fu2', 'fu3', 'fu7', 'fu5', 'fu6'),
        v.names=c('age', 'bmi', 'MET', 'CDC', 'smk', 'riskydrk'),
        idvar='ccssid')

The desired output should be like this:

enter image description here


Solution

  • I think your problem is, that your times should be suffixes, so reshape is able to guess, which would be way easier.

    ## make prefixes to suffixes
    names(dw) <- strsplit(names(dw), '\\.') |> lapply(rev) |> sapply(paste, collapse='.')
    
    reshape(dw, direction='l', idvar='ccssid', varying=sort(names(dw)[-1]))
    #               ccssid time age     bmi  CDC  MET riskydrk  smk
    # 1000023.base 1000023 base  22 25.1421 <NA> <NA>     <NA> <NA>
    # 1000043.base 1000043 base  27 21.6333 <NA> <NA>       No    1
    # 1000023.fu1  1000023  fu1  30    <NA> <NA> <NA>     <NA> <NA>
    # 1000043.fu1  1000043  fu1  35    <NA> <NA> <NA>     <NA> <NA>
    # 1000023.fu2  1000023  fu2  33 25.7959  Yes  150     <NA>    2
    # 1000043.fu2  1000043  fu2  37 23.5078   No    0     <NA>    1
    # 1000023.fu3  1000023  fu3  35    <NA> <NA> <NA>     <NA> <NA>
    # 1000043.fu3  1000043  fu3  40    <NA> <NA> <NA>     <NA> <NA>
    # 1000023.fu5  1000023  fu5  44  24.366  Yes  360       No    2
    # 1000043.fu5  1000043  fu5  49  24.961   No  120       No    1
    # 1000023.fu6  1000023  fu6  48    <NA> <NA> <NA>     <NA> <NA>
    # 1000043.fu6  1000043  fu6  52    <NA> <NA> <NA>     <NA> <NA>
    # 1000023.fu7  1000023  fu7  38  25.105 <NA> <NA>      Yes    2
    # 1000043.fu7  1000043  fu7  42  24.961 <NA> <NA>      Yes    1