Search code examples
rdata.tabledplyrmultidplyr

Restructuing and formatting data frame columns


dfin <- 

ID   SEQ   GRP   C1   C2   C3   T1   T2   T3
1     1     1    0     5    8   0     1   2
1     2     1    5     10   15  5     6   7
2     1     2    20    25   30  0     1   2

C1 is the concentration (CONC) at T1 (TIME) and so on. This is what I want as an output:

dfout <- 

ID   SEQ   GRP  CONC  TIME
1     1     1    0     0
1     1     1    5     1
1     1     1    8     2
1     2     1    5     5
1     2     1    10    6
1     2     1    15    7
2     1     2    20    0
2     1     2    25    1
2     1     2    30    2

The dfin has much more columns for Cx and Tx where x is the number of concentration readings.


Solution

  • You can do this with data.table::melt, with its capability of melting the table into multiple columns based on the columns pattern:

    library(data.table)
    melt(
        setDT(df), 
        id.vars=c("ID", "SEQ", "GRP"), 
        # columns starts with C and T should be melted into two separate columns
        measure.vars=patterns("^C", "^T"),     
        value.name=c('CONC', 'TIME')
    )[order(ID, SEQ)][, variable := NULL][]
    
    #   ID SEQ GRP CONC TIME
    #1:  1   1   1    0    0
    #2:  1   1   1    5    1
    #3:  1   1   1    8    2
    #4:  1   2   1    5    5
    #5:  1   2   1   10    6
    #6:  1   2   1   15    7
    #7:  2   1   2   20    0
    #8:  2   1   2   25    1
    #9:  2   1   2   30    2
    

    Or if the value column names follow the pattern [CT][0-9], you can use reshape from base R by specifying the sep="" which will split the value columns name by the letter/digit separation due to this default setting (from ?reshape):

    split = if (sep == "") {
                list(regexp = "[A-Za-z][0-9]", include = TRUE)
            } else {
                list(regexp = sep, include = FALSE, fixed = TRUE)}
    
    reshape(df, varying=-(1:3), idvar=c("ID", "SEQ", "GRP"), 
            dir="long", sep="", v.names=c("CONC", "TIME"))
    
    #   ID SEQ GRP time CONC TIME
    #1:  1   1   1    1    0    5
    #2:  1   2   1    1    5   10
    #3:  2   1   2    1   20   25
    #4:  1   1   1    2    8    0
    #5:  1   2   1    2   15    5
    #6:  2   1   2    2   30    0
    #7:  1   1   1    3    1    2
    #8:  1   2   1    3    6    7
    #9:  2   1   2    3    1    2