Search code examples
rstringmultiple-columnslong-format-data

In R, how to combine certain strings from multiple columns to be arranged in a long format column by Time within ID?


In R, I would like to select certain strings (A2, D1) from multiple columns (dx1-dx3) and put them in a column in a long format by id while keeping another column (i.e., Time). Below is the data frame.

id  dx1 dx2 dx3 Time
1   A2      D1  1
1   B1  A1  D1  2
2   D1          1
2   A3  D1      2
2   A2  D1      3

The format I want is as follows:

id  name value     Time
1   dx1   A2        1   
1   dx3   D1        1
1   dx3   D1        2
2   dx1   D1        1
2   dx2   D1        2
2   dx1   A2        3
2   dx2   D1        3

I tried d %>% select(id, contains('dx'), Time) %>% pivot_longer(cols = -c('id')) %>% filter(str_detect(value, 'A2|D1')). I got the results without the Time column. Do you have any suggestions on how to keep the Time column?


Solution

  • Try reshape2::melt and then subset the rows:

    library(reshape2)
    df = data.frame(id=c(1,1,2,2,2),
                    dx1=c('A2','B1','D1','A3','A2'),
                    dx2=c(NA, 'A1', NA, 'D1', 'D1'),
                    dx3=c('D1','D1',NA, NA, NA),
                    Time=c(1,2,1,2,3))
    df2 = melt(df, measure.vars=c("dx1","dx2","dx3"), id.vars=c("id","Time"), variable.name="name")
    df2 = df2[df2$value %in% c('A2','D1'),]
    df2
    
       id Time name value
    1   1    1  dx1    A2
    3   2    1  dx1    D1
    5   2    3  dx1    A2
    9   2    2  dx2    D1
    10  2    3  dx2    D1
    11  1    1  dx3    D1
    12  1    2  dx3    D1