Search code examples
rtidyrmelt

Gathering/ melting multiple column sets, integer values for the variable column?


first of all, I am sorry, I know that my question overlaps with couple of other questions here, but I am really new to R and I could not find an answer that fills the task completely and that I could actually understand.

my data set is something like that:

ID … Exam_t_minus_3 Exam_t_minus_2 Exam_t_minus_1 Grade_2012 Grade_2013 Grade_2014
1       Math        Physics         Chemestry         98         70         76
2       English     French          Russian           77         85         59
3       English     Chemistry       Biology           65         77         69

and I want it to become something like that:

ID  …   i(as t_minus_i_)    Exam         Grade
1       3                   Math          98
1       2                   Physics       70
1       1                   Chemistry     76
2       3                   English       77
2       2                   French        85
2       1                   Russian       59

I have gone through swirl() course and thought I could do it with tidyr, but I could not figure out how could I gather more than one set of columns. I ended up with a result like:

ID  …   ................    Exam         Grade
1       .                   Math          98
1       .                   Math          70
1       .                   Math          76
1       .                   Physics       98
1       .                   Physics       70
1       .                   Physics       76

I looked into this: Gather multiple sets of columns and some others, but most of them lost me on half way.

I tried the answer for this question as well (I changed the earning columns to end with T_minus_* beforehand): Gather multiple columns with tidyr

i.e I tried

library(data.table)
res = melt(setDT(sample_df), 
measure.vars = patterns("^Exam", "^Grade"), 
variable.name = "i")
res[, i := factor(i, labels = c("3","2", "1"))]

It does the most of the thing I need, however, I need to the values in my i columns to be integers not categorical variables, because I need them for the future calculations.

I tried to do res$i <- as.numeric(res$i), but that changed the order, i.e "3" was evaluated to 1, "1" to 3. I tried to just leave it out, but that gives me 1,2,3 as well for the i column values.

As I changed the earnings columns to be Earnings_T_minus_* could I perhaps just get those * values in the i column somehow?

library(data.table)
res = melt(setDT(sample_df), 
measure.vars = patterns("^Exam_T_minus_*", "^Grade_T_minus_*"), 
variable.name = "i")

Sorry, the question became a bit long and perhaps confusing, but hope someone can lead me in right direction.


Solution

  • Your data.table approach was nearly spot on. Reshaping with multiple columns is the way to go.

    library(data.table)
    melt(setDT(sample_df), 
         measure.vars = patterns("^Exam", "^Grade"), value.name = c("Exam", "Grade"), 
         variable.name = "i", variable.factor = FALSE)[
           , i := 4L - as.integer(i)][order(ID)]
    
       ID i      Exam Grade
    1:  1 3      Math    98
    2:  1 2   Physics    70
    3:  1 1 Chemistry    76
    4:  2 3   English    77
    5:  2 2    French    85
    6:  2 1   Russian    59
    7:  3 3   English    65
    8:  3 2 Chemistry    77
    9:  3 1   Biology    69
    

    The only modifications are to pass variable.factor = FALSE as parameter to melt(), to do some arithmetic on i after coercion to integer and to order() the result appropriately.