first of all, I am sorry, I know that my question overlaps with couple of other questions here, but I am really new to R and I could not find an answer that fills the task completely and that I could actually understand.
my data set is something like that:
ID … Exam_t_minus_3 Exam_t_minus_2 Exam_t_minus_1 Grade_2012 Grade_2013 Grade_2014
1 Math Physics Chemestry 98 70 76
2 English French Russian 77 85 59
3 English Chemistry Biology 65 77 69
and I want it to become something like that:
ID … i(as t_minus_i_) Exam Grade
1 3 Math 98
1 2 Physics 70
1 1 Chemistry 76
2 3 English 77
2 2 French 85
2 1 Russian 59
I have gone through swirl() course and thought I could do it with tidyr, but I could not figure out how could I gather more than one set of columns. I ended up with a result like:
ID … ................ Exam Grade
1 . Math 98
1 . Math 70
1 . Math 76
1 . Physics 98
1 . Physics 70
1 . Physics 76
I looked into this: Gather multiple sets of columns and some others, but most of them lost me on half way.
I tried the answer for this question as well (I changed the earning columns to end with T_minus_* beforehand): Gather multiple columns with tidyr
i.e I tried
library(data.table)
res = melt(setDT(sample_df),
measure.vars = patterns("^Exam", "^Grade"),
variable.name = "i")
res[, i := factor(i, labels = c("3","2", "1"))]
It does the most of the thing I need, however, I need to the values in my i columns to be integers not categorical variables, because I need them for the future calculations.
I tried to do res$i <- as.numeric(res$i)
, but that changed the order, i.e "3" was evaluated to 1, "1" to 3.
I tried to just leave it out, but that gives me 1,2,3 as well for the i column values.
As I changed the earnings columns to be Earnings_T_minus_* could I perhaps just get those * values in the i column somehow?
library(data.table)
res = melt(setDT(sample_df),
measure.vars = patterns("^Exam_T_minus_*", "^Grade_T_minus_*"),
variable.name = "i")
Sorry, the question became a bit long and perhaps confusing, but hope someone can lead me in right direction.
Your data.table
approach was nearly spot on. Reshaping with multiple columns is the way to go.
library(data.table)
melt(setDT(sample_df),
measure.vars = patterns("^Exam", "^Grade"), value.name = c("Exam", "Grade"),
variable.name = "i", variable.factor = FALSE)[
, i := 4L - as.integer(i)][order(ID)]
ID i Exam Grade 1: 1 3 Math 98 2: 1 2 Physics 70 3: 1 1 Chemistry 76 4: 2 3 English 77 5: 2 2 French 85 6: 2 1 Russian 59 7: 3 3 English 65 8: 3 2 Chemistry 77 9: 3 1 Biology 69
The only modifications are to pass variable.factor = FALSE
as parameter to melt()
, to do some arithmetic on i
after coercion to integer and to order()
the result appropriately.