I'm confused by the behavior of data.table::melt()
.
I would like to reshape a data.table from wide to long, similar to this question
Here's my data.table:
dt <- data.table(id=c(1,2,3), varA1=c(2,6,1), varA2=c(1,1,1),varA3=c(1,2,3),
varB1=c(1,0,1), varB2=c(1,1,1),varB3=c(0,0,0))
Here's what I want:
id index varA
1: 1 1 2
2: 2 1 6
3: 3 1 1
4: 1 2 1
5: 2 2 1
6: 3 2 1
7: 1 3 1
8: 2 3 2
9: 3 3 3
If I include both varA
and varB
as my measure vars, it works fine, using:
dt_long <- melt(dt, id.vars = "id", measure=patterns("^varA","^varB"), value.name = c("varA","varB"),variable.name = "index")
The output is as expected:
id index varA varB
1: 1 1 2 1
2: 2 1 6 0
3: 3 1 1 1
4: 1 2 1 1
5: 2 2 1 1
6: 3 2 1 1
7: 1 3 1 0
8: 2 3 2 0
9: 3 3 3 0
However, I only want varA
. With only a single measure variable, it no longer works. Using
dt_long <- melt(dt, id.vars ="id", measure=patterns("^varA"), value.name = "varA",variable.name = "index")
I get:
id index varA
1: 1 varA1 2
2: 2 varA1 6
3: 3 varA1 1
4: 1 varA2 1
5: 2 varA2 1
6: 3 varA2 1
7: 1 varA3 1
8: 2 varA3 2
9: 3 varA3 3
Why is the index
variable here not listing 1,2,3, as before?
Using multiple columns with melt
measure.vars
is tricky.
From melt
help regarding measure.vars
:
multiple patterns will produce multiple columns
If we apply data.table:::patterns
source code to dt
with "^varA","^varB"
, we get :
cols <- colnames(dt)
p = unlist(list("^varA","^varB"), use.names = any(nzchar(list("^varA","^varB"))))
lapply(p, grep, cols)
[[1]]
[1] 2 3 4
[[2]]
[1] 5 6 7
Leading to :
VarA1
associated with VarB1
, factor index = 1VarA2
associated with VarB2
, factor index = 2VarA3
associated with VarB3
, factor index = 3Note that index number has nothing to do with VarXi index, it's simply an auto-numbering of VarAi-VarBi
factor.
For example, if we remove VarA1
we get following result:
id index varA varB
1: 1 1 1 1
2: 2 1 1 0
3: 3 1 1 1
4: 1 2 1 1
5: 2 2 2 1
6: 3 2 3 1
7: 1 3 NA 0
8: 2 3 NA 0
9: 3 3 NA 0
VarA2
associated with VarB1
, factor index = 1VarA3
associated with VarB2
, factor index = 2VarB3
alone, factor index = 3In both cases, as this factor is a composite factor, data.table
returns its numeric
index.
When you only use one pattern
, you directly get a factor with levels
corresponding to this single pattern
:
dt_long <- melt(dt, id.vars = "id", measure=patterns("^varA"), value.name = c("varA"),variable.name = "index")[]
dt_long$index
[1] varA1 varA1 varA1 varA2 varA2 varA2 varA3 varA3 varA3
Levels: varA1 varA2 varA3
You could convert it to numeric
to get the expected result:
dt_long[,index :=as.numeric(index)][]
id index varA
1: 1 1 2
2: 2 1 6
3: 3 1 1
4: 1 2 1
5: 2 2 1
6: 3 2 1
7: 1 3 1
8: 2 3 2
9: 3 3 3