Search code examples
rdata.tablereshapemelt

Using melt with data.table only works with multiple measure variables


I'm confused by the behavior of data.table::melt().

I would like to reshape a data.table from wide to long, similar to this question

Here's my data.table:

dt <- data.table(id=c(1,2,3), varA1=c(2,6,1), varA2=c(1,1,1),varA3=c(1,2,3),  
                              varB1=c(1,0,1), varB2=c(1,1,1),varB3=c(0,0,0))

Here's what I want:

  id index varA
1:  1     1    2
2:  2     1    6
3:  3     1    1 
4:  1     2    1
5:  2     2    1
6:  3     2    1
7:  1     3    1
8:  2     3    2
9:  3     3    3

If I include both varA and varB as my measure vars, it works fine, using:

dt_long <- melt(dt, id.vars = "id", measure=patterns("^varA","^varB"), value.name = c("varA","varB"),variable.name = "index")

The output is as expected:

   id index varA varB
1:  1     1    2    1
2:  2     1    6    0
3:  3     1    1    1
4:  1     2    1    1
5:  2     2    1    1
6:  3     2    1    1
7:  1     3    1    0
8:  2     3    2    0
9:  3     3    3    0

However, I only want varA. With only a single measure variable, it no longer works. Using

dt_long <- melt(dt, id.vars ="id", measure=patterns("^varA"), value.name = "varA",variable.name = "index")

I get:

   id index varA
1:  1 varA1    2
2:  2 varA1    6
3:  3 varA1    1
4:  1 varA2    1
5:  2 varA2    1
6:  3 varA2    1
7:  1 varA3    1
8:  2 varA3    2
9:  3 varA3    3

Why is the index variable here not listing 1,2,3, as before?


Solution

  • Using multiple columns with melt measure.vars is tricky.

    From melt help regarding measure.vars:

    multiple patterns will produce multiple columns

    If we apply data.table:::patterns source code to dt with "^varA","^varB", we get :

    cols <- colnames(dt)
    p = unlist(list("^varA","^varB"), use.names = any(nzchar(list("^varA","^varB"))))
    lapply(p, grep, cols)
    
    [[1]]
    [1] 2 3 4
    
    [[2]]
    [1] 5 6 7
    

    Leading to :

    • VarA1 associated with VarB1, factor index = 1
    • VarA2 associated with VarB2, factor index = 2
    • VarA3 associated with VarB3, factor index = 3

    Note that index number has nothing to do with VarXi index, it's simply an auto-numbering of VarAi-VarBi factor.

    For example, if we remove VarA1 we get following result:

       id index varA varB
    1:  1     1    1    1
    2:  2     1    1    0
    3:  3     1    1    1
    4:  1     2    1    1
    5:  2     2    2    1
    6:  3     2    3    1
    7:  1     3   NA    0
    8:  2     3   NA    0
    9:  3     3   NA    0
    
    • VarA2 associated with VarB1, factor index = 1
    • VarA3 associated with VarB2, factor index = 2
    • VarB3 alone, factor index = 3

    In both cases, as this factor is a composite factor, data.table returns its numeric index.

    When you only use one pattern, you directly get a factor with levels corresponding to this single pattern:

    dt_long <- melt(dt, id.vars = "id", measure=patterns("^varA"), value.name = c("varA"),variable.name = "index")[]
    dt_long$index
    [1] varA1 varA1 varA1 varA2 varA2 varA2 varA3 varA3 varA3
    Levels: varA1 varA2 varA3
    

    You could convert it to numeric to get the expected result:

    dt_long[,index :=as.numeric(index)][]
       id index varA
    1:  1     1    2
    2:  2     1    6
    3:  3     1    1
    4:  1     2    1
    5:  2     2    1
    6:  3     2    1
    7:  1     3    1
    8:  2     3    2
    9:  3     3    3