arrays, r, tidyr, reshape2

How to replace reshape2::melt for an array with tidyr?


I would like to convert a matrix or array (with dimnames) into a data frame. This is very easy with reshape2::melt but seems harder with tidyr, and for an array it does not seem possible at all. Am I missing something? I ask in particular because reshape2 describes itself as retired (see https://github.com/hadley/reshape).

For example, given the following matrix

MyScores <- matrix(runif(2*3), nrow = 2, ncol = 3, 
                   dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3]))

we can turn it into a data frame as follows

reshape2::melt(MyScores, value.name = 'Score') # perfect

or, using tidyr as follows:

library(tidyverse)  # provides as_tibble(), gather() and %>%
as_tibble(MyScores, rownames = 'Month') %>% 
  gather(Class, Score, -Month)

In this case reshape2 and tidyr seem similar (although reshape2 is shorter if you are looking for a long-format data frame).
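
For reference, gather() has since been superseded in tidyr by pivot_longer(). A minimal sketch of the same matrix reshaping, assuming tidyr 1.0.0 or later is installed (the name long_scores and the set.seed() call are only for illustration). Note that pivot_longer() returns the rows grouped by Month rather than by Class, so the row order differs from the melt() result:

```r
library(tibble)
library(tidyr)

set.seed(1)  # illustrative only, makes the random scores reproducible
MyScores <- matrix(runif(2 * 3), nrow = 2, ncol = 3,
                   dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3]))

# Move the row names into a Month column, giving a wide tibble (Month, A, B, C)
wide_scores <- as_tibble(MyScores, rownames = "Month")

# Stack the class columns A, B, C into Class/Score pairs
long_scores <- pivot_longer(wide_scores, cols = -Month,
                            names_to = "Class", values_to = "Score")
long_scores
```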

However, for arrays it seems harder. Given

EverybodyScores <- array(runif(2*3*5), dim = c(2,3,5), 
                         dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5))

we can turn it into a data frame as follows:

reshape2::melt(EverybodyScores, value.name = 'Score') # perfect

but using tidyr it's not clear how to do it:

as_tibble(EverybodyScores, rownames = 'Month') # loses the Month information, and Class and StudentID still need to be disentangled

Is this a situation where the right solution is to stick to using reshape2?


Solution

  • One way I found by playing around is to coerce via tbl_cube. I have never really used that class, but it seems to do the trick in this instance.

    EverybodyScores <- array(
      runif(2 * 3 * 5),
      dim = c(2, 3, 5),
      dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
    )
    library(tidyverse)
    library(cubelyr)
    EverybodyScores %>%
      as.tbl_cube(met_name = "Score") %>%
      as_tibble
    #> # A tibble: 30 x 4
    #>    Month    Class StudentID Score
    #>    <chr>    <chr>     <int> <dbl>
    #>  1 January  A             1 0.366
    #>  2 February A             1 0.254
    #>  3 January  B             1 0.441
    #>  4 February B             1 0.562
    #>  5 January  C             1 0.313
    #>  6 February C             1 0.192
    #>  7 January  A             2 0.799
    #>  8 February A             2 0.277
    #>  9 January  B             2 0.631
    #> 10 February B             2 0.101
    #> # ... with 20 more rows
    

    Created on 2018-08-15 by the reprex package (v0.2.0).
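
If an extra package is not wanted, base R may also be enough. A minimal sketch, assuming it is acceptable to call the as.data.frame.table() method directly on a plain array (the name long_scores and the set.seed() call are only for illustration). Dimnames are always stored as character, so StudentID is converted back to integer by hand:

```r
set.seed(1)  # illustrative only, makes the random scores reproducible
EverybodyScores <- array(
  runif(2 * 3 * 5),
  dim = c(2, 3, 5),
  dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
)

# One row per array cell; the dimnames names become the column names
long_scores <- as.data.frame.table(EverybodyScores,
                                   responseName = "Score",
                                   stringsAsFactors = FALSE)

# The dimnames were character, so restore the integer student IDs
long_scores$StudentID <- as.integer(long_scores$StudentID)

head(long_scores)
```

The result is an ordinary data.frame with columns Month, Class, StudentID and Score; wrapping it in tibble::as_tibble() gives a tibble if one is preferred.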