Search code examples
rreshapereshape2meltcbind

reshape a dataframe R


I am facing a reshaping problem with a dataframe. It has many more rows and columns. Simplified, it structure looks like this:

rownames    year    x1  x2  x3
a           2000    2   6   11
b           2000    0   4   2
c           2000    0   3   5
a           2010    2   6   11
b           2010    0   0   0
c           2020    4   1   8
a           2020    10  1   7
b           2020    8   4   10
c           2020    22  1   16

I would like to come out with a dataframe that has one single row for the variable "year", copy the x1, x2, x3 values in subsequent columns, and rename the columns with a combination between the rowname and the x-variable. It should look like this:

year  a_x1  a_x2  a_x3  b_x1  b_x2  b_x3  c_x1  c_x2  c_x3
2000  2     6     11    0     4     2     0     3     5
2010  2     6     11    0     0     0     4     1     8
2020  10    1     7     8     4     10    22    1     16

I thought to use subsequent cbind() functions, but since I have to do it for thousands of rows and hundreds columns, I hope there is a more direct way with the reshape package (with which I am not so familiar yet)

Thanks in advance!


Solution

  • First, I hope that rownames is a data.frame column and not the data.frame's rownames. Otherwise you'll encounter problems due to the non-uniqueness of the values.

    I think your main problem is, that your data.frame is not entirely molten:

    library(reshape2)
    
    dt <- melt( dt, id.vars=c("year", "rownames") )
    head(dt)
       year rownames variable value
    1  2000        a       x1     2
    2  2000        b       x1     0
    3  2000        c       x1     0
    4  2010        a       x1     2
    ...
    
    dcast( dt, year ~ rownames + variable )
      year a_x1 a_x2 a_x3 b_x1 b_x2 b_x3 c_x1 c_x2 c_x3
    1 2000    2    6   11    0    4    2    0    3    5
    2 2010    2    6   11    0    0    0    4    1    8
    3 2020   10    1    7    8    4   10   22    1   16
    

    EDIT:

    As @spdickson points out, there is also an error in your data avoiding a simple aggregation. Combinations of year, rowname have to be unique of course. Otherwise you need an aggregation function which determines the resulting values of non-unique combinations. So we assume that row 6 in your data should read c 2010 4 1 8.