r dataframe dplyr magrittr

How to set the row names of a data frame passed on with the pipe %>% operator?


I have a data frame which I am dcasting using the reshape2 package, and I would like to remove the first column and have it become the row names of the data frame instead.

Original dataframe, before dcast:

> corner(df)

ID_full      gene cpm
1  S36-A1   DDX11L1   0
2  S36-A1    WASH7P   0
3  S36-A1 MIR1302-2   0
4  S36-A1   FAM138A   0
5  S36-A1     OR4F5   0

pivot function to dcast the table:

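 library(reshape2)
 library(magrittr)  # provides %>%

 pivot <- function(x) {
   # Cast long to wide: one row per ID_full, one column per gene.
   # Return the result directly instead of assigning it to an unused local.
   x %>% dcast(ID_full ~ gene, value.var = "cpm")
 }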

After dcast, wrapped in my pivot function:

> corner(df)

ID_full 1060P11.3 A1BG A1BG-AS1 A1CF
1  S36-A1         0    0        0    0
2 S36-A10         0    0        0    0
3 S36-A11         0    0        0    0
4 S36-A12         0    0        0    0
5  S36-A2         0    0        0    0

I'd like ID_full to become the rownames, and to cease existing as a column, piped after dcasting. I can do this in several lines, replacing the data frame each time, but I'd like to do it all using the %>% operator.

The best attempt I can think of would involve something like this, but obviously it doesn't work:

library(dplyr)

df <- df %>% pivot(.) %>% with(., row.names=df[,1])

I'd appreciate any suggestions... this nuisance is driving me crazy!

UPDATE:

Thanks for your answers:

This expression works nicely:

df <- df %>% pivot(.) %>% `rownames<-`(.[,1]) %>% select(-ID_full)

> corner(df)

        1060P11.3 A1BG A1BG-AS1 A1CF        A2M
S36-A1          0    0        0    0    0.00000
S36-A10         0    0        0    0    0.00000
S36-A11         0    0        0    0    0.00000
S36-A12         0    0        0    0    1.62189
S36-A2          0    0        0    0 1170.95000
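For anyone who wants to try the expression above without the original data, here is a minimal sketch on a toy table. The data frame `wide` and its values are made up to stand in for the dcast output, and base `subset()` is used in place of `select()` only to keep the example free of extra packages.

```r
library(magrittr)  # provides %>%

# Toy wide table standing in for the dcast output (values are invented).
wide <- data.frame(
  ID_full = c("S36-A1", "S36-A10", "S36-A2"),
  A1BG    = c(0, 0, 1.5),
  A1CF    = c(0, 2, 0),
  stringsAsFactors = FALSE
)

# `rownames<-`(x, value) is the function form of rownames(x) <- value,
# so it can sit in a pipe. The piped data frame becomes the first argument,
# and the dot in .[, 1] refers to that same data frame.
result <- wide %>%
  `rownames<-`(.[, 1]) %>%
  subset(select = -ID_full)  # drop the now-redundant ID column

print(result)
```

The backticks are needed because `rownames<-` is not a syntactically valid name on its own; quoting it lets it be called like any other function.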

Solution

  • With recent versions of tibble, a more elegant solution exists:

    df <- df %>% pivot(.) %>% tibble::column_to_rownames('ID_full')
    

    Importantly, it also works when the name of the column to turn into row names is passed as a variable, which is very convenient inside a function.
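    As a rough sketch of the "column name passed as a variable" case, the helper below takes the column name as a string. The function name `ids_to_rownames`, its argument `id_col`, and the toy data are hypothetical and only for illustration; `tibble::column_to_rownames()` and its `var` argument are the real API.

```r
library(magrittr)  # provides %>%

# Hypothetical helper: id_col is a string naming the column to promote
# to row names. column_to_rownames() also drops that column.
ids_to_rownames <- function(x, id_col) {
  x %>% tibble::column_to_rownames(var = id_col)
}

# Toy data standing in for the dcast output (values are invented).
wide <- data.frame(
  ID_full = c("S36-A1", "S36-A10"),
  A1BG    = c(0, 0),
  stringsAsFactors = FALSE
)

result <- ids_to_rownames(wide, "ID_full")
print(result)
```

    Note that `column_to_rownames()` raises an error if the data frame already has non-default row names, so it is best applied straight after the cast.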