
How to name/title rownames in R


I have a dataframe named 'res', where the row names are numbers corresponding to genes.

>res

        baseMean log2FoldChange     lfcSE      stat      pvalue        padj
       <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
27395    1268.40       0.100013  0.164840  0.606731 5.44029e-01 0.737925231
18777    1413.56      -0.266365  0.175847 -1.514758 1.29834e-01 0.312449929
21399    3376.09      -0.243707  0.132616 -1.837687 6.61086e-02 0.196027163

How can I give the row names of my data frame the heading 'gene_id', so that it ends up looking like this?


>res
gene_id baseMean log2FoldChange     lfcSE      stat      pvalue        padj
       <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
27395    1268.40       0.100013  0.164840  0.606731 5.44029e-01 0.737925231
18777    1413.56      -0.266365  0.175847 -1.514758 1.29834e-01 0.312449929
21399    3376.09      -0.243707  0.132616 -1.837687 6.61086e-02 0.196027163

I plan to join this data frame with another data frame (anno), which contains information about the actual genes, on the 'gene_id' column using the left_join function.

>anno
   gene_id  SYMBOL                                                                     GENENAME
1    27395  Mrpl15                                          mitochondrial ribosomal protein L15
2    18777  Lypla1                                                          lysophospholipase 1
3    21399   Tcea1                                    transcription elongation factor A (SII) 1

res_anno <- left_join(res, anno, by = "gene_id")


Solution

  • Is this what you're looking for?

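    First, create two small data frames that stand in for your example. Note that this toy res has no gene IDs as row names, so its row names are simply 1 to 3, and anno uses the matching IDs 1 to 3: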

    library(tidyverse)
    
    # creating the res dataframe
    res = tibble(
      baseMean = c(1268.40,1413.56,3376.09),
      log2FoldChange = c(0.100013,-0.266365,-0.243707)
    )
    
    # A tibble: 3 × 2
      baseMean log2FoldChange
         <dbl>          <dbl>
    1    1268.          0.100
    2    1414.         -0.266
    3    3376.         -0.244
    
    
    # creating the anno dataframe
    anno = tibble(
      gene_id = c(1,2,3),
      SYMBOL = c('Mrpl15', 'Lypla1', 'Tcea1')
    )
    
    # A tibble: 3 × 2
      gene_id SYMBOL
        <dbl> <chr> 
    1       1 Mrpl15
    2       2 Lypla1
    3       3 Tcea1
    
    
    

    Then move the row names into a column. The same step applies to your own dataset:

    # move the row names into a 'gene_id' column; rownames_to_column() returns
    # them as character, so convert to numeric to match anno$gene_id
    res = res %>%
      rownames_to_column('gene_id') %>%
      mutate(gene_id = as.numeric(gene_id))
    
    # A tibble: 3 × 3
      gene_id baseMean log2FoldChange
        <dbl>    <dbl>          <dbl>
    1       1    1268.          0.100
    2       2    1414.         -0.266
    3       3    3376.         -0.244
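
    To see why the conversion matters, here is a small sketch that uses the gene IDs from the question as row names. It assumes res is (or has been converted to) a plain data.frame; res_demo and with_ids are made-up names for illustration. rownames_to_column() produces a character column, while anno$gene_id in the question looks numeric, and as far as I know left_join() refuses to match a character key against a numeric one.

    ```r
    library(tibble)
    library(dplyr)

    # Hypothetical stand-in for 'res': a plain data.frame whose row names are the gene IDs
    res_demo <- data.frame(
      baseMean = c(1268.40, 1413.56, 3376.09),
      log2FoldChange = c(0.100013, -0.266365, -0.243707),
      row.names = c("27395", "18777", "21399")
    )

    # The row names come back as a character column
    with_ids <- rownames_to_column(res_demo, "gene_id")
    class(with_ids$gene_id)  # "character"

    # Convert so the key has the same type as a numeric anno$gene_id
    with_ids <- mutate(with_ids, gene_id = as.numeric(gene_id))
    class(with_ids$gene_id)  # "numeric"
    ```

    If the IDs in anno were stored as character instead, the as.numeric() step could simply be dropped.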
    
    

    And finally left_join them:

    # left join both datasets on the shared 'gene_id' column
    res_anno = res %>%
      left_join(anno, by = 'gene_id')
    
    # A tibble: 3 × 4
      gene_id baseMean log2FoldChange SYMBOL
        <dbl>    <dbl>          <dbl> <chr> 
    1       1    1268.          0.100 Mrpl15
    2       2    1414.         -0.266 Lypla1
    3       3    3376.         -0.244 Tcea1 
    
    

    As per your comment: if you don't want to add the column to your original data frame, chain the column creation and the left_join in a single pipeline, starting from the unmodified res, so that gene_id only exists in the new data frame:

    res_anno = res %>%
      rownames_to_column('gene_id') %>%
      mutate(gene_id = as.numeric(gene_id)) %>%
      left_join(anno, by = 'gene_id')
    
    
    # A tibble: 3 × 4
      gene_id baseMean log2FoldChange SYMBOL
        <dbl>    <dbl>          <dbl> <chr> 
    1       1    1268.          0.100 Mrpl15
    2       2    1414.         -0.266 Lypla1
    3       3    3376.         -0.244 Tcea1
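
    One caveat, offered as a guess from the printout: the <numeric> type row in the question suggests res may be a Bioconductor DataFrame (for example a DESeq2 results object) rather than a base data.frame. If that is the case, converting it with as.data.frame() first is probably needed, since rownames_to_column() expects a data frame. The sketch below uses a base data.frame stand-in (res_raw is a hypothetical name) so that it runs without Bioconductor; as.data.frame() changes nothing here. The anno rows are deliberately shuffled to show that the join matches on gene_id, not on row position.

    ```r
    library(tibble)
    library(dplyr)

    # Hypothetical stand-in for 'res', with the question's gene IDs as row names
    res_raw <- data.frame(
      baseMean = c(1268.40, 1413.56, 3376.09),
      log2FoldChange = c(0.100013, -0.266365, -0.243707),
      row.names = c("27395", "18777", "21399")
    )

    # Annotation table in a different row order than res_raw
    anno <- data.frame(
      gene_id = c(21399, 27395, 18777),
      SYMBOL = c("Tcea1", "Mrpl15", "Lypla1")
    )

    res_anno <- res_raw %>%
      as.data.frame() %>%                      # assumed necessary for a Bioconductor DataFrame
      rownames_to_column("gene_id") %>%        # row names become a character column
      mutate(gene_id = as.numeric(gene_id)) %>%
      left_join(anno, by = "gene_id")          # keeps the row order of res_raw

    res_anno$SYMBOL  # "Mrpl15" "Lypla1" "Tcea1"
    ```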