I have a dataframe named 'res', where the row names are numbers corresponding to genes.
>res
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
27395 1268.40 0.100013 0.164840 0.606731 5.44029e-01 0.737925231
18777 1413.56 -0.266365 0.175847 -1.514758 1.29834e-01 0.312449929
21399 3376.09 -0.243707 0.132616 -1.837687 6.61086e-02 0.196027163
How can I turn the row names of my data frame into a column with the heading 'gene_id', so that my data frame ends up looking like this?
>res
gene_id baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
27395 1268.40 0.100013 0.164840 0.606731 5.44029e-01 0.737925231
18777 1413.56 -0.266365 0.175847 -1.514758 1.29834e-01 0.312449929
21399 3376.09 -0.243707 0.132616 -1.837687 6.61086e-02 0.196027163
I plan to join this data frame with another data frame (anno), which contains information about the actual genes, by matching on the 'gene_id' column with the left_join function:
>anno
gene_id SYMBOL GENENAME
1 27395 Mrpl15 mitochondrial ribosomal protein L15
2 18777 Lypla1 lysophospholipase 1
3 21399 Tcea1 transcription elongation factor A (SII) 1
res_anno <- left_join(res, anno, by = "gene_id")
Is this what you're looking for?
First, create two small data frames that mimic your example:
library(tidyverse)

# creating the res data frame (a tibble has no row names of its own,
# so rownames(res) is simply "1", "2", "3" here)
res <- tibble(
  baseMean = c(1268.40, 1413.56, 3376.09),
  log2FoldChange = c(0.100013, -0.266365, -0.243707)
)
res
#> # A tibble: 3 × 2
#>   baseMean log2FoldChange
#>      <dbl>          <dbl>
#> 1    1268.          0.100
#> 2    1414.         -0.266
#> 3    3376.         -0.244
# creating the anno data frame
anno <- tibble(
  gene_id = c(1, 2, 3),
  SYMBOL = c('Mrpl15', 'Lypla1', 'Tcea1')
)
anno
#> # A tibble: 3 × 2
#>   gene_id SYMBOL
#>     <dbl> <chr>
#> 1       1 Mrpl15
#> 2       2 Lypla1
#> 3       3 Tcea1
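Then apply this to your data by moving the row names into a gene_id column. rownames_to_column() always creates a character column, so it is converted to numeric here to match the type of anno$gene_id: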
# extracting the row names and putting them in a column
res <- res %>%
  rownames_to_column('gene_id') %>%
  mutate(gene_id = as.numeric(gene_id))
res
#> # A tibble: 3 × 3
#>   gene_id baseMean log2FoldChange
#>     <dbl>    <dbl>          <dbl>
#> 1       1    1268.          0.100
#> 2       2    1414.         -0.266
#> 3       3    3376.         -0.244
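In the tibble example the row names are just 1, 2, 3. With a plain data.frame whose row names are real gene IDs, as in your res, the same call keeps those IDs, stored as text. A minimal sketch using three rows from the question (only two columns kept for brevity; res_df, with_tibble and with_base are illustrative names), next to a base R equivalent that needs no extra package:

```r
library(tibble)

# Plain data.frame whose row names are the gene IDs shown in the question
res_df <- data.frame(
  baseMean       = c(1268.40, 1413.56, 3376.09),
  log2FoldChange = c(0.100013, -0.266365, -0.243707),
  row.names      = c("27395", "18777", "21399")
)

# tibble: the row names become the first column, stored as character
with_tibble <- rownames_to_column(res_df, var = "gene_id")

# base R: prepend the row names as a column, then reset the row names
with_base <- data.frame(gene_id = rownames(res_df), res_df,
                        row.names = NULL, stringsAsFactors = FALSE)

with_tibble$gene_id
with_base$gene_id
```

Both give a character gene_id column in front of the original columns; passing row.names = NULL explicitly is what stops data.frame() from copying the gene IDs back into the row names.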
Finally, join them with left_join():
# left joining both datasets
res_anno <- res %>%
  left_join(anno, by = 'gene_id')
res_anno
#> # A tibble: 3 × 4
#>   gene_id baseMean log2FoldChange SYMBOL
#>     <dbl>    <dbl>          <dbl> <chr>
#> 1       1    1268.          0.100 Mrpl15
#> 2       2    1414.         -0.266 Lypla1
#> 3       3    3376.         -0.244 Tcea1
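left_join() keeps every row of res whether or not its gene_id appears in anno; genes without an annotation get NA in the new columns, and anti_join() lists them. A small sketch with toy data, where gene 99999 and its baseMean value are made up purely to show an unmatched row:

```r
library(dplyr)

# Toy inputs; gene 99999 is a made-up ID with no annotation
res_toy <- data.frame(gene_id = c(27395, 18777, 21399, 99999),
                      baseMean = c(1268.40, 1413.56, 3376.09, 10))
anno_toy <- data.frame(gene_id = c(27395, 18777, 21399),
                       SYMBOL = c("Mrpl15", "Lypla1", "Tcea1"),
                       stringsAsFactors = FALSE)

joined <- left_join(res_toy, anno_toy, by = "gene_id")
nrow(joined)     # all 4 rows of res_toy are kept
joined$SYMBOL    # the unmatched gene gets NA

# rows of res_toy that found no match in anno_toy
unmatched <- anti_join(res_toy, anno_toy, by = "gene_id")
unmatched
```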
As per your comment, if you don't want to add a column to your original data frame, you can do the conversion and the join in a single pipe, starting again from the original res, so that gene_id only exists in your new data frame:
res_anno <- res %>%
  rownames_to_column('gene_id') %>%
  mutate(gene_id = as.numeric(gene_id)) %>%
  left_join(anno, by = 'gene_id')
res_anno
#> # A tibble: 3 × 4
#>   gene_id baseMean log2FoldChange SYMBOL
#>     <dbl>    <dbl>          <dbl> <chr>
#> 1       1    1268.          0.100 Mrpl15
#> 2       2    1414.         -0.266 Lypla1
#> 3       3    3376.         -0.244 Tcea1
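If your res is the object returned by DESeq2's results() (the <numeric> line under the column names suggests an S4 DataFrame rather than a data.frame), tibble and dplyr functions may not accept it directly, so converting it with as.data.frame() first is a safe step. Also, if your real anno$gene_id is character (check with class(anno$gene_id)), you can skip the as.numeric() conversion and join on the text IDs, since both key columns must have the same type. A sketch under those assumptions, with a hypothetical helper annotate_res() and small stand-in data built from rows in the question:

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of any package). res may be a plain
# data.frame or an S4 DataFrame such as a DESeq2 results object; its
# row names are assumed to be the gene IDs.
annotate_res <- function(res, anno) {
  res %>%
    as.data.frame() %>%                 # S4 DataFrame -> plain data.frame
    rownames_to_column("gene_id") %>%   # row names -> character column
    left_join(anno, by = "gene_id")     # assumes anno$gene_id is character
}

# Stand-ins for the question's objects; here anno$gene_id is character
res_df <- data.frame(
  baseMean  = c(1268.40, 1413.56, 3376.09),
  padj      = c(0.737925231, 0.312449929, 0.196027163),
  row.names = c("27395", "18777", "21399")
)
anno_df <- data.frame(
  gene_id = c("27395", "18777", "21399"),
  SYMBOL  = c("Mrpl15", "Lypla1", "Tcea1"),
  stringsAsFactors = FALSE
)

res_anno <- annotate_res(res_df, anno_df)
res_anno
```

If anno$gene_id turns out to be numeric instead, add mutate(gene_id = as.numeric(gene_id)) before the join, as in the pipe above.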