I have an expression set matrix whose rownames are what I think are GENCODE IDs, for example "ENSG00000000003.14", "ENSG00000000457.13", "ENSG00000000005.5" and so on. I would like to convert these to gene symbols, but I am not sure of the best way to do so, especially because of the ".14" or ".13" suffix, which I believe is the version. Should I first strip everything after the dot from each ID and then use biomaRt to convert? If so, what is the most efficient way of doing it? Is there a better way to get to the gene symbol?
Many thanks for your help
Thanks for the help. My problem was getting rid of the version suffix (.XX) at the end of each Ensembl gene ID. I thought there would be a more straightforward way of going from a versioned Ensembl gene ID (GENCODE basic annotation) to a gene symbol. In the end I did the following, and it seems to be working:
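# Drop the version suffix (e.g. ".14") so the IDs match Ensembl's unversioned gene IDs
df$ensembl_gene_id <- gsub('\\..+$', '', df$ensembl_gene_id)
library(biomaRt)
# Connect to the human gene dataset in Ensembl
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <- df$ensembl_gene_id
# Look up the HGNC symbol for each gene ID
symbol <- getBM(filters = "ensembl_gene_id",
                attributes = c("ensembl_gene_id", "hgnc_symbol"),
                values = genes,
                mart = mart)
# Add the symbols to df; rows with no match in the getBM() result are dropped
df <- merge(x = symbol,
            y = df,
            by.x = "ensembl_gene_id",
            by.y = "ensembl_gene_id")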
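
For anyone who wants to try the ID handling without querying Ensembl, here is a minimal sketch in base R. It assumes the versioned IDs are the rownames of a matrix, as in the original question. The names `expr` and `lookup` and the expression values are made up for illustration, and `lookup` is only a stand-in for the table that getBM() returns. It uses match() instead of merge() so that the row order is kept and unmapped IDs are not dropped.

```r
# Toy expression matrix with versioned GENCODE IDs as rownames (values are made up)
expr <- matrix(c(5, 0, 12, 7, 3, 9), nrow = 3,
               dimnames = list(c("ENSG00000000003.14",
                                 "ENSG00000000457.13",
                                 "ENSG00000000005.5"),
                               c("sample1", "sample2")))

# Strip the version suffix: everything from the first dot onward
ids <- sub("\\..*$", "", rownames(expr))

# Hypothetical lookup table standing in for the getBM() result;
# ENSG00000000005 is left out on purpose to mimic an ID with no symbol
lookup <- data.frame(ensembl_gene_id = c("ENSG00000000003", "ENSG00000000457"),
                     hgnc_symbol = c("TSPAN6", "SCYL3"),
                     stringsAsFactors = FALSE)

# match() keeps the original row order and returns NA for unmapped IDs
symbols <- lookup$hgnc_symbol[match(ids, lookup$ensembl_gene_id)]

# Fall back to the Ensembl ID where no symbol was found (NA or empty string)
unmapped <- is.na(symbols) | symbols == ""
symbols[unmapped] <- ids[unmapped]

rownames(expr) <- symbols
print(expr)
```

If several Ensembl IDs map to the same symbol, the rownames will contain duplicates, so it may be worth checking anyDuplicated(symbols) before relying on them as unique keys.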