r bioinformatics bioconductor biomart

Get hgnc_symbol/gene_name from ensembl_gene_id


I have this code (taken from here):

library('biomaRt')
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <- rownames(res)
G_list <- getBM(filters= "ensembl_gene_id", attributes=c("ensembl_gene_id","entrezgene", "description","hgnc_symbol"),values=genes,mart= mart)

But when I check G_list, it is empty.

I understand why:

Here are some examples of the ensembl_gene_id values in genes:

"ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16"

If I pass these IDs to getBM(), it returns nothing.

However, if I delete the dot and the number after it, like this:

"ENSG00000260727", "ENSG00000277521", "ENSG00000116514"

I get the expected results.

Is there a way to pass gene IDs that include the dot suffix and still get the expected results?


Solution

  • Not an answer but a bit too long for a comment; happy to remove if deemed not appropriate.

    In short, yes, you need to remove the "dot digit" part of the Ensembl gene ID. The number after the dot is the version of the stable Ensembl identifier.

    From the Ensembl documentation on stable IDs:

    When reassigning stable identifiers between reannotation we can optionally choose to increment the version number assigned with a stable identifier. We do so to indicate an underlying change in the entity.

    For genes (i.e. Ensembl identifiers of the form ENSG*), the version number increments when the set of transcripts linked to a gene changes.
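
    Removing the version suffix can be sketched in base R as follows. This is a minimal illustration, not the only way to do it; the names genes_noversion, id_map and versioned_id are made up for this example, and the IDs are the ones from the question.

```r
# Example IDs taken from the question
genes <- c("ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16")

# Drop a trailing ".<digits>" version suffix; IDs without a suffix are unchanged
genes_noversion <- sub("\\.[0-9]+$", "", genes)

# Keep a lookup table (hypothetical name) so that annotation results can be
# joined back to the original, versioned row names
id_map <- data.frame(
  versioned_id    = genes,
  ensembl_gene_id = genes_noversion,
  stringsAsFactors = FALSE
)

# genes_noversion can then be passed as `values` to getBM(), and the result
# merged back, e.g.:
#   merge(id_map, G_list, by = "ensembl_gene_id", all.x = TRUE)
```

    Keeping the original IDs next to the stripped ones makes it straightforward to attach the annotation to the rows of res afterwards.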

    This post is in fact a duplicate of a post on Biostars: Question: Mapping Ensembl Gene IDs with dot suffix; you should take a look at some of the R solutions discussed there.


    Postscript

    Instead of using Biomart it's often better/faster to use some of the existing annotation packages from Bioconductor. For example, take a look at