r annotations biomart

Convert GENCODE IDs to Ensembl - Ranged SummarizedExperiment


I have an expression matrix whose row names are what I think are GENCODE IDs, for example "ENSG00000000003.14", "ENSG00000000457.13", "ENSG00000000005.5", and so on. I would like to convert these to gene symbols, but I am not sure of the best way to do so, especially because of the ".14" or ".13" suffix, which I believe is the version. Should I first trim everything after the dot from each ID and then use biomaRt to convert? If so, what is the most efficient way of doing it? Is there a better way to get to the gene symbol?

Many thanks for your help


Solution

  • Thanks for the help. My problem was getting rid of the version (.XX) at the end of each Ensembl gene ID. I thought there would be a more straightforward way of going from a versioned Ensembl gene ID (GENCODE basic annotation) to a gene symbol. In the end I did the following, and it seems to be working:

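    library(biomaRt)

    # Strip the version suffix: drop the first dot and everything after it
    df$ensembl_gene_id <- sub("\\..*$", "", df$ensembl_gene_id)

    # Connect to the human gene dataset in Ensembl BioMart
    mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

    # Look up HGNC symbols for the unique, unversioned Ensembl gene IDs
    symbol <- getBM(filters = "ensembl_gene_id",
                    attributes = c("ensembl_gene_id", "hgnc_symbol"),
                    values = unique(df$ensembl_gene_id),
                    mart = mart)

    # Inner join: rows of df with no BioMart match are dropped
    df <- merge(x = symbol, y = df, by = "ensembl_gene_id")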
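
For anyone who wants to check the ID-trimming step without querying BioMart, here is a minimal offline sketch in base R. The small `symbol` table is a hypothetical stand-in for what `getBM()` would return (it is not real query output), and one ID is deliberately left out of it to show what happens to an unmatched gene. It also uses `match()` instead of `merge()`, which is a swapped-in alternative that keeps the original row order and leaves `NA` for unmatched IDs rather than dropping those rows.

```r
# Example row names in the versioned format described in the question
ids <- c("ENSG00000000003.14", "ENSG00000000457.13", "ENSG00000000005.5")

# Drop the first dot and everything after it (the version suffix)
stripped <- sub("\\..*$", "", ids)

# Hypothetical lookup table standing in for the getBM() result;
# ENSG00000000457 is omitted on purpose to show the unmatched case
symbol <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005"),
  hgnc_symbol     = c("TSPAN6", "TNMD"),
  stringsAsFactors = FALSE
)

# match() preserves the input order and returns NA where no symbol is found
hgnc <- symbol$hgnc_symbol[match(stripped, symbol$ensembl_gene_id)]
```

With real data, `ids` would be `rownames()` of the expression matrix, and `hgnc` could be stored as an extra annotation column so that no rows are lost.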