Convert Ensembl ID to gene name using biomaRt


I have a dataset called kidney_ensembl and I need to convert Ensembl IDs to gene names.

I'm trying the code below, but it's not working. Can somebody help me?

I know there are similar questions, but they have not helped me:

  • converting from Ensembl gene ID's to different identifier
  • How can I convert Ensembl ID to gene symbol in R?

Many thanks!

library(tidyverse)
kidney <- data.frame(gene_id = c("ENSG00000000003.10","ENSG00000000005.5",
"ENSG00000000419.8","ENSG00000000457.9","ENSG00000000460.12")
)
#kidney <- read_delim("Desktop/kidney_ensembl.txt", delim = "\t")

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

library("biomaRt")

mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <-  kidney$gene_id
gene_IDs <- getBM(filters= "ensembl_gene_id", attributes= c("ensembl_gene_id","hgnc_symbol"),
              values = genes, mart= mart)

kidney_final <- left_join(kidney, gene_IDs, by = NULL)

Solution

  • The biomaRt part worked. It is your left join that fails, because the two data frames have no column in common: gene_IDs has the Ensembl ID under "ensembl_gene_id", while your kidney data frame has it under "gene_id".

    You also need to check whether your IDs are GENCODE or Ensembl IDs. GENCODE IDs normally carry a version suffix of the form .[number], for example ENSG00000000003.10, whereas in the Ensembl database the same gene is ENSG00000000003.

    library("biomaRt")
    library("dplyr")
    
    kidney <- data.frame(gene_id = 
    c("ENSG00000000003.10","ENSG00000000005.5",
    "ENSG00000000419.8","ENSG00000000457.9","ENSG00000000460.12"),
    vals=runif(5)
    )
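    # make sure gene_id is character (data.frame() creates factors in R < 4.0),
    # otherwise left_join can complain about mismatched key types
    kidney$gene_id <- as.character(kidney$gene_id)
    # if the IDs are GENCODE, strip the ".<number>" version suffix (this mostly works);
    # plain Ensembl IDs contain no dot and are left alone
    kidney$gene_id <- sub("[.][0-9]*","",kidney$gene_id)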
    
    mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
    genes <-  kidney$gene_id
    gene_IDs <- getBM(filters= "ensembl_gene_id", attributes= c("ensembl_gene_id","hgnc_symbol"),
                  values = genes, mart= mart)
    
    left_join(kidney, gene_IDs, by = c("gene_id"="ensembl_gene_id"))
    
              gene_id      vals hgnc_symbol
    1 ENSG00000000003 0.2298255      TSPAN6
    2 ENSG00000000005 0.4662570        TNMD
    3 ENSG00000000419 0.7279107        DPM1
    4 ENSG00000000457 0.3240166       SCYL3
    5 ENSG00000000460 0.3038986    C1orf112
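
The two fixes above (stripping the version suffix and joining on differently named key columns) can be checked without querying Ensembl. The sketch below is only an illustration: the gene_IDs data frame is a hand-built stand-in for the getBM() result, with the symbols copied from the output above, and gene_id_clean and unmatched are hypothetical names introduced here. It uses base R merge() in place of dplyr's left_join(), so it runs without extra packages.

```r
# Hand-built stand-in for the getBM() result (assumption: same two columns
# that getBM() returned above), so the example runs offline.
gene_IDs <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419",
                      "ENSG00000000457", "ENSG00000000460"),
  hgnc_symbol = c("TSPAN6", "TNMD", "DPM1", "SCYL3", "C1orf112"),
  stringsAsFactors = FALSE
)

kidney <- data.frame(
  gene_id = c("ENSG00000000003.10", "ENSG00000000005.5", "ENSG00000000419.8",
              "ENSG00000000457.9", "ENSG00000000460.12"),
  stringsAsFactors = FALSE
)

# Strip a trailing ".<digits>" version suffix; IDs without one are unchanged.
# gene_id_clean is a hypothetical helper column that keeps the original ID intact.
kidney$gene_id_clean <- sub("\\.[0-9]+$", "", kidney$gene_id)

# Left join on differently named key columns. This is the base R counterpart of
# left_join(kidney, gene_IDs, by = c("gene_id_clean" = "ensembl_gene_id")).
kidney_final <- merge(kidney, gene_IDs,
                      by.x = "gene_id_clean", by.y = "ensembl_gene_id",
                      all.x = TRUE)

# Original IDs that received no symbol in the join (NA after a left join).
unmatched <- kidney_final$gene_id[is.na(kidney_final$hgnc_symbol)]
```

Keeping the original versioned ID in its own column makes it easy to trace any row back to the input file, and a non-empty unmatched vector is a quick sign that some IDs were not found in the lookup.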