Tags: r, bioinformatics, bioconductor, genetics, genome

Map SNP IDs to genome coordinates


I have several SNP IDs (e.g., rs16828074 and rs17232800) and want to find their coordinates in the hg19 genome assembly from the UCSC Genome Browser website.

I would prefer to use R for this. How can I do that?


Solution

  • Here is a solution using the Bioconductor package biomaRt. It is a slightly corrected and reformatted version of the previously posted code.

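    library(biomaRt) # biomaRt_2.30.0
    
    # Connect to the Ensembl variation mart for human SNPs
    snp_mart = useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")
    
    snp_ids = c("rs16828074", "rs17232800")
    # Columns to return: rs ID, chromosome name, start position
    snp_attributes = c("refsnp_id", "chr_name", "chrom_start")
    
    # Look up each rs ID through the "snp_filter" filter
    snp_locations = getBM(attributes=snp_attributes, filters="snp_filter", 
                          values=snp_ids, mart=snp_mart)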
    
    snp_locations
    #    refsnp_id chr_name chrom_start
    # 1 rs16828074        2   232318754
    # 2 rs17232800       18    66292259
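
    The question asks for hg19 (GRCh37) positions, whereas the main Ensembl host normally serves the current assembly (GRCh38 since 2014), so it may be worth checking which assembly a mart reports. Below is a minimal sketch, assuming Ensembl's GRCh37 archive at grch37.ensembl.org is reachable and offers the same mart and dataset names. The helper names get_hg19_snp_locations and to_ucsc_position are hypothetical, and the "chr" prefix only approximates UCSC naming (it does not handle cases such as MT versus chrM).

    ```r
    # Hypothetical helper: same query as above, sent to the GRCh37 archive.
    # Assumes biomaRt is installed and grch37.ensembl.org is reachable.
    get_hg19_snp_locations <- function(snp_ids) {
      mart <- biomaRt::useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp",
                               host = "grch37.ensembl.org")
      biomaRt::getBM(attributes = c("refsnp_id", "chr_name", "chrom_start"),
                     filters = "snp_filter", values = snp_ids, mart = mart)
    }

    # Hypothetical helper: build UCSC-style "chrN:position" strings.
    # format(..., scientific = FALSE) avoids output such as "1e+05".
    to_ucsc_position <- function(locations) {
      pos <- format(locations$chrom_start, scientific = FALSE, trim = TRUE)
      paste0("chr", locations$chr_name, ":", pos)
    }
    ```

    For example, to_ucsc_position(get_hg19_snp_locations(c("rs16828074", "rs17232800"))) would return one string per returned row, which can be pasted into the UCSC Genome Browser search box.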
    

    Users are encouraged to read the comprehensive biomaRt vignette and to explore the available marts, datasets, filters, and attributes with the following biomaRt functions:

    listFilters(snp_mart)
    listAttributes(snp_mart)
    attributePages(snp_mart)
    listDatasets(snp_mart)
    listMarts()
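
    The tables returned by listAttributes() and listFilters() can be long, so a small search helper may make them easier to explore. This is a sketch in base R only; find_mart_entries is a hypothetical name, and it assumes the input has name and description columns, as those two functions return.

    ```r
    # Hypothetical helper: keep rows whose name or description matches a pattern.
    # `entries` is a data frame with `name` and `description` columns, as
    # returned by listAttributes() or listFilters().
    find_mart_entries <- function(entries, pattern) {
      hit <- grepl(pattern, entries$name, ignore.case = TRUE) |
        grepl(pattern, entries$description, ignore.case = TRUE)
      entries[hit, , drop = FALSE]
    }
    ```

    For example, find_mart_entries(listAttributes(snp_mart), "chrom") would list the attributes related to chromosome name and position.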