Minor alleles giving NA from R biomart getBM?

Does anyone know how to extract minor allele data from GRCh38? The following gives NAs:

    library("biomaRt")
    snp.db3 <- useMart(host = "", biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
    nt.biomart3 <- getBM(attributes = c("refsnp_id", "minor_allele", "minor_allele_freq", "chrom_start", "chrom_strand", "associated_gene"),
                         filters = c("snp_filter"),
                         values = c("rs3762444", "rs284262", "rs655598", "rs12089815", "rs12140153", "rs788163", "rs1064213", "rs1106090", "rs7557796", "rs16825008"),
                         mart = snp.db3, uniqueRows = TRUE)
    nt.biomart3

The data is apparently available on Ensembl (e.g.;r=10:120308754-120309754;v=rs10788066;vdb=variation;vf=654619804), but the extraction only works if I use host = ""; otherwise it gives NAs for minor allele and minor allele frequency. Is there something else I should do? Is there a better method? Or is the minor allele data from GRCh37 actually up to date?


  • The online BioMart (MartView) on the main Ensembl site (GRCh38) does not show this information either.

    That explains why you're getting "NA" with "biomaRt".

    The online BioMart (MartView) on the Ensembl GRCh37 site does display it.

    GRCh37 seems up to date (Release 112, May 2024).
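Since the GRCh37 mart does return the minor allele fields, the same query can be pointed at it from R. The following is only a minimal sketch: it assumes the `useEnsembl()` function of biomaRt with its `GRCh = 37` argument, and `get_minor_alleles_grch37` is a hypothetical wrapper name introduced here for illustration. The attribute and filter names are the ones used in the question.

```r
# Hypothetical wrapper (not part of biomaRt): query the GRCh37 SNP mart
# for the minor allele and its frequency. Requires the biomaRt package
# and network access when it is actually called.
get_minor_alleles_grch37 <- function(snps) {
  # Assumption: useEnsembl() with GRCh = 37 selects the GRCh37 archive site
  mart <- biomaRt::useEnsembl(biomart = "snps",
                              dataset = "hsapiens_snp",
                              GRCh = 37)
  biomaRt::getBM(attributes = c("refsnp_id", "minor_allele", "minor_allele_freq"),
                 filters = "snp_filter",
                 values = snps,
                 mart = mart,
                 uniqueRows = TRUE)
}
```

Calling `get_minor_alleles_grch37(c("rs284262", "rs655598"))` should then return one row per variant, with coordinates (if requested) reported on GRCh37 rather than GRCh38.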

    To reproduce these results in MartView:

    • Go to the BioMart page of the main Ensembl site or of the GRCh37 site
    • Choose database -> "Ensembl Variation 112"
    • Choose dataset -> "Human Short Variants (SNPs and indels excluding flagged variants) (GRCh38.p14)"
    • Filters -> "General Variant Filters" -> Tick "Filter by Variant name" and specify the SNPs in the textbox
    • Attributes -> Variant -> "Variant Associated Information" -> Tick "Minor allele (ALL)" & "Global minor allele frequency (all individuals)"
    • Click the "Results" button at the top of the page

    If you want data from GRCh38, you can try to scrape it directly from "".

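    ### Packages
    library(rvest)
    library(purrr)
    ### SNPs to look for
    look=c("rs3762444", "rs284262", "rs655598", "rs12089815", "rs12140153", "rs788163", "rs1064213", "rs1106090", "rs7557796", "rs16825008")
    ### Base URL of the variant page (elided in the source; set it before running)
    base_url <- ""
    ### Function to get the data (wrapper, page read, NA test and return value are reconstructed)
    biomart <- function(x) {
      z <- read_html(paste0(base_url, x))
      a <- z %>%
        html_element(xpath = '//span[.="Highest population MAF"]/following-sibling::span/b') %>%
        html_text2()
      if (is.na(a)) {b <- NA_character_} else {
        b <- z %>%
          html_element(xpath = '//span[.="Highest population MAF"]/following-sibling::span') %>%
          html_attr("title") %>%
          read_html() %>% html_text2()}
      data.frame(SNP = x, MAF = a, source = b)
    }
    ### Map operation
    out=map(look,biomart,.progress = TRUE)
    ### Build the dataframe
    do.call(rbind, out)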

    Output :

             SNP  MAF                       source
    1   rs3762444 <NA>                         <NA>
    2    rs284262 0.49 T in 1000GENOMES:phase_3:SAS
    3    rs655598 0.49 A in 1000GENOMES:phase_3:FIN
    4  rs12089815 0.50 G in 1000GENOMES:phase_3:EUR
    5  rs12140153 0.18             T in gnomADg:ami
    6    rs788163 0.42 C in 1000GENOMES:phase_3:LWK
    7   rs1064213 0.50             A in gnomADe:nfe
    8   rs1106090 0.50             A in gnomADg:amr
    9   rs7557796 0.48 T in 1000GENOMES:phase_3:SAS
    10 rs16825008 0.49 A in 1000GENOMES:phase_3:ITU
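The "source" column packs the allele and the population into one string. If the allele is needed on its own, it can be split out afterwards. This is a small sketch in base R; `split_maf_source` is a hypothetical helper name, and the pattern assumes the "ALLELE in POPULATION" layout seen in the output above. Note that the scraped field is "Highest population MAF", so the allele is the minor one in that population, which is not necessarily the global minor allele.

```r
# Hypothetical helper: split strings such as "T in 1000GENOMES:phase_3:SAS"
# into the allele and the population it was reported for.
split_maf_source <- function(source) {
  pattern <- "^(\\S+) in (.+)$"
  ok <- grepl(pattern, source)  # FALSE for NA or unexpected strings
  data.frame(
    minor_allele = ifelse(ok, sub(pattern, "\\1", source), NA_character_),
    population   = ifelse(ok, sub(pattern, "\\2", source), NA_character_),
    stringsAsFactors = FALSE
  )
}

# Values taken from the output table above
example <- c(NA, "T in 1000GENOMES:phase_3:SAS", "T in gnomADg:ami")
print(split_maf_source(example))
```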

    Notes :

    • The operation takes approximately a minute to complete for these 10 SNPs (the servers seem overloaded);
    • rs3762444 is "NA" because this SNP maps to 2 locations. A few more lines of code would be needed to handle this case.
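Because the servers can be slow, a single failed request should not abort the whole run. One possible approach, sketched below in base R, is to wrap the per-SNP lookup so that an error yields an NA row and each request is preceded by a short pause. `with_fallback` and `fake_lookup` are hypothetical names used only for this illustration; in practice the scraping function would be passed in place of `fake_lookup`.

```r
# Hypothetical helper: wrap a per-SNP lookup so that an error produces
# an NA row instead of stopping the whole loop.
with_fallback <- function(lookup, pause = 1) {
  function(snp) {
    Sys.sleep(pause)  # pause before each request to go easy on the server
    tryCatch(
      lookup(snp),
      error = function(e) {
        data.frame(SNP = snp, MAF = NA_character_, source = NA_character_)
      }
    )
  }
}

# Stand-in lookup for demonstration only; it simulates a failure for one SNP.
fake_lookup <- function(snp) {
  if (snp == "rs3762444") stop("simulated server error")
  data.frame(SNP = snp, MAF = "0.49", source = "T in 1000GENOMES:phase_3:SAS")
}

safe_lookup <- with_fallback(fake_lookup, pause = 0)
out <- do.call(rbind, lapply(c("rs3762444", "rs284262"), safe_lookup))
print(out)
```

This does not address the two-location case of rs3762444; it only keeps transient errors from discarding the rows that were retrieved successfully.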