Search code examples
rgenetics

select snps based on non-significant HWE using tableHWE in SNPassoc package in R


I am running a GWAS analysis using the SNPassoc package. I want to conduct this analysis in two steps: 1. Test all snps to determine if they are in hardy-wineburg equilibrium(HWE); 2. Run a test of association on the snps that are found to be in HWE(correcting for false discovery using a Bonfirroni correction).

To conduct these analyses, I am attempting to use the SNPassoc package in R. Using this package, one first transforms the data using the function setupSNP:

myData<-setupSNP(data=SNPs,colSNPs=1:10,sep="") In this example, lets say column 1 is some categorical dependent variable and 2-10 are snps with three levels (AA,AB,BB)

Once the data is in this format, you can test for HWE across all snps using the following function:

res<-tableHWE(myData)

Producing the following double where a flag <- points to p values that are <=.05

        HWE (p value) flag
rs001        0.6211                      
rs002        0.8866                        
rs003        0.8210            
rs004        0.8474            
rs005        0.8364            
rs006        0.4969            
rs007        0.7229            
rs008        0.0345        <- 

I can also conduct an automated set of comparisons doing the following:

ans<-WGassociation(myData$DV~1,data=myData)

producing the following result:

          comments codominant dominant recessive overdominant log-additive
rs001          -    0.63396  0.60907   0.56673      0.34511      0.98948
rs002        -    0.43964  0.30132   0.65763      0.20148      0.53629
rs003          -    0.76300  0.79052   0.46217      0.90503      0.60872
rs004        -    0.54956  0.35198   0.39842      0.64349      0.27900
rs005         -    0.37222  0.32537   0.53702      0.15989      0.71894
rs006         -    0.32753  0.18286   0.84399      0.13830      0.34471
rs007            -    0.82284  0.75360   0.68389      0.56956      0.95867
rs008         -    0.77077  0.48628   0.73038      0.53702      0.47055

What I want to do is select only those snps that are in HWE (i.e. p >.05) and then conduct the association tests on only those. Does anyone have an idea about how to do this? I can't find it in the documentation(see bellow)

see: (http://davinci.crg.es/estivill_lab/tools/SNPassoc/SupplementaryMaterial.pdf) for documentation.


Solution

  • Sounds to me like you want to subset your data.frame down to the rows that contain SNPs with a HWE P > 0.05, and then run the association test on the remaining values. With base R:

    # example object containing all snp data
    all_snps <- data.frame(SNP_ID=paste("POS_", 1:1000, sep=""), stat1=sample(100, size=1000, replace=TRUE))
    
    # example object containing hwe test
    res <- data.frame(SNP_ID=paste("POS_", 1:1000, sep=""), HWE_P=runif(1000))
    
    # subset the rows where P is > 0.05, extact vector of SNP IDs
    SNPS_in_HWE <- subset(res, HWE_P > 0.05)$SNP_ID
    
    
    # you can subset your "all snp data" object to the rows where the SNP ID is in the list of SNPs with P > 0.05
    snps_in_HWE_subset <- all_snps[all_snps$SNP_ID %in% SNPS_in_HWE,]
    
    # run the WGassociation function with this subset 
    ans<-WGassociation(snps_in_HWE_subset$DV~1,data=snps_in_HWE_subset)