I am running a GWAS analysis using the SNPassoc package. I want to conduct this analysis in two steps: 1. Test all snps to determine if they are in hardy-wineburg equilibrium(HWE); 2. Run a test of association on the snps that are found to be in HWE(correcting for false discovery using a Bonfirroni correction).
To conduct these analyses, I am attempting to use the SNPassoc package in R. Using this package, one first transforms the data using the function setupSNP:
myData<-setupSNP(data=SNPs,colSNPs=1:10,sep="")
In this example, lets say column 1 is some categorical dependent variable and 2-10 are snps with three levels (AA,AB,BB)
Once the data is in this format, you can test for HWE across all snps using the following function:
res<-tableHWE(myData)
Producing the following double where a flag <- points to p values that are <=.05
HWE (p value) flag
rs001 0.6211
rs002 0.8866
rs003 0.8210
rs004 0.8474
rs005 0.8364
rs006 0.4969
rs007 0.7229
rs008 0.0345 <-
I can also conduct an automated set of comparisons doing the following:
ans<-WGassociation(myData$DV~1,data=myData)
producing the following result:
comments codominant dominant recessive overdominant log-additive
rs001 - 0.63396 0.60907 0.56673 0.34511 0.98948
rs002 - 0.43964 0.30132 0.65763 0.20148 0.53629
rs003 - 0.76300 0.79052 0.46217 0.90503 0.60872
rs004 - 0.54956 0.35198 0.39842 0.64349 0.27900
rs005 - 0.37222 0.32537 0.53702 0.15989 0.71894
rs006 - 0.32753 0.18286 0.84399 0.13830 0.34471
rs007 - 0.82284 0.75360 0.68389 0.56956 0.95867
rs008 - 0.77077 0.48628 0.73038 0.53702 0.47055
What I want to do is select only those snps that are in HWE (i.e. p >.05) and then conduct the association tests on only those. Does anyone have an idea about how to do this? I can't find it in the documentation(see bellow)
see: (http://davinci.crg.es/estivill_lab/tools/SNPassoc/SupplementaryMaterial.pdf) for documentation.
Sounds to me like you want to subset your data.frame down to the rows that contain SNPs with a HWE P > 0.05, and then run the association test on the remaining values. With base R:
# example object containing all snp data
all_snps <- data.frame(SNP_ID=paste("POS_", 1:1000, sep=""), stat1=sample(100, size=1000, replace=TRUE))
# example object containing hwe test
res <- data.frame(SNP_ID=paste("POS_", 1:1000, sep=""), HWE_P=runif(1000))
# subset the rows where P is > 0.05, extact vector of SNP IDs
SNPS_in_HWE <- subset(res, HWE_P > 0.05)$SNP_ID
# you can subset your "all snp data" object to the rows where the SNP ID is in the list of SNPs with P > 0.05
snps_in_HWE_subset <- all_snps[all_snps$SNP_ID %in% SNPS_in_HWE,]
# run the WGassociation function with this subset
ans<-WGassociation(snps_in_HWE_subset$DV~1,data=snps_in_HWE_subset)