Calculating the minor allele frequency from a gds file?


I have a GDS file describing SNP variants for a number of individuals relative to a reference genome. I used the hwe() function from the SeqVarTools package in R, which gave me the reference allele frequency for each variant. I want the minor allele frequency instead, but I am not sure how to approach this, because many packages require the data to be converted to an obscure matrix class that is not useful for further analyses.

My main question: How can I obtain the minor allele frequencies given my reference allele frequencies?

Below is a small example to help visualize my issue.

# Reference allele frequencies
af <- c(0.082, 0.765, 0.125, 0.986)

# Desired outcome
expected_maf <- c(0.082, 0.235, 0.125, 0.014)

# Preallocated vector for the result
maf <- numeric(length(af))

# Loop: use 1 - af wherever the reference allele is the major allele
for (i in seq_along(af)) {
  if (af[i] > 0.5) {
    maf[i] <- 1 - af[i]
  } else {
    maf[i] <- af[i]
  }
}

The solution I was developing is the for loop above: it replaces af with 1 - af when af > 0.5 and otherwise leaves the value unchanged.

My datasets are very large, with over 30,000 variables, so for loops are not ideal.


Solution

  • ifelse() is vectorized, so it applies the condition to the whole vector at once and no explicit loop is needed:

    # Allele frequencies
    af <- c(0.082, 0.765, 0.125, 0.986)
    
    # outcome
    maf <- ifelse(af > 0.5, 1 - af, af)
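
An equivalent way to write this, sketched here under the assumption that every variant is biallelic (so the alternate allele frequency is simply 1 - af), is to take the element-wise minimum of af and 1 - af with base R's pmin(). The name expected_maf is only an illustrative variable holding the desired outcome from the question.

```r
# Reference allele frequencies from the question
af <- c(0.082, 0.765, 0.125, 0.986)

# Assumption: biallelic variants, so the other allele has frequency 1 - af.
# pmin() returns the element-wise minimum of its arguments.
maf <- pmin(af, 1 - af)

# Desired outcome from the question, compared with a floating-point tolerance
expected_maf <- c(0.082, 0.235, 0.125, 0.014)
matches <- isTRUE(all.equal(maf, expected_maf))
print(matches)
```

Both pmin() and ifelse() operate on the full vector, so either should handle tens of thousands of variants without a loop. For multiallelic sites this shortcut may not hold, since 1 - af would then combine several alternate alleles.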