statistics, genetics

What is the simplest way to apply a function to every row of a large table?


I want to run a one-sided Fisher exact test on every row of a table with more than 3,000 rows, formatted like the example below:

gene sample_alt sample_ref population_alt population_ref
One 4 556 770 37000
Two 5 555 771 36999
Three 6 554 772 36998

Ideally, I would like to add a column to the table whose value is

[(4+556)! (4+770)! (770+37000)! (556+37000)!] / [4! 556! 770! 37000! (4+556+770+37000)!]

for the first row of data, and likewise for every other row of the table.

I know how to run a Fisher test in R on a single 2x2 table, but I don't know how to apply the fisher.test() function to each row of a large table. I also can't use an Excel formula, because the factorials grow so large that they exceed Excel's numeric limit and produce a #NUM! error. What's the simplest way to do this? Thanks in advance!
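
As an aside on the overflow problem: the expression above can be evaluated without forming any huge factorials by working on the log scale. The following is a minimal sketch for the first row of the example table; the variable names are illustrative and not part of the original question. It also shows that the same number comes out of R's hypergeometric density, dhyper(). Note that this is the probability of the single observed table, whereas fisher.test() reports a p-value that sums such probabilities over the observed table and all more extreme ones, so the two values will generally differ.

```r
# Counts from the first row of the example table
sample_alt     <- 4
sample_ref     <- 556
population_alt <- 770
population_ref <- 37000
total <- sample_alt + sample_ref + population_alt + population_ref

# Add and subtract log-factorials instead of multiplying factorials,
# then exponentiate once at the end
log_p <- lfactorial(sample_alt + sample_ref) +
         lfactorial(sample_alt + population_alt) +
         lfactorial(population_alt + population_ref) +
         lfactorial(sample_ref + population_ref) -
         lfactorial(sample_alt) - lfactorial(sample_ref) -
         lfactorial(population_alt) - lfactorial(population_ref) -
         lfactorial(total)
p_table <- exp(log_p)

# The same probability from the hypergeometric density:
# draw (sample_alt + sample_ref) items from a pool with
# (sample_alt + population_alt) alt and (sample_ref + population_ref) ref
p_dhyper <- dhyper(sample_alt,
                   sample_alt + population_alt,
                   sample_ref + population_ref,
                   sample_alt + sample_ref)

print(c(p_table, p_dhyper))
```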


Solution

  • Starting from a tab-delimited text file on the desktop (table.txt) in the same format as the table in the question:

    if(!require(psych)){install.packages("psych")}
    
    multiFisher = function(file="Desktop/table.txt", saveit=TRUE, 
                           outfile="Desktop/table.csv", progress=TRUE,
                           verbose=FALSE, digits=3, ... )
      
    {
    
    require(psych)
    
    Data = read.table(file, skip=1, header=F,
                      col.names=c("Gene", "MD", "WTD", "MC", "WTC"), ...)
    
    if(verbose){str(Data)}  # str() prints by itself; print(str(...)) would also print NULL
    
    Data$Fisher.p   = NA
    Data$phi        = NA
    Data$OR1        = NA_character_  # filled in the loop with formatted strings
    Data$OR2        = NA
    
    if(progress){cat("\n")}
    
    for(i in seq_len(nrow(Data))){
      
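      # 2x2 table, filled column-wise: rows = ref, alt; columns = population, sample
      Matrix = matrix(c(Data$WTC[i],Data$MC[i],Data$WTD[i],Data$MD[i]), nrow=2)
      
      # One-sided test: is the alt allele enriched in the sample (odds ratio > 1)?
      Fisher = fisher.test(Matrix, alternative = 'greater')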
    
      Data$Fisher.p[i] = signif(Fisher$p.value, digits=digits) 
    
      Data$phi[i] = phi(Matrix, digits=digits)
      
      OR1 = (Data$WTC[i]*Data$MD[i])/(Data$MC[i]*Data$WTD[i])
      OR2 = 1 / OR1
      
      Data$OR1[i] = format(signif(OR1, digits=digits), nsmall=3)
      
      Data$OR2[i] = signif(OR2, digits=digits)
      
      if(progress) {cat(".")}
    
    }  
    
    if(progress){cat("\n"); cat("\n")}
    
    if(saveit){write.csv(Data, outfile)}
    
    return(Data)
    
    }
    
    multiFisher()
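
If only the p-value column is needed, the per-row test can also be written more compactly with mapply(). This is a rough sketch rather than a replacement for the function above: the data frame tab is built inline from the three example rows in the question, row_p is a hypothetical helper name, and the 2x2 layout and alternative = "greater" are simply copied from the loop above, so check that the direction matches your hypothesis.

```r
# Example rows copied from the question
tab <- data.frame(
  gene           = c("One", "Two", "Three"),
  sample_alt     = c(4, 5, 6),
  sample_ref     = c(556, 555, 554),
  population_alt = c(770, 771, 772),
  population_ref = c(37000, 36999, 36998)
)

# Hypothetical helper: one-sided Fisher p-value for a single row,
# using the same 2x2 layout as the loop above
row_p <- function(s_alt, s_ref, p_alt, p_ref) {
  m <- matrix(c(p_ref, p_alt, s_ref, s_alt), nrow = 2)
  fisher.test(m, alternative = "greater")$p.value
}

# mapply() calls row_p once per row and returns a numeric vector
tab$fisher_p <- mapply(row_p,
                       tab$sample_alt, tab$sample_ref,
                       tab$population_alt, tab$population_ref)

print(tab)
```

For a real file, tab would come from read.table() with header=TRUE instead of being typed in by hand.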