r, tidyverse, rstatix

Fisher exact test for 2 consecutive rows in a data frame in R


I have a data frame where, for each site, I have tumor and normal count data. I want to run a Fisher exact test on count_unmethylated and count_methylated, tumor versus normal, for each position (chromosome, start, end).

So for the first position:

chromosome start   end
1          10469   10469

I want to run the Fisher exact test on this table:

        count_unmethylated  count_methylated
  norm                   0                 2
  tum                    1                 3

and then do the same for the rest of the loci (chromosome, start, end).

I tried adapting the solution from a previous question, but it didn't work: Row-wise Fisher Exact Test, grouped by samples in R

head(tumNorm_dt_merged_long) %>%
  group_by(chromosome,    start,      end) %>% 
  summarise(data = list(row_wise_fisher_test(as.matrix(select(cur_data(), 
                        starts_with('count_'))), p.adjust.method = "BH"), ncol=2)) %>%
  unnest_wider(data) %>%
  unnest(c(group:p.adj.signif)) -> Fisher_result

My data looks like this:

 dput(head(tumNorm_dt_merged_long))
structure(list(chromosome = c("1", "1", "1", "1", "1", "1"), 
    start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L), 
    end = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L), 
    group = c("norm", "tum", "norm", "tum", "norm", "tum"), count_methylated = c(2, 
    3, 3, 2, 1, 2), count_unmethylated = c(0, 1, 0, 0, 1, 2), 
    methylation_percentage = c(100, 75, 100, 100, 50, 50)), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x130baa0>, sorted = c("chromosome", 
"start", "end", "group"))

Solution

  • Here is a solution using base R. Split the data frame on the start column; this assumes exactly 2 rows (norm and tum) per unique start value. Then use an lapply loop to run Fisher's test on columns 5 and 6 (count_methylated and count_unmethylated).

    tumNorm_dt_merged_long <- structure(list(chromosome = c("1", "1", "1", "1", "1", "1"), 
                   start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L), 
                   end = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L), 
                   group = c("norm", "tum", "norm", "tum", "norm", "tum"), 
                   count_methylated = c(2, 3, 3, 2, 1, 2), 
                   count_unmethylated = c(0, 1, 0, 0, 1, 2), 
                   methylation_percentage = c(100, 75, 100, 100, 50, 50)), 
              row.names = c(NA, -6L), class = c("data.table", "data.frame"), sorted = c("chromosome", "start", "end", "group"))
    
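    # One data frame per locus; splitting on start alone ignores chromosome and end
    dflist <- split(tumNorm_dt_merged_long, tumNorm_dt_merged_long$start)
    
    # Run Fisher's exact test on each locus and keep the htest objects in a list
    output <- lapply(dflist, function(x) {
       print(x)
       # Columns 5 and 6 hold the counts; the two rows are norm and tum
       results <- fisher.test(x[ , c(5,6)])
       print(results)
       results
    })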
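
If you want a single table of p-values rather than a list of test objects, the same idea can be extended as in the sketch below. It is only one possible way to do it and makes a few assumptions that are not in the answer above: the data is rebuilt as a plain data.frame called `counts`, the split uses all three locus columns instead of `start` alone, and the names `loci`, `fisher_by_locus`, `p` and `p.adj` are made up for illustration. The BH adjustment mirrors the `p.adjust.method = "BH"` argument from the question.

```r
# Same six rows as in the question, rebuilt as a plain data.frame (assumption)
counts <- data.frame(
  chromosome = rep("1", 6),
  start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
  end   = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
  group = rep(c("norm", "tum"), 3),
  count_methylated   = c(2, 3, 3, 2, 1, 2),
  count_unmethylated = c(0, 1, 0, 0, 1, 2)
)

# Split on all three locus columns; drop = TRUE skips combinations with no rows
loci <- split(counts, counts[c("chromosome", "start", "end")], drop = TRUE)

# Build one output row per locus holding the raw Fisher p-value
fisher_by_locus <- do.call(rbind, lapply(loci, function(x) {
  # 2 x 2 count table: rows are norm/tum, columns are the two count columns
  tab <- as.matrix(x[, c("count_unmethylated", "count_methylated")])
  rownames(tab) <- x$group
  data.frame(
    chromosome = x$chromosome[1],
    start = x$start[1],
    end = x$end[1],
    p = fisher.test(tab)$p.value
  )
}))
rownames(fisher_by_locus) <- NULL

# Benjamini-Hochberg adjustment across all loci
fisher_by_locus$p.adj <- p.adjust(fisher_by_locus$p, method = "BH")
fisher_by_locus
```

Selecting the count columns by name rather than by position keeps the code working if the column order changes. Like the answer above, this still assumes exactly one norm row and one tum row per locus.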