I have presence/absence data (1/0) for various genes (columns) across samples that each belong to one of two categories. I am running Fisher's exact test (fisher.test) on each gene, but I get an error whenever a gene is present (1) in all samples or absent (0) from all samples. How can I remove or skip these columns, or make fisher.test skip these genes and keep going?
Here is my sample data:
mydata <- data.frame(sampleID = c("A", "B", "C", "D", "E", "F", "G"),
                     category = c("high", "low", "high", "high", "low", "high", "low"),
                     Gene1 = c(1, 1, 0, 0, 0, 1, 1),
                     Gene2 = c(0, 1, 1, 1, 1, 1, 0),
                     Gene3 = c(0, 0, 0, 1, 1, 1, 1),
                     Gene4 = c(1, 1, 1, 1, 1, 1, 1))
Here is the loop code that someone helped me design, which applies the fisher.test to each gene:
library(dplyr)
library(tidyr)
library(broom)
mydata %>%
  select(-sampleID) %>%
  pivot_longer(cols = -category, names_to = "gene") %>%
  group_by(gene) %>%
  summarise(fisher_test = list(tidy(fisher.test(table(category, value))))) %>%
  unnest(fisher_test) %>%
  mutate(odds_ratio = exp(estimate)) %>%
  select(-method, -alternative)
The error message I get when it encounters a gene that is present or absent from all samples:
Caused by error in `fisher.test()`:
! 'x' must have at least 2 rows and columns
Run `rlang::last_error()` to see where the error occurred.
Where can I insert this step into the loop above?
Note: It is not feasible to omit the genes manually, as there are hundreds of them.
We could add a select step at the top to remove any numeric columns that have a single unique value (n_distinct(.x) == 1):
library(dplyr)
library(tidyr)
library(broom)  # provides tidy()
mydata %>%
  # drop constant numeric columns (genes present in, or absent from, every sample)
  select(!where(~ is.numeric(.x) && n_distinct(.x) == 1), -sampleID) %>%
  pivot_longer(cols = -category, names_to = "gene") %>%
  group_by(gene) %>%
  summarise(fisher_test = list(tidy(fisher.test(table(category, value))))) %>%
  unnest(fisher_test) %>%
  mutate(odds_ratio = exp(estimate)) %>%
  select(-method, -alternative)
Output:
# A tibble: 3 × 6
  gene  estimate p.value conf.low conf.high odds_ratio
  <chr>    <dbl>   <dbl>    <dbl>     <dbl>      <dbl>
1 Gene1    1.81        1  0.0469      176.        6.11
2 Gene2    0.707       1  0.00640      78.2       2.03
3 Gene3    1.81        1  0.0469      176.        6.11
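If you would rather keep the constant genes in the result and have the test skip them, one possible alternative is to check the table dimensions before calling fisher.test. The sketch below uses base R only and the example data from the question; safe_fisher_p and gene_cols are hypothetical names introduced here for illustration, and returning NA for skipped genes is an assumption about what you want to see for them.

```r
# Example data from the question.
mydata <- data.frame(
  sampleID = c("A", "B", "C", "D", "E", "F", "G"),
  category = c("high", "low", "high", "high", "low", "high", "low"),
  Gene1 = c(1, 1, 0, 0, 0, 1, 1),
  Gene2 = c(0, 1, 1, 1, 1, 1, 0),
  Gene3 = c(0, 0, 0, 1, 1, 1, 1),
  Gene4 = c(1, 1, 1, 1, 1, 1, 1)
)

# Hypothetical helper: run fisher.test only when the contingency table
# has at least 2 rows and 2 columns; otherwise return NA instead of failing.
safe_fisher_p <- function(value, category) {
  tab <- table(category, value)
  if (any(dim(tab) < 2)) {
    return(NA_real_)
  }
  fisher.test(tab)$p.value
}

# Every column except the identifiers is treated as a gene (an assumption).
gene_cols <- setdiff(names(mydata), c("sampleID", "category"))

# One p-value per gene; constant genes such as Gene4 come back as NA.
p_values <- vapply(mydata[gene_cols], safe_fisher_p, numeric(1),
                   category = mydata$category)
p_values
```

The same dimension check could be placed inside the summarise() call of the pipeline above; the advantage of this variant is that skipped genes stay visible in the result rather than silently disappearing.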