This is an example dataset:
df = data.frame(genes = c("A", "B", "C", "D", "E"),
KO_0min_Rep1 = c(0, 1, 2, 6, 6),
KO_0min_Rep2 = c(0, 3, 2, 3, 6),
KO_60min_Rep1 = c(0, 0.3, 2, 9.1, 6),
KO_60min_Rep2 = c(0, 1.3, 2, 6.4, 6),
KO_120min_Rep1 = c(0, 1, 1, 6, 5),
KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
WT_0min_Rep1 = c(0, 1, 2, 6, 6),
WT_0min_Rep2 = c(0, 1, 1.6, 3, 6),
WT_60min_Rep1 = c(0, 1, 2, 9, 6),
WT_60min_Rep2 = c(0, 0.3, 2, 6, 2),
WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
WT_120min_Rep2 = c(0, 1.2, 2, 6, 2) )
The data-frame has several columns, of which the "genes" column has >9000 genes and all other columns are various conditions and treatments. The experimental design is as follows: I have two kinds of cell types: wild type (WT) and knockout (KO). To both of these cell types I treated cells with a DNA damaging agent for 0 minutes, 60 minutes, and 120 minutes. I also have two replicates for these conditions and treatments combinations.
I want to know the genes that are significantly altered after the treatments, but more importantly between the WT and KO conditions over the treatment time points.
This is what I have tried:
lme4_dat = df %>%
tidyr::gather(conditions, value, -genes) %>%
dplyr::mutate( group = case_when(grepl("KO", conditions) ~ "KO",
grepl("WT", conditions) ~ "WT")) %>%
dplyr::mutate( time = case_when(grepl("UT", conditions) ~ "0",
grepl("60", conditions) ~ "60",
grepl("120", conditions) ~ "120" )) %>%
dplyr::mutate( replicate = case_when(grepl("_Rep1", conditions) ~ "Rep1",
grepl("_Rep2", conditions) ~ "Rep2"))
Then I try to fit a linear mixed-effects model
lme4_model = lme4::lmer(value ~ conditions * time + (1|genes) + (1|replicate), data = lme4_dat)
It's obviously not working. I am not sure if I am doing it correctly? Or is there a better alternative?
Any guidance will be much appreciated. Thank you.
I would suggest repeated measures ANOVA with one within-groups and one between-groups factor.
lme4_dat$time[is.na(lme4_dat$time)] = 0
lme4_dat$time = as.factor(lme4_dat$time)
fit <- aov(value ~ time*group + Error(genes/time), lme4_dat)
summary(fit)
library(HH)
interaction2wt(value ~ time*group, lme4_dat)
interaction.plot(lme4_dat$time, lme4_dat$group, lme4_dat$value)
Output
Error: genes
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 4 310.4 77.59
Error: genes:time
Df Sum Sq Mean Sq F value Pr(>F)
time 2 2.617 1.309 0.424 0.668
Residuals 8 24.678 3.085
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
group 1 2.48 2.4807 1.892 0.176
time:group 2 0.21 0.1047 0.080 0.923
Residuals 42 55.07 1.3112
The group (KO/WT) is borderline significant. The time is less so. The interaction isn't significant with a p.value of .923, which you can also tell by looking at parallel lines in the interaction plot. At time 60 the response increased before dropping at 120.