I'm trying to run multiple t-tests (or an ANOVA, if that works too) comparing several interventions against a control at several concentrations.
Below is a mock-up of my data in long format (the real data has multiple rows for each sample and concentration). Ideally, I would run one t-test per test sample per concentration, comparing it to the control at the same concentration, e.g. Sample A-2 vs. Sample B-2 = 0.001, Sample A-4 vs. Sample B-4 = 0.005, Sample A-16 vs. Sample B-16 = 0.01, Sample A-2 vs. Sample C-2 = 0.967, etc. Comparisons between two test groups (e.g. Sample B vs. Sample C, or Sample B-2 vs. Sample B-4) are irrelevant.
I have many data sets to process, so I don't want to split them up manually. I've seen lots of examples of t-tests with two variables, but none with three. Is there a better way to handle this? Should I just run a three-way ANOVA and ignore the comparisons I don't need?
Name | Group (Control/Test) | Concentration | Output |
---|---|---|---|
Sample A | Control | 2 | 0.123 |
Sample A | Control | 4 | 0.567 |
Sample A | Control | 16 | 1.075 |
Sample B | Test | 2 | 0.956 |
Sample B | Test | 4 | 5.435 |
Sample B | Test | 16 | 20.157 |
Sample C | Test | 2 | 0.354 |
Sample C | Test | 4 | 2.156 |
Sample C | Test | 16 | 2.569 |
Sample D | Test | 2 | 0.001 |
Sample D | Test | 4 | 0.231 |
Sample D | Test | 16 | 0.451 |
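To make the comparison described above concrete, here is a minimal base-R sketch. It assumes long-format data with replicate rows; the data frame `dat`, its values, and the column `Rep` are invented for illustration and are not the real data. Each test sample is compared to the control at the same concentration with `t.test()`, which runs Welch's unequal-variance test by default, and the p-values are then Holm-adjusted as one possible correction.

```r
# Hypothetical replicate data: 4 samples x 3 concentrations x 5 replicates
set.seed(1)
dat <- expand.grid(Name = c("Sample A", "Sample B", "Sample C", "Sample D"),
                   Concentration = c(2, 4, 16),
                   Rep = 1:5,
                   stringsAsFactors = FALSE)
dat$Output <- rnorm(nrow(dat), mean = ifelse(dat$Name == "Sample B", 5, 1), sd = 0.5)

control <- "Sample A"
tests <- setdiff(unique(dat$Name), control)

# One row per (test sample, concentration) pair
res <- expand.grid(Name = tests,
                   Concentration = unique(dat$Concentration),
                   stringsAsFactors = FALSE)

# Compare each test sample with the control at the same concentration
res$p.value <- mapply(function(nm, conc) {
  x <- dat$Output[dat$Name == nm & dat$Concentration == conc]
  y <- dat$Output[dat$Name == control & dat$Concentration == conc]
  t.test(x, y)$p.value  # var.equal = FALSE by default (Welch's t-test)
}, res$Name, res$Concentration, USE.NAMES = FALSE)

# Holm adjustment across all control-vs-test comparisons (one option among several)
res$p.adj <- p.adjust(res$p.value, method = "holm")
print(res)
```

Test-vs-test pairs are never computed here, so there is nothing to filter out afterwards.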
Answering my own question here. I figured out that, for my experiments, the variance was not equal across the different concentration groups. I ended up splitting the data into one data frame per concentration, running an ANOVA on each separately, getting a list of p-values with Tukey's HSD, and then recombining the results. That inevitably introduces a multiple-comparisons problem, but a single combined analysis wouldn't have worked, and I can always apply a separate correction afterwards. The code looked a little like the below. I could have used for loops for the steps after the ANOVA, but I didn't have many concentrations, so I didn't bother. Tukey's test reports every pairwise comparison within each concentration, but by selecting rows [1:4,] when building the combined table, I keep only the p-values of the comparisons containing Sample A.
##Using melt tutorial from here: https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format
##Using ANOVA tutorial from here: http://www.sthda.com/english/wiki/two-way-anova-test-in-r
setwd("c:/R/STACKOVERFLOW")
library(data.table)
library(ggpubr)
sample <- read.csv("DATA.csv", check.names = FALSE, fileEncoding = 'UTF-8-BOM')
View(sample)
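##value.name = "Output" so the measurement column matches the name used in the plots and ANOVA below
longo <- melt(setDT(sample), id.vars = c("Concentration"), variable.name = "Name", value.name = "Output")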
View(longo)
ggboxplot(longo, x = "Concentration", y = "Output", color = "Name")
ggline(longo, x = "Concentration", y = "Output", color = "Name",
       add = c("mean_se", "dotplot"))
##Split these out so ANOVA can be performed on individual concentrations
ConSplit <- split(longo, f = longo$Concentration, drop = TRUE)
list2env(ConSplit,envir=.GlobalEnv)
#16
D16 <- as.data.frame(`16`)
A16<-aov(Output ~ Name, data = D16)
summary(A16)
T16 <- TukeyHSD(A16)
##Whatever, let's just make all of em
#4
D4 <- as.data.frame(`4`)
A4<-aov(Output ~ Name, data = D4)
summary(A4)
T4 <- TukeyHSD(A4)
#2
D2 <- as.data.frame(`2`)
A2<-aov(Output ~ Name, data = D2)
summary(A2)
T2 <- TukeyHSD(A2)
### LET'S OUTPUT THIS IN A USEABLE MANNER
data16 <- as.data.frame(T16[1])
data4 <- as.data.frame(T4[1])
data2 <- as.data.frame(T2[1])
#Tag each table with its concentration
#(cbind rather than merge by row names: the row names don't match, so a merge would leave Concentration as NA on the comparison rows)
B16 <- cbind(data16[1:4,], Concentration = 16)
B4 <- cbind(data4[1:4,], Concentration = 4)
B2 <- cbind(data2[1:4,], Concentration = 2)
#Stack em
AllTogetherNow <- rbind(B16,B4,B2)
#See em
View(AllTogetherNow)
#Write em
write.csv(AllTogetherNow, "COMBINEDPVALS.csv")
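For what it's worth, the per-concentration steps above can be collapsed into a loop. This is only a sketch under assumptions: the data in `longo` are simulated here (the values and the `Rep` column are made up), the control is assumed to be called "Sample A", and the comparisons involving the control are picked by name instead of by row position. The Holm step at the end is one way to apply the separate correction across concentrations mentioned above; whether it is the right correction depends on the analysis.

```r
# Simulated long-format data with the same columns as longo: Name, Concentration, Output
set.seed(42)
longo <- expand.grid(Name = c("Sample A", "Sample B", "Sample C", "Sample D"),
                     Concentration = c(2, 4, 16),
                     Rep = 1:5)
longo$Output <- rnorm(nrow(longo), mean = as.integer(longo$Name), sd = 0.5)

control <- "Sample A"  # assumed name of the control group

# One ANOVA + Tukey HSD per concentration
per_conc <- lapply(split(longo, longo$Concentration), function(d) {
  fit <- aov(Output ~ Name, data = d)
  tk <- as.data.frame(TukeyHSD(fit)$Name)
  names(tk) <- c("diff", "lwr", "upr", "p.adj")
  tk$Comparison <- rownames(tk)
  tk$Concentration <- d$Concentration[1]
  # Keep only the pairs that involve the control
  tk[grepl(control, tk$Comparison, fixed = TRUE), ]
})

# Stack the per-concentration tables
AllTogetherNow <- do.call(rbind, per_conc)
rownames(AllTogetherNow) <- NULL

# Extra correction across all retained comparisons
AllTogetherNow$p.holm <- p.adjust(AllTogetherNow$p.adj, method = "holm")
print(AllTogetherNow)
```

Selecting by name means the filter still works if the number of samples changes, which matters when the same script is run over many data sets.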