I want to perform ANOVA in R. I followed the approach in the link below, provided by zx8754: https://www.biostars.org/p/383058/
This worked well with my dataset. Next, I wanted to extract only the p-values, but I could not; I was only able to extract them one at a time.
Although many suggestions have been posted elsewhere, including on Stack Overflow, none of them worked in my case.
My dataset is large. Although the ANOVA runs, I cannot see the full result: only the last several hundred results are displayed, and the earlier ones are cut off because of the size of the output. So I want to extract only the p-values and save them as a CSV file.
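If the immediate goal is only to read the full printed output, one option is to send it to a text file with capture.output() instead of scrolling the console. This is a minimal sketch, assuming a data frame df1 with the GeneSymbol, Value and Label columns used in the code below; here it is filled with the small example data from the answer, and the output file name is a placeholder in a temporary directory.

```r
# Example data (same values as the answer's data); replace with your own df1
df1 <- data.frame(
  GeneSymbol = rep(c("A", "B", "C"), times = 4),
  Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18, 0.2, 0.54, 0.2, 0.02, 0.2, 0.02),
  Label = rep(c(1L, 0L), each = 6)
)

# One ANOVA table per gene, stored in a named list
res <- lapply(split(df1, df1$GeneSymbol),
              function(i) anova(lm(Value ~ Label, data = i)))

# Write everything print(res) would show to a text file (placeholder path)
out_file <- file.path(tempdir(), "anova_results.txt")
capture.output(res, file = out_file)
```

The file can then be opened in any text editor, so nothing is lost to console truncation.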
Here is the code I used for the ANOVA; it returns all the results, including the p-values.
lapply(split(df1, df1$GeneSymbol), function(i){anova(lm(Value ~ Label, data = i))})
Next, if I run the following code, I get the p-value for the first group.
unlist(lapply(split(df1, df1$GeneSymbol), function(i){anova(lm(Value ~ Label, data = i))})[[1]]$"Pr(>F)"[1])
If I change [[1]] to [[2]], I get the p-value for the second group.
unlist(lapply(split(df1, df1$GeneSymbol), function(i){anova(lm(Value ~ Label, data = i))})[[2]]$"Pr(>F)"[1])
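The trailing [1] works because, for a model with a single predictor, each table returned by anova() has one row for the predictor (Label) and one for Residuals; the Pr(>F) column holds the p-value in the first row and NA in the second. A small sketch for gene A, assuming the example data from the answer:

```r
# Example data (same values as the answer's data)
df1 <- data.frame(
  GeneSymbol = rep(c("A", "B", "C"), times = 4),
  Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18, 0.2, 0.54, 0.2, 0.02, 0.2, 0.02),
  Label = rep(c(1L, 0L), each = 6)
)

# Fit the model for a single gene and inspect its ANOVA table
tab <- anova(lm(Value ~ Label, data = subset(df1, GeneSymbol == "A")))
print(tab)

rownames(tab)     # "Label" "Residuals"
tab[["Pr(>F)"]]   # p-value for Label, then NA for the Residuals row
```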
What I would like to do is extract all the p-values at once and save them as a CSV file. How can I do this? Thank you in advance!
You could apply anova to each group and extract the p-value from each:
vals <- sapply(split(df, df$GeneSymbol), function(i)
anova(lm(Value ~ Label, data = i))$"Pr(>F)"[1])
vals
# A B C
#0.6419426 0.9446151 0.9146334
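A stricter variant of the same idea uses vapply(), which stops with an error if any group does not return exactly one number, and keeps the gene names that split() attaches. This is a sketch using the example data given below:

```r
# Example data (same values as the data below)
df <- data.frame(
  GeneSymbol = rep(c("A", "B", "C"), times = 4),
  Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18, 0.2, 0.54, 0.2, 0.02, 0.2, 0.02),
  Label = rep(c(1L, 0L), each = 6)
)

# FUN.VALUE = numeric(1) enforces one numeric p-value per group
vals <- vapply(split(df, df$GeneSymbol),
               function(i) anova(lm(Value ~ Label, data = i))[["Pr(>F)"]][1],
               FUN.VALUE = numeric(1))
vals
```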
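To write them to a CSV file, keeping the gene names in their own column (row.names = FALSE would otherwise drop them), you could do
p_data <- data.frame(GeneSymbol = names(vals), p_value = vals)
write.csv(p_data, "/path/of/the/file.csv", row.names = FALSE)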
Similarly, with dplyr you could do
library(dplyr)

df %>%
group_split(GeneSymbol) %>%
purrr::map_dbl(~anova(lm(Value ~ Label, data = .))$"Pr(>F)"[1])
#[1] 0.6419426 0.9446151 0.9146334
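group_split() returns an unnamed list, so the result above has no gene names. A sketch that keeps them, assuming dplyr 0.8.0 or later (where group_modify() is available), returns a one-row data frame per group and lets dplyr add the GeneSymbol column back:

```r
library(dplyr)

# Example data (same values as the data below)
df <- data.frame(
  GeneSymbol = rep(c("A", "B", "C"), times = 4),
  Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18, 0.2, 0.54, 0.2, 0.02, 0.2, 0.02),
  Label = rep(c(1L, 0L), each = 6)
)

# .x is each group's rows; group_modify() stacks the results with the group key
p_tbl <- df %>%
  group_by(GeneSymbol) %>%
  group_modify(~ data.frame(
    p_value = anova(lm(Value ~ Label, data = .x))[["Pr(>F)"]][1]
  )) %>%
  ungroup()
p_tbl
```

The resulting table can be passed straight to write.csv(..., row.names = FALSE).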
Data used in the examples:
df <- structure(list(GeneSymbol = structure(c(1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"),
Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18, 0.2, 0.54,
0.2, 0.02, 0.2, 0.02), Label = c(1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c("2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13"))