How to extract P-values and append them to an already existing dataframe?


I am trying to write simple code that runs a one-way ANOVA on each column of a data frame laid out like this:

      PROTEIN A   PROTEIN B
A1    Cell 1      Cell 2
A2    Cell 3      Cell 4
B1    Cell 5      Cell 6
B2    Cell 7      Cell 8

Data:

structure(list(acc = c("A", "A", "B", "B", "B", "C", "C", "C", 
"D", "D"), A0A8L2QEN5 = c(130.3, 110.4, 123.3, 143.2, 110.4, 
130.5, 109.1, 106.4, 19.5, 16.9), P63018 = c(97.1, 93.4, 103.1, 
102.2, 110.9, 113, 122.7, 135.1, 60.6, 61.9), P85108 = c(99.1, 
103.5, 97.9, 89.8, 87.8, 94.9, 87.8, 96.9, 121.5, 120.7), A0A8L2R7U3 = c(95.9, 
101.1, 97.5, 96.6, 87.4, 97.9, 82.3, 103.7, 119.5, 118.1)), class = "data.frame", row.names = c(NA, 
-10L))

The data frame has 10 rows and 300 columns. I have run an ANOVA comparing groups A, B, C, etc. Unfortunately, I have to call summary() on each fit, e.g. summary(anovas$PROTEIN A), which means doing this manually 300 times. Is there a way to extract the p-value from each ANOVA automatically into a column (it can be in another data frame)? Here is my code for the ANOVA:

    fit_aov <- function(col) {
      aov(col ~ trt, data = df_long)
    }

    anovas <- map(df_long[, 2:ncol(df_long)], fit_aov)

summary(anovas$protein2)[[1]][1,5] returns the p-value for only one protein at a time.


Solution

A few things here:

  • You have to be a little careful when putting things into formulas. Probably the most reliable approach is to use reformulate() with the column name as the response.
  • Extracting the p-value from summary() is possible, but broom::tidy() will make your life a little easier.

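    ## col is a column *name* (a character string), not the column itself
    fit_aov <- function(col, trt = "acc") {
       f <- reformulate(trt, response = col)  ## e.g. A0A8L2QEN5 ~ acc
       a <- aov(f, data = df_long)
       ## base-R alternative: summary(a)[[1]][["Pr(>F)"]][1]
       broom::tidy(a)[["p.value"]][1]
    }
    vars <- names(df_long)[-1]  ## drop the first (grouping) column
    fit_aov(vars[1])  ## test on the first response before running on all of them
    purrr::map_dbl(vars, fit_aov)  ## one p-value per response column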
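
To get the p-values into a data frame, as the question asks, the same idea can be sketched with base R only. This is a minimal sketch rather than the answer's own code: it assumes df_long is the data shown in the question with acc as the grouping column, and p_from_aov and p_table are names made up for illustration.

```r
# Data from the question (the dput output), stored as df_long.
df_long <- data.frame(
  acc = c("A", "A", "B", "B", "B", "C", "C", "C", "D", "D"),
  A0A8L2QEN5 = c(130.3, 110.4, 123.3, 143.2, 110.4,
                 130.5, 109.1, 106.4, 19.5, 16.9),
  P63018 = c(97.1, 93.4, 103.1, 102.2, 110.9,
             113, 122.7, 135.1, 60.6, 61.9),
  P85108 = c(99.1, 103.5, 97.9, 89.8, 87.8,
             94.9, 87.8, 96.9, 121.5, 120.7),
  A0A8L2R7U3 = c(95.9, 101.1, 97.5, 96.6, 87.4,
                 97.9, 82.3, 103.7, 119.5, 118.1)
)

# Hypothetical helper: fit a one-way ANOVA for one response column
# (given by name) and return the p-value of the grouping term.
p_from_aov <- function(col, trt = "acc", data = df_long) {
  f <- reformulate(trt, response = col)
  a <- aov(f, data = data)
  summary(a)[[1]][["Pr(>F)"]][1]
}

# Every column except the grouping column is treated as a response.
vars <- setdiff(names(df_long), "acc")

# vapply() enforces that each call returns exactly one numeric value.
p_values <- vapply(vars, p_from_aov, numeric(1))

# One row per protein, with its ANOVA p-value alongside.
p_table <- data.frame(protein = vars, p_value = unname(p_values))
p_table
```

With all 300 columns, vars picks up every column other than acc, so p_table ends up with one row per protein.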