How to extract all p-values at once and save them as a CSV file?


I want to perform ANOVA in R. I followed the approach in the link below, provided by zx8754: https://www.biostars.org/p/383058/

This worked well with my dataset. Next, I wanted to extract only the p-values, but I could not; I was only able to extract them one at a time.

Although many suggestions have been posted elsewhere, including on Stack Overflow, none of them worked in my case.

My dataset is large. Although the ANOVA runs, I cannot view the full output: the console shows only the last several hundred results, and everything from the first to the middle results is cut off because the output is so long. So I want to extract only the p-values and save them to a CSV file.

Here is the code I used for the ANOVA; it gives me all the results, including the p-values.

    lapply(split(df1, df1$Class), function(i){anova(lm(Value ~ Sample, data = i))})

Next, with the following code I can get the p-value corresponding to the first group:

    unlist(lapply(split(df1, df1$GeneSymbol), function(i){anova(lm(Value ~ Label, data = i))})[[1]]$"Pr(>F)"[1])

If I change [[1]] to [[2]], I get the p-value corresponding to the second group:

    unlist(lapply(split(df1, df1$GeneSymbol), function(i){anova(lm(Value ~ Label, data = i))})[[2]]$"Pr(>F)"[1])

What I would like to do is extract all the p-values at once and save them as a CSV file. How can I do this? Thank you in advance!


Solution

  • You could apply anova() to each group and extract the p-value from each result:

    # [1] selects the Label row of each ANOVA table; sapply() names the result by group
    vals <- sapply(split(df, df$GeneSymbol), function(i) 
                  anova(lm(Value ~ Label, data = i))$"Pr(>F)"[1])
    vals
    
    #       A         B         C 
    #0.6419426 0.9446151 0.9146334 
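If you also want the F statistic next to each p-value, one option is to keep the list of ANOVA tables and pull the first row (the Label term) out of each. A minimal sketch, assuming the columns GeneSymbol, Value and Label from the data below; fits and results are illustrative names, not part of any package:

```r
# Toy data with the same structure as the data below
df <- data.frame(
  GeneSymbol = factor(rep(c("A", "B", "C"), times = 4)),
  Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18,
            0.2, 0.54, 0.2, 0.02, 0.2, 0.02),
  Label = rep(c(1L, 0L), each = 6)
)

# Fit one ANOVA per gene and keep the tables in a list named by GeneSymbol
fits <- lapply(split(df, df$GeneSymbol),
               function(d) anova(lm(Value ~ Label, data = d)))

# Row 1 of each table is the Label term; row 2 is Residuals
results <- data.frame(
  GeneSymbol = names(fits),
  F_value = vapply(fits, function(a) a[["F value"]][1], numeric(1)),
  p_value = vapply(fits, function(a) a[["Pr(>F)"]][1], numeric(1)),
  row.names = NULL
)
results
```

Each anova() result is a data frame, so a[["Pr(>F)"]] is the same column the question reaches with $"Pr(>F)", and results can be passed directly to write.csv().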
    

    If you want to write the p-values to a CSV file, you could do

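    # Keep the gene names as a column; row.names = FALSE would otherwise drop them
    p_data <- data.frame(GeneSymbol = names(vals), p_value = vals)
    write.csv(p_data, "/path/of/the/file.csv", row.names = FALSE)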
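Because the original problem was that the console cut off the earlier results, another option is to send the complete ANOVA tables to a text file instead of the console. A sketch with base R's capture.output(), assuming the same column names as the data below; the file is written to tempdir() here, so replace out_file (an illustrative name) with your own path:

```r
# Toy data with the same structure as the data below
df <- data.frame(
  GeneSymbol = factor(rep(c("A", "B", "C"), times = 4)),
  Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18,
            0.2, 0.54, 0.2, 0.02, 0.2, 0.02),
  Label = rep(c(1L, 0L), each = 6)
)

# One ANOVA table per gene, in a list named by GeneSymbol
fits <- lapply(split(df, df$GeneSymbol),
               function(d) anova(lm(Value ~ Label, data = d)))

# Print the whole list into a file; each table is preceded by "$<gene>"
out_file <- file.path(tempdir(), "anova_tables.txt")
capture.output(print(fits), file = out_file)

# Look at the beginning of the file
head(readLines(out_file), 10)
```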
    

    Similarly, with dplyr and purrr you could do

    library(dplyr)

    df %>%
      group_split(GeneSymbol) %>%
      purrr::map_dbl(~anova(lm(Value ~ Label, data = .))$"Pr(>F)"[1])
    
    #[1] 0.6419426 0.9446151 0.9146334
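Note that map_dbl() returns an unnamed vector here, so the gene names are lost. One way to keep them, assuming dplyr 0.8.0 or later (where group_split() and group_keys() are available), is to pair the split with group_keys(), which lists the groups in the same order; grouped, p_values and result are illustrative names:

```r
library(dplyr)
library(purrr)

# Toy data with the same structure as the data below
df <- data.frame(
  GeneSymbol = factor(rep(c("A", "B", "C"), times = 4)),
  Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18,
            0.2, 0.54, 0.2, 0.02, 0.2, 0.02),
  Label = rep(c(1L, 0L), each = 6)
)

grouped <- df %>% group_by(GeneSymbol)

# One p-value per group, in group order
p_values <- grouped %>%
  group_split() %>%
  map_dbl(~ anova(lm(Value ~ Label, data = .x))$"Pr(>F)"[1])

# group_keys() returns one row per group, in the same order as group_split()
result <- group_keys(grouped) %>%
  mutate(p_value = p_values)
result
```

The resulting tibble has one row per gene and can be written out with write.csv(result, ..., row.names = FALSE).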
    

    data

    df <- structure(list(GeneSymbol = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
    1L, 2L, 3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
    Value = c(0.14, 0.16, 0.01, 0.18, 0.54, 0.18, 0.2, 0.54, 
    0.2, 0.02, 0.2, 0.02), Label = c(1L, 1L, 1L, 1L, 1L, 1L, 
    0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c("2", 
    "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13"))