Search code examples
rdataframecorrelation

How do I extract estimates and p-value from cor.test results across multiple groups?


I have a set of data that has multiple groups across multiple time points (sample cut below):

enter image description here

I've been trying to conduct multiple cor.test between X and Y between each status, and across each gender.

I was having a hard time figuring out the group comparison so I filtered by status and split my gender cor.tests separately for Status = Red and Status = Blue (using filter).

This is my current code which runs cor.test across each gender:

    red_status <- all %>% filter(status == "Red")
    cor_red <- by(red_status, red_status$gender, 
                  FUN = function(df) cor.test(df$X, df$Y, method = "spearman"))

The output result shows the 3 different cor.test across each gender:


red_status$gengrp: M
    Spearman's rank correlation rho

data:  df$X and df$Y
S = 123.45, p-value = 0.123
alternative hypothesis: true rho is not equal to 0
sample estimates:
     rho 
0.123456 

----------------------------------

red_status$gengrp: F
... (same output style as gengrp: M ^)

----------------------------------
red_status$gengrp: O
... (same output style as gengrp: M ^)

What I'm trying to do is extract the estimate and p-values across all gender cor.test and place them in a dataframe.

I thought I can use the data.frame() function to extract the gender name and correlation elements, then just add another column for p-values, however doing this gave me an error:

# Where red_status[1] is gender names (M,F,O) and red_status[[3:4]] are the Spearman p-value and rho estimate *within each gender category*
data.frame(group = dimnames(red_status)[1], est = as.vector(red_status)[[3]], 
pval = as.vector(red_status[[4]])
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class ‘"htest"’ to a data.frame

Since I filtered by Status == Red, I'd have to rerun the codes again to get the result for the gender cor.test results of Status == Blue then at the end bind the estimates and p-values all to 1 df.

My goal is to be able to create a data frame that shows the correlation estimate and p-value across each status and gender:

Status   Gender   Estimate(rho)    P-value
   Red        M            1.23      0.123
   Red        F            0.45      0.054
   Red        O             ...        ...
  Blue        M           0.004      0.123
  Blue        F             ...        ...
  Blue        O             ...        ...

Any help/tips would be appreciated.


Solution

  • Answer based off of @Ruam Pimentel's comment regarding the rstatix package.

    Using piping + the cor_test() function, I can group by both status and gender to run the correlation test. The cor_test() function is designed to output a dataframe containing all elements of the correlation (i.e. statistic, p-value, estimate) all depending on the correlation method of choice.

    This is the code that worked:

     r <- dfall %>%
                    group_by(status, gender) %>%
                    cor_test(X, Y, method = "spearman")
    

    Result (edited the numbers):

      status  gengrp var1   var2   cor  statistic    p   method  
      <chr>   <fct>  <chr>  <chr> <dbl>     <dbl>  <dbl> <chr>   
    1 Red       M     X       Y    0.98    -0.123  0.123  Spearman
    2 Red       F     X       Y    0.12     0.456  0.456  Spearman
    3 Red       O     X       Y    0.34     0.944  0.789  Spearman
    4 Blue      M     X       Y    0.56     0.246  0.101  Spearman
    5 Blue      F     X       Y    0.78    0.4107  0.111  Spearman
    6 Blue      O     X       Y     0.9     0.123  0.122  Spearman