How to extract the p-value for each column in R?


I have a dataframe as follows:

df = structure(list(aa = c(1L, 5L, 8L, 10L, 1L, 10L, 8L, 6L, 7L, 4L, 
1L, 5L, 7L, 7L, 5L, 8L), bb = c(2L, 9L, 1L, 10L, 8L, 7L, 10L, 
8L, 1L, 7L, 2L, 10L, 3L, 5L, 2L, 10L), cc = c(1L, 5L, 9L, 4L, 
9L, 1L, 8L, 3L, 2L, 2L, 2L, 5L, 7L, 2L, 2L, 3L), dd = c(10L, 
5L, 8L, 10L, 6L, 8L, 7L, 5L, 2L, 9L, 10L, 6L, 5L, 3L, 7L, 8L), 
    ee = c(5L, 7L, 5L, 1L, 8L, 4L, 5L, 2L, 10L, 6L, 8L, 10L, 
    6L, 5L, 10L, 6L), Group = c("High", "High", "High", "High", 
    "High", "High", "High", "High", "Low", "Low", "Low", "Low", 
    "Low", "Low", "Low", "Low")), class = "data.frame", row.names = c(NA, 
-16L))

I want to calculate a p-value for each column, comparing the two levels of Group. My expected output looks like this:

values  pvalue  t        mean in High     mean in Low 
aa      0.08    0.41523  6.8              5
bb      0.89    1.41523  6.8              4
cc      0.088   2.41523  2.3              8
dd      0.89    3.41523  9.6              2
ee      0.76    4.41523  4.3              5

I tried the following code to compute the p-value for a single column:

# Compute t-test
res <- t.test(aa ~ Group, data = df)
res

It returns:

    Welch Two Sample t-test

data:  aa by Group
t = 0.41523, df = 11.794, p-value = 0.6854
alternative hypothesis: true difference in means between group High and group Low is not equal to 0
95 percent confidence interval:
 -2.660919  3.910919
sample estimates:
mean in group High  mean in group Low 
             6.125              5.500 

Solution
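
For context, t.test() returns a list of class "htest", so the numbers printed in the question can be read off by name; this is what the `[want]` subsetting in the code below relies on. A minimal sketch, using only the `aa` and `Group` columns from the question (the name `df_aa` is introduced here for illustration):

```r
# Only the aa and Group columns from the question's data frame
df_aa <- data.frame(
  aa    = c(1, 5, 8, 10, 1, 10, 8, 6, 7, 4, 1, 5, 7, 7, 5, 8),
  Group = rep(c("High", "Low"), each = 8)
)

# Welch two-sample t-test, as in the question
res <- t.test(aa ~ Group, data = df_aa)

class(res)      # "htest"
res$p.value     # p-value as a plain number
res$statistic   # named vector holding the t statistic
res$estimate    # named vector holding the two group means
```

These components should reproduce the values printed in the question (p-value of about 0.685, t of about 0.415, group means 6.125 and 5.5).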

    # Base R: run t.test() of each column against Group and keep the parts listed in `want`
    want <- c('p.value', 'estimate', 'statistic')
    t(sapply(head(names(df), -1), \(x) unlist(t.test(reformulate('Group', x), df)[want])))
    
         p.value estimate.mean in group High estimate.mean in group Low statistic.t
    aa 0.6854296                       6.125                      5.500   0.4152274
    bb 0.3093107                       6.875                      5.000   1.0550233
    cc 0.1938533                       5.000                      3.125   1.3833764
    dd 0.3738283                       7.375                      6.250   0.9219951
    ee 0.0177543                       4.625                      7.625  -2.6880860
    
    
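    # tidyverse alternative: reshape to long format, then test within each column name
    library(dplyr)
    library(tidyr)
    pivot_longer(df, -Group) %>%
       group_by(name) %>%
       summarise(mod = list(unlist(t.test(value ~ Group)[want]))) %>%
       unnest_wider(mod)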
    
    # A tibble: 5 × 5
      name  p.value `estimate.mean in group High` `estimate.mean in group Low` statistic.t
      <chr>   <dbl>                         <dbl>                        <dbl>       <dbl>
    1 aa     0.685                           6.12                         5.5        0.415
    2 bb     0.309                           6.88                         5          1.06 
    3 cc     0.194                           5                            3.12       1.38 
    4 dd     0.374                           7.38                         6.25       0.922
    5 ee     0.0178                          4.62                         7.62      -2.69
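
To get closer to the column layout requested in the question, the same per-column test can be collected into a data frame with explicit column names. This is a base-R sketch rather than part of the original answer; the names `value_cols`, `rows`, and `result`, as well as the output column names, are choices made here for illustration.

```r
# Data from the question, written compactly
df <- data.frame(
  aa = c(1, 5, 8, 10, 1, 10, 8, 6, 7, 4, 1, 5, 7, 7, 5, 8),
  bb = c(2, 9, 1, 10, 8, 7, 10, 8, 1, 7, 2, 10, 3, 5, 2, 10),
  cc = c(1, 5, 9, 4, 9, 1, 8, 3, 2, 2, 2, 5, 7, 2, 2, 3),
  dd = c(10, 5, 8, 10, 6, 8, 7, 5, 2, 9, 10, 6, 5, 3, 7, 8),
  ee = c(5, 7, 5, 1, 8, 4, 5, 2, 10, 6, 8, 10, 6, 5, 10, 6),
  Group = rep(c("High", "Low"), each = 8)
)

# Every column except the grouping column
value_cols <- setdiff(names(df), "Group")

# One Welch t-test per column, each returned as a one-row data frame
rows <- lapply(value_cols, function(col) {
  res <- t.test(reformulate("Group", response = col), data = df)
  data.frame(
    values       = col,
    pvalue       = res$p.value,
    t            = unname(res$statistic),
    mean_in_High = res$estimate[["mean in group High"]],
    mean_in_Low  = res$estimate[["mean in group Low"]]
  )
})

# Stack the rows into a single table
result <- do.call(rbind, rows)
result
```

Because it calls the same t.test() on the same data, `result` should hold the same numbers as the two outputs above, only with shorter column names.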