Tags: r, dplyr, t-test

Perform t-tests by groups


I am trying to run t-tests comparing control and treatment groups in a long-format table.

Part of the table looks like this. Groups ending in _T are treated, groups without the suffix are the controls, and each group has three replicates:

Cell_line Gene Group Values
A a 1 1
A a 1 2
A a 1 3
A a 1_T 1
A a 1_T 2
A a 1_T 3
A a 2 1
A a 2 2
A a 2 3
A a 2_T 1
A a 2_T 2
A a 2_T 3
A a 3 1
A a 3 2
A a 3 3
A a 3_T 1
A a 3_T 2
A a 3_T 3

I want to compare each treatment only with its own control: 1 vs 1_T, 2 vs 2_T, 3 vs 3_T, and so on. My end goal is a column of p-values from the t-tests comparing each treatment with its respective control.

I've tried the code below, along with several other variations, but none of them work. Should I change the table format? Any suggestions or help would be much appreciated!

dataframe <- dataframe %>% group_by(Cell_line, Gene, Group) %>%
 mutate(t.test(Values ~ Group))

dataframe_1 <- dataframe %>% group_by(Cell_line, Gene, Group) %>%
 select_if(is.numeric) %>%
 map_df(t.test(Values, Group, paired = T))

Solution

  • Separate the Group column into two columns: one holding the ID and the other marking each row as treatment (T) or control (C).

    library(dplyr)
    library(tidyr)
    
    df2 <- df %>%
      separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
      mutate(Group = replace_na(Group, "C"))
    
    # > df2
    #    Cell_line Gene ID Group   Values
    # 1          A    a  1     C 19.00937
    # 2          A    a  1     C 19.24884
    # 3          A    a  1     C 17.69836
    # 4          A    a  1     T 25.38643
    # 5          A    a  1     T 23.04596
    # 6          A    a  1     T 24.25100
    # ...
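
    Before running a paired test, it can help to confirm that every ID really has both a C and a T arm with the same number of replicates. A minimal sketch, assuming a small made-up table in the question's layout (the values and the names balance and unbalanced are illustrative, not part of the original answer):

    ```r
    library(dplyr)
    library(tidyr)

    # Illustrative data in the question's layout (values made up for this sketch)
    df <- data.frame(
      Cell_line = "A", Gene = "a",
      Group = rep(c("1", "1_T", "2", "2_T"), each = 3),
      Values = c(19.0, 19.2, 17.7, 25.4, 23.0, 24.3,
                 18.7, 20.8, 18.8, 23.3, 26.2, 25.6)
    )

    # Same split as above: "1_T" -> ID "1", Group "T"; "1" -> ID "1", Group "C"
    df2 <- df %>%
      separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
      mutate(Group = replace_na(Group, "C"))

    # One row per Cell_line/Gene/ID with the replicate count of each arm
    balance <- df2 %>%
      count(Cell_line, Gene, ID, Group) %>%
      pivot_wider(names_from = Group, values_from = n,
                  names_prefix = "n_", values_fill = 0)

    # IDs whose arms differ in size (or lack one arm) cannot be paired row by row
    unbalanced <- filter(balance, n_C != n_T)
    ```

    If unbalanced has any rows, those IDs need an unpaired test or a closer look at the data before pairing.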
    

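    Then run a t-test within each ID. The call below uses paired = TRUE, which pairs replicates by row order within each group; drop that argument for an unpaired two-sample (Welch) test if the replicates are not matched: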

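    df2 %>%
      group_by(Cell_line, Gene, ID) %>%
      # .x holds the rows of one Cell_line/Gene/ID combination.
      # Note: R >= 4.4.0 rejects `paired` in the formula interface; there, use
      # with(.x, t.test(Values[Group == "C"], Values[Group == "T"], paired = TRUE)).
      group_map(~ t.test(Values ~ Group, data = .x, paired = TRUE))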
    
    Output
    [[1]]
            Paired t-test
    
    data:  Values by Group
    t = -6.2599, df = 2, p-value = 0.02458
    alternative hypothesis: true mean difference is not equal to 0
    95 percent confidence interval:
     -9.407919 -1.743297
    sample estimates:
    mean difference
          -5.575608
    
    [[2]]
            Paired t-test
    
    data:  Values by Group
    t = -8.9412, df = 2, p-value = 0.01228
    alternative hypothesis: true mean difference is not equal to 0
    95 percent confidence interval:
     -8.261189 -2.893422
    sample estimates:
    mean difference
          -5.577306
    
    [[3]]
            Paired t-test
    
    data:  Values by Group
    t = -1.929, df = 2, p-value = 0.1935
    alternative hypothesis: true mean difference is not equal to 0
    95 percent confidence interval:
     -11.844963   4.511769
    sample estimates:
    mean difference
          -3.666597
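
    For replicates that are not matched row by row, the unpaired two-sample version mentioned above can be sketched as follows. It relies on t.test()'s default Welch test through the formula interface. The data are illustrative values in the question's layout, and the column names mean_C and mean_T are assumptions added for this sketch:

    ```r
    library(dplyr)
    library(tidyr)

    # Illustrative data in the question's layout (values made up for this sketch)
    df <- data.frame(
      Cell_line = "A", Gene = "a",
      Group = rep(c("1", "1_T", "2", "2_T"), each = 3),
      Values = c(19.0, 19.2, 17.7, 25.4, 23.0, 24.3,
                 18.7, 20.8, 18.8, 23.3, 26.2, 25.6)
    )

    welch <- df %>%
      separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
      mutate(Group = replace_na(Group, "C")) %>%
      group_by(Cell_line, Gene, ID) %>%
      summarise(
        mean_C  = mean(Values[Group == "C"]),
        mean_T  = mean(Values[Group == "T"]),
        # No `paired` argument: Welch two-sample test (var.equal = FALSE by default)
        p.value = t.test(Values ~ Group)$p.value,
        .groups = "drop"
      )
    ```

    With only three replicates per arm, the Welch test has few degrees of freedom, so its p-values will generally differ from the paired ones.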
    

    Update

    If you want to summarise each group with the p-value of each t-test, try summarise():

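    df2 %>%
      group_by(Cell_line, Gene, ID) %>%
      # Vector interface: same paired test, and it also runs on R >= 4.4.0,
      # where `paired` is no longer accepted together with a formula.
      summarise(p.value = t.test(Values[Group == "C"], Values[Group == "T"], paired = TRUE)$p.value,
                .groups = "drop")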
    
    # # A tibble: 3 × 4
    #   Cell_line Gene  ID    p.value
    #   <chr>     <chr> <chr>   <dbl>
    # 1 A         a     1      0.0246
    # 2 A         a     2      0.0123
    # 3 A         a     3      0.194
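
    The question asks for the p-value as a column, so if it should sit next to the original rows rather than in a summary table, swapping summarise() for mutate() is one option. A minimal sketch, assuming illustrative values in the question's layout (with_p is a hypothetical name):

    ```r
    library(dplyr)
    library(tidyr)

    # Illustrative data in the question's layout (values made up for this sketch)
    df <- data.frame(
      Cell_line = "A", Gene = "a",
      Group = rep(c("1", "1_T", "2", "2_T"), each = 3),
      Values = c(19.0, 19.2, 17.7, 25.4, 23.0, 24.3,
                 18.7, 20.8, 18.8, 23.3, 26.2, 25.6)
    )

    with_p <- df %>%
      separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
      mutate(Group = replace_na(Group, "C")) %>%
      group_by(Cell_line, Gene, ID) %>%
      # mutate() keeps every row; the single p-value is recycled within each group
      mutate(p.value = t.test(Values[Group == "C"], Values[Group == "T"],
                              paired = TRUE)$p.value) %>%
      ungroup()
    ```

    Every control and treatment row of an ID then carries the same p-value, which keeps the long format intact for later filtering or plotting.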
    

    Data
    df <- structure(list(Cell_line = c("A", "A", "A", "A", "A", "A", "A",
    "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), Gene = c("a",
    "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
    "a", "a", "a", "a"), Group = c("1", "1", "1", "1_T", "1_T", "1_T",
    "2", "2", "2", "2_T", "2_T", "2_T", "3", "3", "3", "3_T", "3_T",
    "3_T"), Values = c(19.0093682898042, 19.2488407161094, 17.6983554368874,
    25.3864281704297, 23.0459637706291, 24.2509958128999, 18.6843799736362,
    20.7674389968636, 18.833524600653, 23.2825845151011, 26.1647404821767,
    25.5699355732609, 20.820013126065, 20.2674129364223, 21.3344018769664,
    22.4175652694876, 22.2066293870532, 28.7974230636024)), row.names = c(NA, 
    -18L), class = "data.frame")