r subset t-test

How to run a paired t-test on different levels of a categorical variable?


I am trying to run a paired t-test on pre- and post-intervention results for three intervention types. I want to run the test on each intervention separately using "subset" in the t.test function, but it keeps running the test on the whole sample. I cannot separate the intervention levels manually because this is a large database and I do not have access to the Excel file. Does anyone have any suggestions?

Here is the code I am using:

Treatment (intervention) levels: "Passive", "Pro", "Peer"

"Post" and "Pre" are continuous variables.

t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Peer")
t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Pro")
t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Passive")

Solution

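  • There is no subset argument (nor a data argument) for the t.test function when using the default method. Because the default method accepts ..., the unrecognized subset argument is silently ignored and the test runs on the whole sample: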

    > args(stats:::t.test.default)
    function (x, y = NULL, alternative = c("two.sided", "less", 
        "greater"), mu = 0, paired = FALSE, var.equal = FALSE, 
        conf.level = 0.95, ...)
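
    To see this behaviour directly, here is a minimal sketch on made-up data (the `toy` data frame and its values are assumptions for illustration, not the asker's data). It compares a call with and without `subset`; both should give the same statistic and degrees of freedom, because the extra argument is absorbed by `...` and never used.

    ```r
    # Hypothetical toy data, 10 rows per treatment level
    set.seed(1)
    toy <- data.frame(
      Pre       = rnorm(30, 10, 5),
      Post      = rnorm(30, 13, 5),
      Treatment = rep(c("Peer", "Pro", "Passive"), each = 10)
    )

    # Paired test on all 30 rows
    whole <- t.test(toy$Post, toy$Pre, paired = TRUE)

    # Same call with 'subset': the default method has no such argument,
    # so it falls into '...' and is never applied
    ignored <- t.test(toy$Post, toy$Pre, paired = TRUE,
                      subset = toy$Treatment == "Peer")

    # Both comparisons are TRUE: the subset made no difference
    identical(whole$statistic, ignored$statistic)
    identical(whole$parameter, ignored$parameter)
    ```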
    

    You'll have to subset first:

    with(subset(data, subset=Treatment=="Peer"),
             t.test(Post, Pre, paired=TRUE)
        )
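
    The call above handles one level at a time. A possible base-R sketch for running it on every level in one go is to split the data frame by Treatment and apply the same test to each piece. The `toy` data frame below is made up for illustration; with real data, substitute your own data frame.

    ```r
    # Hypothetical toy data, 10 rows per treatment level
    set.seed(1)
    toy <- data.frame(
      Pre       = rnorm(30, 10, 5),
      Post      = rnorm(30, 13, 5),
      Treatment = rep(c("Peer", "Pro", "Passive"), each = 10)
    )

    # split() returns one data frame per treatment level;
    # lapply() runs the paired test on each of them
    by_treatment <- lapply(split(toy, toy$Treatment), function(d) {
      t.test(d$Post, d$Pre, paired = TRUE)
    })

    # Named list with one test result per level
    names(by_treatment)

    # Pull a single quantity out of every result, e.g. the p-value
    sapply(by_treatment, function(res) res$p.value)
    ```

    Each element of `by_treatment` is an ordinary `htest` object, so `by_treatment$Peer` prints the same output as the single-level call.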
    

    There's also a more compact way using dplyr and broom, which runs the test for every treatment level at once:

    library(dplyr)
    library(broom)
    
    data %>%
      group_by(Treatment) %>%
      do(tidy(t.test(.$Pre, .$Post, paired=TRUE)))
    

    Reproducible example:

    set.seed(123)
    data <- tibble(id=1:63, Pre=rnorm(21*3,10,5), Post=rnorm(21*3,13,5), 
                       Treatment=sample(c("Peer","Pro","Passive"), 63, TRUE))
    data
    # A tibble: 63 x 4
          id   Pre  Post Treatment
       <int> <dbl> <dbl> <chr>    
     1     1  7.20  7.91 Pro      
     2     2  8.85  7.64 Peer     
     3     3 17.8  14.5  Peer     
     4     4 10.4  15.2  Peer     
     5     5 10.6  13.3  Passive  
     6     6 18.6  17.6  Passive  
     7     7 12.3  23.3  Pro      
     8     8  3.67 10.5  Peer     
     9     9  6.57  1.45 Pro      
    10    10  7.77 18.0  Passive  
    # ... with 53 more rows
    

    Output:

    # A tibble: 3 x 9
    # Groups:   Treatment [3]
      Treatment estimate statistic p.value parameter conf.low conf.high method     alternative
      <chr>        <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>      <chr>      
    1 Passive      -2.41    -1.72  0.107          14    -5.42     0.592 Paired t-~ two.sided  
    2 Peer         -3.61    -2.96  0.00636        27    -6.11    -1.10  Paired t-~ two.sided  
    3 Pro          -1.22    -0.907 0.376          19    -4.03     1.59  Paired t-~ two.sided
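
    The estimates are negative because the dplyr call passes Pre first, so each estimate is the mean of Pre minus Post, whereas the calls in the question pass Post first. A small sketch on made-up vectors (the names `pre` and `post` are illustrative assumptions) shows that swapping the order only flips the sign of the estimate and leaves the two-sided p-value unchanged:

    ```r
    # Hypothetical paired measurements
    set.seed(1)
    pre  <- rnorm(20, 10, 5)
    post <- rnorm(20, 13, 5)

    pre_first  <- t.test(pre, post, paired = TRUE)   # estimate = mean(pre - post)
    post_first <- t.test(post, pre, paired = TRUE)   # estimate = mean(post - pre)

    # Estimates are equal in magnitude with opposite sign
    all.equal(unname(pre_first$estimate), -unname(post_first$estimate))

    # The two-sided p-value does not depend on the order
    all.equal(pre_first$p.value, post_first$p.value)
    ```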