Two-sample t-test in R: direction of comparison?

Let's say I have data on the lung capacity of smokers and non-smokers. There is a numeric variable "lungCap" and a variable "smoking" with the values "yes" or "no". Now I want to test whether the lung capacity of non-smokers is greater than that of smokers:

t.test(lungCap~smoking, alt="greater")

Does the test now calculate if "yes" > "no" or "no" > "yes"? How is this determined? I could not find it in the help for the t.test command.

Solution

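When the grouping variable is a character vector, t.test() converts it to a factor, whose levels are sorted alphabetically by default. The first level becomes the first group in the comparison, so alt="greater" tests whether the mean of the alphabetically first group is greater than the mean of the second. In the question, "no" sorts before "yes", so the call tests "no" > "yes".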
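A quick way to check which group comes first is to look at the factor levels directly. This is a minimal sketch; the `smoking` vector below is a made-up stand-in for the column in the question, not the OP's data.

```r
# Hypothetical stand-in for the "smoking" column from the question
smoking <- c("yes", "no", "no", "yes")

# t.test() with a formula treats the grouping variable as a factor;
# the first level is the first group in the comparison
group_order <- levels(factor(smoking))
group_order
```

The first element of `group_order` is the group whose mean appears on the left-hand side of the "greater"/"less" comparison.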

To illustrate, we'll compare miles per gallon in cars with manual vs. automatic transmissions using the mtcars data set (from the 1974 Motor Trend magazine).

We'll create a character variable to represent automatic vs. manual (to illustrate the scenario in the OP) and run a t test.

We'll test the following hypotheses:

H_null: mpg of manual transmission cars <= mpg of automatic transmission cars
H_alt: mpg of manual transmission cars is greater than mpg of automatic transmission cars.

To run the test, we'll load the data, create the extra column and execute t.test().

data(mtcars)
mtcars$trans <- ifelse(mtcars$am == 1,"manual","automatic")

t.test(mtcars$mpg ~ mtcars$trans,alt="greater")

...and the output:

> t.test(mtcars$mpg ~ mtcars$trans,alt="greater")

    Welch Two Sample t-test

data:  mtcars$mpg by mtcars$trans
t = -3.7671, df = 18.332, p-value = 0.9993
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -10.57662       Inf
sample estimates:
mean in group automatic    mean in group manual 
               17.14737                24.39231

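What we see here is that t.test() tests automatic > manual, because "automatic" comes first alphabetically. That is the reverse of our alternative hypothesis, hence the negative t value and the p-value of 0.9993.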

To correctly run the test we'll modify it to use the alt="less" argument.

> t.test(mtcars$mpg ~ mtcars$trans,alt="less")

    Welch Two Sample t-test

data:  mtcars$mpg by mtcars$trans
t = -3.7671, df = 18.332, p-value = 0.0006868
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -3.913256
sample estimates:
mean in group automatic    mean in group manual 
               17.14737                24.39231 


Here the reported p-value is 0.0007, so we reject the null hypothesis in favor of the alternative hypothesis that automatic transmission cars have lower average miles per gallon than manual transmission cars.
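As a sanity check on the two runs, the one-sided p-values for "greater" and "less" come from the same t statistic and should add up to 1. The sketch below assumes the same mtcars setup; the names `p_greater` and `p_less` are just illustrative.

```r
data(mtcars)
trans <- ifelse(mtcars$am == 1, "manual", "automatic")

# Same data and group order, opposite one-sided alternatives
p_greater <- t.test(mtcars$mpg ~ trans, alternative = "greater")$p.value
p_less    <- t.test(mtcars$mpg ~ trans, alternative = "less")$p.value

# The two tails of the t distribution are complementary
p_greater + p_less
```

If one direction gives a p-value close to 1, that is usually a hint that the groups are being compared in the opposite order from the one intended.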

Changing the Order of Comparison

In response to the questions in the comments about whether the grouping order can be changed: t.test() has no argument for this. However, one can simply prefix the group names with 1. and 2., so that the group labelled 1. sorts first and becomes the first group in the comparison.

Returning to our mtcars example, if we want manual transmissions to be the first group in the comparison, so that we get a positive t value for the alternative hypothesis H_alt: mpg(manual) > mpg(automatic), we could use the following code.

data(mtcars)
mtcars$trans <- ifelse(mtcars$am == 1,"1. manual","2. automatic")
t.test(mtcars$mpg ~ mtcars$trans,alt="greater")

...and the output:

> t.test(mtcars$mpg ~ mtcars$trans,alt="greater")

    Welch Two Sample t-test

data:  mtcars$mpg by mtcars$trans
t = 3.7671, df = 18.332, p-value = 0.0006868
alternative hypothesis: true difference in means between group 1. manual and group 2. automatic is greater than 0
95 percent confidence interval:
 3.913256      Inf
sample estimates:
   mean in group 1. manual mean in group 2. automatic 
                  24.39231                   17.14737
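
An alternative that avoids renaming the groups (not part of the approach above, just a sketch) is to convert the grouping variable to a factor with an explicit level order. Since t.test() takes the groups in level order, listing "manual" first should make it the first group.

```r
data(mtcars)

# Explicit level order: "manual" is the first group, "automatic" the second
mtcars$trans <- factor(ifelse(mtcars$am == 1, "manual", "automatic"),
                       levels = c("manual", "automatic"))

res <- t.test(mpg ~ trans, data = mtcars, alternative = "greater")
res$statistic  # positive, since manual has the higher mean mpg
res$p.value
```

This keeps the original group labels in the output, and the t value and p-value should agree with the run that used the "1." and "2." prefixes.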