Let's say I have data of the lung capacity of smokers and non-smokers. So we have the variable "lungCap" with a numeric value, and the variable "smoking" with the values "yes" or "no". Now I want to see if the capacity of non-smokers is bigger than that of smokers:
t.test(lungCap~smoking, alt="greater")
Does the test now calculate if "yes" > "no" or "no" > "yes"? How is this determined? I could not find it in the help for the t.test command.
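When the grouping variable is character-based, t.test() converts it to a factor, whose levels are sorted alphabetically. The test statistic is then computed as the first level minus the second. In the question's example "no" sorts before "yes", so alt="greater" tests whether the mean for "no" is greater than the mean for "yes".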
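A minimal sketch of how this plays out for the question's variables. The data frame `lung` and its values are made up purely for illustration; only the names lungCap and smoking come from the question.

```r
# Hypothetical toy data: the numbers are invented for illustration only
lung <- data.frame(
  lungCap = c(5.1, 4.8, 5.6, 4.2, 4.0, 4.5),
  smoking = c("no", "no", "no", "yes", "yes", "yes")
)

# The formula method groups by factor(smoking); levels are sorted,
# so "no" is the first group and "yes" the second
levels(factor(lung$smoking))

# alternative = "greater" therefore tests mean(no) - mean(yes) > 0
res <- t.test(lungCap ~ smoking, data = lung, alternative = "greater")

# The order of the estimates shows which group was treated as first
names(res$estimate)
```

Checking names(res$estimate) (or the "sample estimates" section of the printed output) is a quick way to confirm the direction of the comparison before interpreting a one-sided p-value.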
To illustrate, we'll compare miles per gallon in cars with manual vs. automatic transmissions using the mtcars data set, which was extracted from the 1974 Motor Trend magazine.
We'll create a character variable to represent automatic vs. manual (to illustrate the scenario in the OP) and run a t test.
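We'll test the null hypothesis that manual transmission cars do not have higher average miles per gallon than automatic transmission cars, h_null: mpg(manual) <= mpg(automatic), against the alternate hypothesis h_alt: mpg(manual) > mpg(automatic).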
To run the test, we'll load the data, create the extra column, and execute t.test().
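data(mtcars)
# am is coded 0 = automatic, 1 = manual; derive a character label from it
mtcars$trans <- ifelse(mtcars$am == 1,"manual","automatic")
# alt is partially matched to the 'alternative' argument of t.test()
t.test(mtcars$mpg ~ mtcars$trans,alt="greater")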
...and the output:
> t.test(mtcars$mpg ~ mtcars$trans,alt="greater")
Welch Two Sample t-test
data: mtcars$mpg by mtcars$trans
t = -3.7671, df = 18.332, p-value = 0.9993
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-10.57662 Inf
sample estimates:
mean in group automatic mean in group manual
17.14737 24.39231
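What we see here is that t.test() tests automatic > manual, that is, whether mean(automatic) - mean(manual) is greater than 0. Because the automatic group has the lower mean, the t statistic is negative and the p-value is 0.9993.
To test the hypothesis we actually care about, we'll change the argument to alt="less".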
> t.test(mtcars$mpg ~ mtcars$trans,alt="less")
Welch Two Sample t-test
data: mtcars$mpg by mtcars$trans
t = -3.7671, df = 18.332, p-value = 0.0006868
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -3.913256
sample estimates:
mean in group automatic mean in group manual
17.14737 24.39231
Here the reported p-value is 0.0006868, so we reject the null hypothesis in favor of the alternate hypothesis that automatic transmission cars have lower average miles per gallon than manual transmission cars.
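The two one-sided p-values reported above (0.9993 and 0.0006868) are complements of each other, which is why picking the wrong direction gives a p-value close to 1. A short sketch that checks this on the same data; the variable names p_greater and p_less are introduced here only for illustration.

```r
data(mtcars)
mtcars$trans <- ifelse(mtcars$am == 1, "manual", "automatic")

# Same test in both directions; group order is automatic, then manual
p_greater <- t.test(mpg ~ trans, data = mtcars, alternative = "greater")$p.value
p_less    <- t.test(mpg ~ trans, data = mtcars, alternative = "less")$p.value

# For the t test the two one-sided p-values sum to 1
p_greater + p_less
```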
Responding to the questions in the comments about whether there is a way to change the grouping order: t.test() has no argument for this, because the order comes from the sorted levels of the grouping variable. However, one can simply add 1. and 2. in front of the group names to force t.test() to use the group whose name starts with 1. as the first group in the comparison.
Returning to our mtcars example, if we want manual transmissions to be the first group in the comparison, so that we get a positive t value for the alternate hypothesis h_alt: mpg(manual) > mpg(automatic), we could use the following code.
data(mtcars)
mtcars$trans <- ifelse(mtcars$am == 1,"1. manual","2. automatic")
t.test(mtcars$mpg ~ mtcars$trans,alt="greater")
...and the output:
> t.test(mtcars$mpg ~ mtcars$trans,alt="greater")
Welch Two Sample t-test
data: mtcars$mpg by mtcars$trans
t = 3.7671, df = 18.332, p-value = 0.0006868
alternative hypothesis: true difference in means between group 1. manual and group 2. automatic is greater than 0
95 percent confidence interval:
3.913256 Inf
sample estimates:
mean in group 1. manual mean in group 2. automatic
24.39231 17.14737
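Another option, sketched here as an alternative to renaming the groups, is to make the grouping variable a factor with an explicit level order. This relies on the formula method of t.test() respecting the existing level order of a factor, so the first level listed becomes the first group. The object name res is used here only for illustration.

```r
data(mtcars)

# Set the level order explicitly: manual first, automatic second
mtcars$trans <- factor(
  ifelse(mtcars$am == 1, "manual", "automatic"),
  levels = c("manual", "automatic")
)

# Now alternative = "greater" tests mean(manual) - mean(automatic) > 0
res <- t.test(mpg ~ trans, data = mtcars, alternative = "greater")
res$statistic  # positive t, same magnitude as in the output above
res$p.value
```

This keeps the original group labels in the printed output, and relevel() can be used in the same way when only the reference group needs to change.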