r, mean, population

Which group does t.test in R treat as the bigger one with alternative = 'greater'? How do I tell the function?


I have a question about using t.test to check whether one population mean is bigger than another.

Imagine I have 2 variables in a dataframe d:

Weight: Numerical variable (weight of people).
Anykids: Categorical variable that can be yes or no.

The dataframe would be like:

Anykids Weight
yes     70
yes     84
no      66
...     ..

I want to check whether the mean weight of people with Anykids = yes is bigger than that of people with Anykids = no. So I would have:

H0: m(weight_yes) = m(weight_no)
H1: m(weight_yes) > m(weight_no)

The function would be:

t.test(Weight ~ Anykids, data = d, alternative = 'greater')

How does the function know that alternative = 'greater' refers to the group with Anykids = yes and not the group with Anykids = no?

If I wanted to check the hypothesis:

H0: m(weight_no) = m(weight_yes)
H1: m(weight_no) > m(weight_yes)

The function would have the same parameters. How do I know whether greater refers to Anykids = yes or Anykids = no?


Solution

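  • Like many things with factors, R decides based on the order of the factor's levels: the first level is treated as x and the second as y, and alternative = "greater" tests whether the mean of x is greater than the mean of y. In your case, you can run levels(d$Anykids) in advance to see which group will be used as x vs. y in the t.test() function, and change the order with relevel() if needed. If Anykids is a character column rather than a factor, t.test() converts it to a factor with alphabetically sorted levels, so no would come before yes.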

    But the t.test() output also shows you which group was treated as the first one. Here, in the iris dataset with setosa removed, the versicolor level comes first, so the test asks whether versicolor has a greater mean Sepal.Width than virginica. The groups are listed in that order under sample estimates.

    levels(iris$Species)
    #> [1] "setosa"     "versicolor" "virginica"
    test_data <- iris[iris$Species != 'setosa', ]
    t.test(data = test_data, Sepal.Width ~ Species, alternative = "greater")
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  Sepal.Width by Species
    #> t = -3.2058, df = 97.927, p-value = 0.9991
    #> alternative hypothesis: true difference in means is greater than 0
    #> 95 percent confidence interval:
    #>  -0.3096707        Inf
    #> sample estimates:
    #> mean in group versicolor  mean in group virginica 
    #>                    2.770                    2.974
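
Applied to the data frame from the question, the same idea could look roughly like the sketch below. The values in d are made up for illustration (the question only shows three rows), and the names res_formula and res_xy are hypothetical. It shows two ways to pin down the direction: reordering the levels with relevel(), or passing the two groups explicitly as x and y.

```r
# Hypothetical toy data; the values are invented for illustration only.
d <- data.frame(
  Anykids = c("yes", "yes", "no", "no", "yes", "no"),
  Weight  = c(70, 84, 66, 72, 79, 68)
)

# Converting to a factor sorts the levels alphabetically,
# so "no" is the first group by default.
d$Anykids <- factor(d$Anykids)
levels(d$Anykids)

# Option 1: make "yes" the first level, so that
# alternative = "greater" tests H1: mean(yes) > mean(no).
d$Anykids <- relevel(d$Anykids, ref = "yes")
res_formula <- t.test(Weight ~ Anykids, data = d, alternative = "greater")

# Option 2: pass the groups explicitly. "greater" then always means
# mean(x) > mean(y), so the direction is visible in the call itself.
res_xy <- t.test(
  x = d$Weight[d$Anykids == "yes"],
  y = d$Weight[d$Anykids == "no"],
  alternative = "greater"
)

# Both calls test the same hypothesis, so the p-values should agree.
all.equal(res_formula$p.value, res_xy$p.value)
```

To test the opposite hypothesis (no greater than yes), either leave the default level order in place or swap x and y in the second call.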