I have a data frame with two columns, age and sex. I'm running a statistical analysis to determine whether the age distribution differs between the two sex groups. I know that if I don't supply data= the call gives an error (I believe it has something to do with the dplyr library). I was wondering what the single . passed to the data parameter does. Does it refer to the data frame that comes before the %>%?
age_sex.htest <- d %>%
t.test(formula=age~sex, data=.)
As @markus has pointed out, d is passed to the data argument of t.test. Here is the output for the sleep dataset, loaded with data(sleep), using the . placeholder.
library(dplyr)
data(sleep)
sleep %>% t.test(formula=extra ~ group, data = .)
# Output
Welch Two Sample t-test
data: extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-3.3654832 0.2054832
sample estimates:
mean in group 1 mean in group 2
0.75 2.33
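To make the role of the dot more concrete, here is a minimal sketch of how %>% decides where the left-hand side goes. The function show_args is a hypothetical helper invented for this illustration; it is not part of magrittr or dplyr. The pipe itself comes from magrittr, which dplyr re-exports.

```r
library(magrittr)  # dplyr re-exports %>% from this package

# Hypothetical helper, only here to make the argument positions visible.
show_args <- function(x, y) paste0("x=", x, ", y=", y)

# No dot: the left-hand side is inserted as the first argument,
# so this is show_args("lhs", "other").
first_arg <- "lhs" %>% show_args("other")

# Dot as a direct argument: the left-hand side goes where the dot is
# and is not also inserted as the first argument,
# so this is show_args("other", y = "lhs").
dot_arg <- "lhs" %>% show_args("other", y = .)

print(first_arg)
print(dot_arg)
```

The same rule is what lets data = . route the data frame to the data argument of t.test instead of to its first argument.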
If you put sleep directly into the data argument of t.test, you will get the same result, since t.test is run on exactly the same data.
t.test(formula=extra ~ group, data = sleep)
# Output
Welch Two Sample t-test
data: extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-3.3654832 0.2054832
sample estimates:
mean in group 1 mean in group 2
0.75 2.33
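If you would rather check the equivalence in code than compare the printed output by eye, a small sketch along these lines should work. The variable names piped, direct and same are made up for the example.

```r
library(dplyr)
data(sleep)

# Same test, once through the pipe with the dot and once called directly.
piped  <- sleep %>% t.test(formula = extra ~ group, data = .)
direct <- t.test(formula = extra ~ group, data = sleep)

# Compare the test statistic, p-value and confidence interval.
same <- isTRUE(all.equal(piped$statistic, direct$statistic)) &&
  isTRUE(all.equal(piped$p.value, direct$p.value)) &&
  isTRUE(all.equal(piped$conf.int, direct$conf.int))

print(same)
```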
In this case, the . is not that beneficial, though some people prefer it stylistically (I generally do).
However, it is extremely useful when you want to run the analysis on a slightly altered version of the data frame. With the sleep dataset, for example, if you wanted to remove ID == 10 from both groups, you could drop those rows with filter and then run t.test.
sleep %>%
filter(ID != 10) %>%
t.test(formula = extra ~ group, data = .)
Here we pass an altered version of the sleep dataset, without the rows where ID is 10, so the output changes:
Welch Two Sample t-test
data: extra by group
t = -1.7259, df = 15.754, p-value = 0.1039
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-3.5677509 0.3677509
sample estimates:
mean in group 1 mean in group 2
0.6111111 2.2111111
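For comparison, here is a sketch of the same analysis written without the dot, storing the filtered data frame in an intermediate variable first. The name sleep_no10 is just an illustrative choice. Both versions should give the same test result; the piped form simply avoids the extra variable.

```r
library(dplyr)
data(sleep)

# Piped version: filter, then hand the result to t.test via the dot.
piped <- sleep %>%
  filter(ID != 10) %>%
  t.test(formula = extra ~ group, data = .)

# Step-by-step version: keep the filtered data in its own variable.
sleep_no10 <- filter(sleep, ID != 10)
stepwise <- t.test(formula = extra ~ group, data = sleep_no10)

# Number of rows left after dropping ID 10 from both groups.
print(nrow(sleep_no10))

# TRUE if both approaches produce the same p-value.
print(isTRUE(all.equal(piped$p.value, stepwise$p.value)))
```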