Tags: r, magrittr, hypothesis-test

When doing a t-test with t.test() while piping, what does the period do in "data=."?


I have a data frame with two columns, age and sex. I'm running a statistical analysis to determine whether the age distribution differs between the two sex groups. I know that if I leave out data=, it gives an error (I believe it has something to do with the dplyr library). I was wondering what the single . in the data argument does. Does it refer to the data frame on the left of the %>%?

age_sex.htest <- d %>%
   t.test(formula=age~sex, data=.)

Solution

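  • As @markus has pointed out, the . stands for the left-hand side of the pipe, so d is passed to the data argument of t.test. Without the ., %>% would insert d as the first argument instead. Here is the output for the sleep dataset, using the . placeholder.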

    library(dplyr)
    data(sleep)
    
    sleep %>% t.test(formula=extra ~ group, data = .)
    
    # Output
        Welch Two Sample t-test
    
    data:  extra by group
    t = -1.8608, df = 17.776, p-value = 0.07939
    alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
    95 percent confidence interval:
     -3.3654832  0.2054832
    sample estimates:
    mean in group 1 mean in group 2 
               0.75            2.33 
    

    If you pass sleep directly to the data argument of t.test, you get the same result, because t.test is run on exactly the same data.

    t.test(formula=extra ~ group, data = sleep)
    
    # Output
    
        Welch Two Sample t-test
    
    data:  extra by group
    t = -1.8608, df = 17.776, p-value = 0.07939
    alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
    95 percent confidence interval:
     -3.3654832  0.2054832
    sample estimates:
    mean in group 1 mean in group 2 
               0.75            2.33 
    

    In this case, the . offers little benefit, though some people prefer it stylistically (I generally do).
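
    To make the placement rule concrete, here is a minimal sketch (the sqrt and rep calls are illustrative and not from the question). By default %>% inserts the left-hand side as the first argument of the call. When . appears as a direct argument, the left-hand side is placed there instead.

    ```r
    library(magrittr)  # provides %>%; dplyr re-exports the same operator
    data(sleep)

    # No dot: the left-hand side becomes the first argument.
    a <- c(1, 4, 9) %>% sqrt()               # same as sqrt(c(1, 4, 9))

    # Dot as a direct argument: the left-hand side goes there instead.
    b <- 3 %>% rep("x", times = .)           # same as rep("x", times = 3)

    # The same rule applied to t.test(): sleep fills the data argument.
    piped  <- sleep %>% t.test(formula = extra ~ group, data = .)
    direct <- t.test(formula = extra ~ group, data = sleep)
    all.equal(piped$p.value, direct$p.value)  # TRUE
    ```

    Comparing a single element such as p.value keeps the check simple. The printed summaries of piped and direct should match as well.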

    However, it is extremely useful when you want to run the analysis on a slightly altered version of the data frame. For example, to drop ID == 10 from both groups of the sleep dataset, you can remove those rows with filter and then run t.test.

    sleep %>%
      filter(ID != 10) %>%
      t.test(formula = extra ~ group, data = .)
    

    Here we pass an altered version of the sleep dataset, without the rows where ID is 10, so the output changes:

        Welch Two Sample t-test
    
    data:  extra by group
    t = -1.7259, df = 15.754, p-value = 0.1039
    alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
    95 percent confidence interval:
     -3.5677509  0.3677509
    sample estimates:
    mean in group 1 mean in group 2 
          0.6111111       2.2111111
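
    One way to check that the . really carries the filtered rows is to store them under a name first and compare the two results. This is only a sketch: sleep_no10, direct and piped are names made up for the illustration.

    ```r
    library(dplyr)
    data(sleep)

    # Store the filtered rows explicitly (sleep_no10 is a hypothetical name).
    sleep_no10 <- filter(sleep, ID != 10)
    nrow(sleep_no10)  # 18: one row dropped from each of the two groups

    # Call t.test() on the stored data frame ...
    direct <- t.test(formula = extra ~ group, data = sleep_no10)

    # ... and via the pipe, where . stands for the output of filter().
    piped <- sleep %>%
      filter(ID != 10) %>%
      t.test(formula = extra ~ group, data = .)

    all.equal(direct$statistic, piped$statistic)  # TRUE
    all.equal(direct$conf.int, piped$conf.int)    # TRUE
    ```

    The piped form gives the same result without leaving an intermediate data frame in the workspace, which is the main practical reason to prefer it here.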