
How to run t-tests on a nested dataframe


I have a dataframe "data" that contains the following columns:

  • Employee ID ("CPNo") - int
  • Gender ("Gender") - factor
  • Job Role ("JobRole") - factor
  • Country ("Country") - factor
  • Annual Salary ("AnnualSalaryLocal") - int

I want to run a t-test for each job role in each country to check whether there is a significant pay gap between genders within the same job role and country.

I create a nested dataframe, keeping only the groups whose nested dataframe has more than 20 observations:

dataNested <- data %>% 
  select(CPNo, Gender, JobRole, Country, AnnualSalaryLocal) %>% 
  nest(data = c(CPNo, Gender, AnnualSalaryLocal)) %>% 
  filter(map_int(data, nrow) > 20)

And I want to run a t-test on that nested dataframe:

dataNested %>% 
  mutate(t_test = map(data, ~ t.test(AnnualSalaryLocal ~ Gender, data = .x, var.equal = FALSE)))

When I run this code, I get a nested dataframe that contains the results of my t-tests:

JobRole                 JobStage  Country  data      t_test
<fctr>                  <fctr>    <fctr>   <list>    <list>
76 Product Development  06        Ireland  <tibble>  <S3: htest>
76 Product Development  06        Italy    <tibble>  <S3: htest>
82 Service Delivery     05        Italy    <tibble>  <S3: htest>
82 Service Delivery     06        Italy    <tibble>  <S3: htest>
82 Service Delivery     03        Mexico   <tibble>  <S3: htest>
83 Supply & Logistics   01        Mexico   <tibble>  <S3: htest>
76 Product Development  05        Poland   <tibble>  <S3: htest>

How do I add a new variable "sig" that extracts the p-value from each result in my "t_test" column?


Solution

  • You can extract the p-value with broom::tidy(), which converts each htest object into a one-row tibble that includes a p.value column. Here's an example using the gapminder dataset:

    library(gapminder)
    library(dplyr)
    library(tidyr)
    library(purrr)
    library(broom)
    
    gapminder |> 
      filter(continent %in% c("Europe", "Asia")) |> 
      group_by(year) |> 
      nest() |> 
      mutate(t_test = map(data, ~ t.test(lifeExp ~ continent, data = .x, var.equal = FALSE)),
             res = map(t_test, tidy)) |> 
      unnest(res) |>
      ungroup()
    
    # A tibble: 12 × 13
        year data              t_test  estimate estimate1 estimate2 statistic  p.value parameter conf.low conf.high method                  alternative
       <int> <list>            <list>     <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>                   <chr>      
     1  1952 <tibble [63 × 5]> <htest>   -18.1       46.3      64.4     -9.09 1.14e-12      56.8   -22.1     -14.1  Welch Two Sample t-test two.sided  
     2  1957 <tibble [63 × 5]> <htest>   -17.4       49.3      66.7     -8.98 4.73e-12      50.6   -21.3     -13.5  Welch Two Sample t-test two.sided  
     3  1962 <tibble [63 × 5]> <htest>   -17.0       51.6      68.5     -9.02 1.24e-11      44.7   -20.8     -13.2  Welch Two Sample t-test two.sided  
     4  1967 <tibble [63 × 5]> <htest>   -15.1       54.7      69.7     -8.29 2.01e-10      42.5   -18.7     -11.4  Welch Two Sample t-test two.sided  
     5  1972 <tibble [63 × 5]> <htest>   -13.5       57.3      70.8     -7.50 3.96e- 9      39.6   -17.1      -9.83 Welch Two Sample t-test two.sided  
     6  1977 <tibble [63 × 5]> <htest>   -12.3       59.6      71.9     -6.72 5.46e- 8      38.7   -16.0      -8.61 Welch Two Sample t-test two.sided  
     7  1982 <tibble [63 × 5]> <htest>   -10.2       62.6      72.8     -6.38 1.18e- 7      41.7   -13.4      -6.96 Welch Two Sample t-test two.sided  
     8  1987 <tibble [63 × 5]> <htest>    -8.79      64.9      73.6     -5.71 1.04e- 6      42.1   -11.9      -5.68 Welch Two Sample t-test two.sided  
     9  1992 <tibble [63 × 5]> <htest>    -7.90      66.5      74.4     -5.19 5.54e- 6      42.7   -11.0      -4.83 Welch Two Sample t-test two.sided  
    10  1997 <tibble [63 × 5]> <htest>    -7.48      68.0      75.5     -4.93 1.34e- 5      42.0   -10.5      -4.42 Welch Two Sample t-test two.sided  
    11  2002 <tibble [63 × 5]> <htest>    -7.47      69.2      76.7     -4.81 2.13e- 5      40.3   -10.6      -4.33 Welch Two Sample t-test two.sided  
    12  2007 <tibble [63 × 5]> <htest>    -6.92      70.7      77.6     -4.65 3.39e- 5      41.5    -9.93     -3.91 Welch Two Sample t-test two.sided
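
If only the p-value is needed as a single "sig" column, as asked in the question, one option is to pull it directly out of each htest object with map_dbl() instead of unnesting the full tidy() result. The following is a minimal sketch, not the asker's real data: the two countries, two job roles, alternating genders and simulated salaries are all made up for illustration, and only the column names follow the question.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for the question's "data": 2 countries x 2 job roles,
# 30 employees per combination, genders alternating, salaries simulated.
set.seed(1)
data <- expand_grid(
  Country = c("Italy", "Mexico"),
  JobRole = c("Service Delivery", "Product Development"),
  id      = 1:30
) |>
  mutate(
    CPNo              = row_number(),
    Gender            = factor(rep(c("F", "M"), length.out = n())),
    AnnualSalaryLocal = round(rnorm(n(), mean = 50000, sd = 8000))
  ) |>
  select(-id)

result <- data |>
  # One row per Country/JobRole, with the remaining columns nested in "data"
  nest(data = c(CPNo, Gender, AnnualSalaryLocal)) |>
  # Keep only groups with more than 20 observations
  filter(map_int(data, nrow) > 20) |>
  mutate(
    # Welch two-sample t-test per group (var.equal = FALSE is the default)
    t_test = map(data, ~ t.test(AnnualSalaryLocal ~ Gender, data = .x)),
    # Extract the p-value of each htest object as a plain numeric column
    sig    = map_dbl(t_test, ~ .x$p.value)
  )

result
```

The "sig" column can then be used like any other numeric column, for example filter(result, sig < 0.05). Note that t.test() stops with an error when a group contains only one gender, so such groups would have to be filtered out before the mutate() step.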