I have a dataframe "data" that contains, among other columns, CPNo, Gender, JobRole, Country and AnnualSalaryLocal.
I want to run a t-test for each job role in each country to see whether there is a significant pay gap between the genders within the same job role and country.
I create a nested dataframe, keeping only the groups with more than 20 observations:
dataNested <- data %>%
  select(CPNo, Gender, JobRole, Country, AnnualSalaryLocal) %>%
  nest(data = c(CPNo, Gender, AnnualSalaryLocal)) %>%
  filter(map_int(data, nrow) > 20)
And I want to run a t-test on that nested dataframe:
dataNested %>%
  mutate(t_test = map(data, ~ t.test(AnnualSalaryLocal ~ Gender, data = .x, var.equal = FALSE)))
Running this code gives the following nested dataframe, which contains the results of my t-tests:
JobRole                 JobStage  Country  data      t_test
<fctr>                  <fctr>    <fctr>   <list>    <list>
76 Product Development  06        Ireland  <tibble>  <S3: htest>
76 Product Development  06        Italy    <tibble>  <S3: htest>
82 Service Delivery     05        Italy    <tibble>  <S3: htest>
82 Service Delivery     06        Italy    <tibble>  <S3: htest>
82 Service Delivery     03        Mexico   <tibble>  <S3: htest>
83 Supply & Logistics   01        Mexico   <tibble>  <S3: htest>
76 Product Development  05        Poland   <tibble>  <S3: htest>
How do I write the syntax if I want to add a new variable "sig" which extracts the p.value from my "t_test" variable?
You can extract it by using broom::tidy(), which turns each htest object into a one-row tibble that includes a p.value column. Here's an example using the gapminder dataset:
library(gapminder)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
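gapminder |>
  filter(continent %in% c("Europe", "Asia")) |>
  group_by(year) |>
  nest() |>
  mutate(
    # Welch two-sample t-test of lifeExp by continent within each year
    t_test = map(data, ~ t.test(lifeExp ~ continent, data = .x, var.equal = FALSE)),
    # tidy() converts each htest into a one-row tibble (estimate, statistic, p.value, ...)
    res = map(t_test, tidy)
  ) |>
  unnest(res) |>
  ungroup()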
# A tibble: 12 × 13
year data t_test estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
<int> <list> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 1952 <tibble [63 × 5]> <htest> -18.1 46.3 64.4 -9.09 1.14e-12 56.8 -22.1 -14.1 Welch Two Sample t-test two.sided
2 1957 <tibble [63 × 5]> <htest> -17.4 49.3 66.7 -8.98 4.73e-12 50.6 -21.3 -13.5 Welch Two Sample t-test two.sided
3 1962 <tibble [63 × 5]> <htest> -17.0 51.6 68.5 -9.02 1.24e-11 44.7 -20.8 -13.2 Welch Two Sample t-test two.sided
4 1967 <tibble [63 × 5]> <htest> -15.1 54.7 69.7 -8.29 2.01e-10 42.5 -18.7 -11.4 Welch Two Sample t-test two.sided
5 1972 <tibble [63 × 5]> <htest> -13.5 57.3 70.8 -7.50 3.96e- 9 39.6 -17.1 -9.83 Welch Two Sample t-test two.sided
6 1977 <tibble [63 × 5]> <htest> -12.3 59.6 71.9 -6.72 5.46e- 8 38.7 -16.0 -8.61 Welch Two Sample t-test two.sided
7 1982 <tibble [63 × 5]> <htest> -10.2 62.6 72.8 -6.38 1.18e- 7 41.7 -13.4 -6.96 Welch Two Sample t-test two.sided
8 1987 <tibble [63 × 5]> <htest> -8.79 64.9 73.6 -5.71 1.04e- 6 42.1 -11.9 -5.68 Welch Two Sample t-test two.sided
9 1992 <tibble [63 × 5]> <htest> -7.90 66.5 74.4 -5.19 5.54e- 6 42.7 -11.0 -4.83 Welch Two Sample t-test two.sided
10 1997 <tibble [63 × 5]> <htest> -7.48 68.0 75.5 -4.93 1.34e- 5 42.0 -10.5 -4.42 Welch Two Sample t-test two.sided
11 2002 <tibble [63 × 5]> <htest> -7.47 69.2 76.7 -4.81 2.13e- 5 40.3 -10.6 -4.33 Welch Two Sample t-test two.sided
12 2007 <tibble [63 × 5]> <htest> -6.92 70.7 77.6 -4.65 3.39e- 5 41.5 -9.93 -3.91 Welch Two Sample t-test two.sided
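If you only need the p-value as a single new column called "sig", rather than the full tidy() output, one option is to pull it straight out of each htest object with purrr::map_dbl(). The sketch below is a minimal illustration under some assumptions: the `data` tibble is a synthetic stand-in built with made-up values, reusing only the column names from the question, and `result` is just a name chosen here.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Synthetic stand-in for the question's `data` (values are invented for illustration)
set.seed(1)
data <- tibble(
  CPNo = 1:120,
  Gender = factor(rep(c("F", "M"), times = 60)),
  JobRole = factor(rep(c("Product Development", "Service Delivery"), each = 60)),
  Country = factor(rep(rep(c("Italy", "Mexico"), each = 30), times = 2)),
  AnnualSalaryLocal = rnorm(120, mean = 50000, sd = 8000)
)

# Same nesting step as in the question: one row per JobRole/Country group
dataNested <- data %>%
  nest(data = c(CPNo, Gender, AnnualSalaryLocal)) %>%
  filter(map_int(data, nrow) > 20)

result <- dataNested %>%
  mutate(
    t_test = map(data, ~ t.test(AnnualSalaryLocal ~ Gender, data = .x, var.equal = FALSE)),
    # each htest is a list, so its p-value can be read with $p.value;
    # map_dbl() returns a plain numeric column instead of a list column
    sig = map_dbl(t_test, ~ .x$p.value)
  )

result
```

Because sig is an ordinary numeric column, it can be used directly in later steps such as filter(sig < 0.05). Note that t.test() stops with an error for any group that contains only one gender, so such groups would need to be filtered out beforehand.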