
Equality of proportions test for a subset of the categories of a categorical variable


The vignette of the infer package gives examples that test equality of proportions across ALL categories, but not for a SUBSET of the categories.

For example, in the infer::gss dataset, is there a way to test whether, for the income variable, the proportion of "$25000 or more" equals the proportion of "$20000 - 24999"?

Thank you

R tidymodels/infer


Solution

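  • We can filter the data to the two 'income' levels of interest, drop the unused levels (droplevels), and use the result in the test. With only two levels left, equal proportions is the same as a proportion of 0.5 for either level.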

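    library(dplyr)
    library(infer)

    gss %>%
       # keep only the two income levels being compared
       filter(income %in% c("$20000 - 24999", "$25000 or more")) %>%
       # drop the now-empty factor levels so 'income' has exactly two
       droplevels() %>%
       specify(response = income, success = "$20000 - 24999") %>%
       # with two levels, equal proportions means p = 0.5
       hypothesize(null = "point", p = 0.5) %>%
       # type = "draw" is called "simulate" in infer versions before 1.0.0
       generate(reps = 1000, type = "draw") %>%
       # one simulated proportion per replicate: the null distribution
       calculate(stat = "prop")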
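
The pipeline above yields only the simulated null distribution. To finish the test, that distribution has to be compared with the observed proportion. A minimal sketch of one way to do this follows; the object names (`two_levels`, `obs_stat`, `null_dist`, `p_val`), the seed, and the two-sided direction are illustrative assumptions, not part of the original answer.

```r
library(dplyr)
library(infer)

# Keep only the two income levels being compared and drop the unused ones.
two_levels <- gss %>%
  filter(income %in% c("$20000 - 24999", "$25000 or more")) %>%
  droplevels()

# Observed proportion of "$20000 - 24999" among the two retained levels.
obs_stat <- two_levels %>%
  specify(response = income, success = "$20000 - 24999") %>%
  calculate(stat = "prop")

# Null distribution of that proportion under p = 0.5.
set.seed(1)  # arbitrary seed, only so the simulation is repeatable
null_dist <- two_levels %>%
  specify(response = income, success = "$20000 - 24999") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 1000, type = "draw") %>%  # "simulate" in infer < 1.0.0
  calculate(stat = "prop")

# Simulation-based two-sided p-value.
p_val <- null_dist %>%
  get_p_value(obs_stat = obs_stat, direction = "two-sided")

p_val
```

Because the p-value is estimated from 1000 simulated replicates, its resolution is limited to steps of about 1/1000; a reported value of 0 should be read as "smaller than the simulation can resolve" rather than exactly zero.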