Tags: r, function, dplyr, chi-squared, statistical-test

How to define a function in dplyr? - Adding the results of a chi-squared test


I am trying to write a function that produces a pivot table for two variables. Expanding on my earlier question, I would also like to include the p-value of a chi-squared test of the relationship between the predictor and the target. How should I change the function?

library(dplyr)
mean_mpg <- mean(mtcars$mpg)

# create a new variable indicating whether miles per (US) gallon is above the mean

mtcars <-
mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean_mpg, 1,0))

mtcars %>%
  group_by(as.factor(cyl)) %>%
  summarise(sum=sum(mpg_cat),total=n()) %>%
  mutate(percentage=sum*100/total)

# Note: needs installation of rlang 0.4.0 or later
get_pivot <- function(data, predictor,target) {
  result <-
    data %>%
    group_by(as.factor( {{ predictor }} )) %>%
    summarise(sum=sum( {{ target }} ),total=n()) %>%
    mutate(percentage=sum*100/total);

  print(result)
}

Here is my working example:

mtcars %>%
  group_by(as.factor(cyl)) %>%
  summarise(sum=sum(mpg_cat),total=n(),
            pvalue= chisq.test(as.factor(.$mpg_cat), as.factor(.$cyl))$p.value) %>% 
  mutate(percentage=sum*100/total)

I tried the following function but it did not work.

get_pivot <- function(data, predictor,target) {
  result <-
    data %>%
    group_by( {{ predictor }} ) %>%
    summarise(clicks=sum( {{ target }} ),total=n(),
              pvalue= chisq.test(.$target, .$predictor)$p.value) %>%
    mutate(percentage=clicks*100/total);

  print(result)
}

Solution

  • The {{ }} (curly-curly) interpolation operator is a convenient shorthand for quoting and unquoting an argument, but it does not work everywhere. In the OP's function the columns are extracted with $, so .$target and .$predictor look for columns literally named target and predictor, which do not exist in the data. Instead, we can capture each argument with enquo, convert it to a string with as_name, and extract the column with [[:

    library(rlang)
    library(dplyr)
    
    get_pivot <- function(data, predictor, target) {
      data %>%
        group_by({{ predictor }}) %>%
        summarise(clicks = sum({{ target }}), total = n(),
                  # `.` is the whole piped data frame, so the same p-value appears in every row
                  pvalue = chisq.test(.[[as_name(enquo(target))]],
                                      .[[as_name(enquo(predictor))]])$p.value) %>%
        mutate(percentage = clicks * 100 / total)
    }
    
    get_pivot(mtcars, cyl, mpg_cat)
    # A tibble: 3 x 5
    #    cyl clicks total     pvalue percentage
    #  <dbl>  <dbl> <int>      <dbl>      <dbl>
    #1     4     11    11 0.00000366      100  
    #2     6      3     7 0.00000366       42.9
    #3     8      0    14 0.00000366        0
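
  • A possible variation, offered only as a sketch: since the output above repeats one p-value in every row, the test can be run once on the whole data frame before grouping. This assumes a single overall test is what is wanted. Here pull() accepts {{ }} directly, so rlang does not need to be attached. The names get_pivot_once and cars2 are hypothetical and used only for illustration.

```r
library(dplyr)

# Rebuild the example data: 1 if mpg is above the mean, 0 otherwise
cars2 <- mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

# Hypothetical helper: run the chi-squared test once, then attach the result
get_pivot_once <- function(data, predictor, target) {
  # pull() supports {{ }}, so the columns can be extracted without `$` or `[[`
  p <- chisq.test(pull(data, {{ target }}),
                  pull(data, {{ predictor }}))$p.value

  data %>%
    group_by({{ predictor }}) %>%
    summarise(clicks = sum({{ target }}), total = n()) %>%
    mutate(percentage = clicks * 100 / total,
           pvalue = p)  # same value in every row, as in the answer above
}

result <- get_pivot_once(cars2, cyl, mpg_cat)
print(result)
```

    With small expected cell counts, as in mtcars, chisq.test() is likely to warn that the chi-squared approximation may be incorrect; the p-value is still returned.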