Making tidyeval function inside case_when


I have a data set in which I would like to replace one value with one of the other values, drawn according to the observed probability distribution of those values. Let me make a reproducible example first:

library(tidyverse)
library(janitor)

dummy1 <- runif(5000, 0, 1)
dummy11 <- case_when(
    dummy1 < 0.776 ~ 1,
    dummy1 < 0.776 + 0.124 ~ 2,
    TRUE ~ 5)

df1 <- tibble(q1 = dummy11)

Here is the output:

df1 %>% tabyl(q1)
 q1    n percent
  1 3888  0.7776
  2  605  0.1210
  5  507  0.1014

I used mutate and sample to redistribute the rows with value 5 among values 1 and 2, like this:

df1 %>%
    mutate(q1 = case_when(q1 == 5 ~ sample(
        2,
        length(q1),
        prob = c(0.7776, 0.1210),
        replace = TRUE
    ),
    TRUE ~ as.integer(q1))
    )

And here is the result:

q1    n percent
  1 4322  0.8644
  2  678  0.1356
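A small sketch of why `sample()` is asked for `length(q1)` draws: `case_when()` expects each right-hand side to have length 1 or the full vector length, so one draw is generated per row and only the rows where the condition holds actually use it. The weights passed to `prob` are rescaled by `sample()`, so they do not have to sum to 1. The vector `x` and the seed below are made up for illustration.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable
x <- c(1, 5, 2, 5, 5)  # toy stand-in for the q1 column

# One draw per element; sample() rescales the weights internally
draws <- sample(2, length(x), prob = c(0.7776, 0.1210), replace = TRUE)

# Only positions where x == 5 take a draw; the others keep their value
imputed <- case_when(x == 5 ~ draws, TRUE ~ as.integer(x))
imputed
```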

This approach seems to work. However, since I need to apply it to several variables, I tried to write a function that works with the tidyverse using tidyeval, like this:

    my_impute <- function(.data, .prob_var, ...) {
        .prob_var <- enquo(.prob_var)

        .data %>%
            sample(2, prob=c(!!.prob_var), replace = TRUE) 
    }

# running on data 
df1 %>%
    mutate(q1 = case_when(q1 == 5 ~ !!my_impute(q1),
    TRUE ~ as.integer(q1))
    )

The error is:

Error in eval_tidy(pair$lhs, env = default_env) : object 'q1' not found

Solution

  • The prob values should come from the 'percent' column that tabyl generates, so the function can be modified to compute them from the data and to do the mutate step itself:

    library(janitor)
    library(dplyr)
    
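    my_impute <- function(.data, .prob_var, vals, ...) {
        .prob_var <- enquo(.prob_var)

        # observed proportions of the candidate values, taken from tabyl()
        .prob_vals <- .data %>%
            janitor::tabyl(!!.prob_var) %>%
            filter(!!.prob_var %in% vals) %>%
            pull(percent)

        .data %>%
            mutate(!!.prob_var := case_when(
                # the value to impute (5) is still hard-coded, as in the question
                !!.prob_var == 5 ~ sample(
                    2,
                    n(),
                    prob = .prob_vals,
                    replace = TRUE
                ),
                # use the captured column here rather than a hard-coded q1
                TRUE ~ as.integer(!!.prob_var)
            ))
    }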
    
    
    df1 %>% 
         my_impute(q1, vals = 1:2) %>%
         tabyl(q1)
    # q1    n percent
    # 1 4285   0.857
    # 2  715   0.143
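As a possible variation, the same idea can be sketched with the `{{ }}` (embrace) operator instead of `enquo()` and `!!`, assuming rlang 0.4.0 or later and dplyr 1.0.0 or later. The name `my_impute2` and the `target` argument are hypothetical additions, not part of the answer above; the probabilities are computed with base `table()` rather than `tabyl()`, and `vals` is assumed to contain integer values.

```r
library(dplyr)

# Hypothetical generalisation: `target` is the value to replace,
# `vals` are the candidate replacement values (assumed integer-valued).
my_impute2 <- function(.data, var, target, vals) {
  x <- pull(.data, {{ var }})

  # Relative frequencies of the candidates, in the same order as `vals`
  counts <- table(factor(x[x %in% vals], levels = vals))
  probs <- as.numeric(prop.table(counts))

  .data %>%
    mutate({{ var }} := if_else(
      {{ var }} == target,
      # one draw per row; only rows equal to `target` keep their draw.
      # Indexing `vals` avoids sample()'s special case for length-one input.
      as.integer(vals[sample.int(length(vals), n(), replace = TRUE, prob = probs)]),
      as.integer({{ var }})
    ))
}

# Usage on a small made-up tibble
toy <- tibble(q1 = c(rep(1, 70), rep(2, 20), rep(5, 10)))
toy_imputed <- my_impute2(toy, q1, target = 5, vals = 1:2)
count(toy_imputed, q1)
```

Passing `target` explicitly means the function no longer depends on the value 5, so it can be reused for other columns with a different code for the value to be redistributed.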