Tags: r, dplyr, tidyverse, tidyeval

Join and group_by tidy eval issue


I have put together the following function. It works up until the last part (marked with a comment in the code), where it has to join the objects together, and I don't know how to get that to work. I believe my main problem is converting the colName argument into a string for the "by =" argument of the join functions. As for group_by, I'm not sure whether what I have put in the curly brackets will work. If anyone could help, that would be great!

   emp_turnover_fun <- function(data, colName, year = "2015") {
  
  # Convert colName to symbol or check if symbol
  colName <- ensym(colName)
  
  # Terminations by year and variable in df
  term_test <- data %>%
    filter(year(DateofTermination) == year) %>%
    count(!!(colName)) %>%
    clean_names()
  
  # Start employees by var and year
  fun_year_job <- paste(year, "-01-01", sep = "")
  start_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year_job,
      DateofTermination > fun_year_job | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  
  # End employees by year and var
  year_pos <- year %>% as.character()
  year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
  fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
  
  end_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year2_pos,
      DateofTermination > fun_year2_pos | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  #### PROBLEM BEGINS HERE
  join_turnover_year <- full_join(start_test, end_test, by = str(colName)) %>%
    full_join(y = term_test, by = str(colName)) %>%
    setNames(c(str(colName), "Start_Headcount", "End_Headcount", "Terminations")) %>%
    group_by({{colName}}) %>%
    summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
  
  return(join_turnover_year)
}

Solution

  • The issue is the use of str, which displays the structure of an object and does not return a string. If colName is passed as a string, it needs no wrapping at all, but inside the function it is converted to a symbol with ensym. So, either store the input (assuming it is a string) in a separate object before converting it to a symbol, or convert the symbol back to a string with as_string from rlang

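     # Packages assumed from the functions used below
     library(dplyr)      # filter, count, select, full_join, group_by, summarise, ensym
     library(lubridate)  # year
     library(janitor)    # clean_names

     emp_turnover_fun <- function(data, colName, year = "2015") {
      
      # Convert colName to a symbol (works for a quoted or an unquoted name)
      colName <- ensym(colName)
      colName_str <- rlang::as_string(colName) ## symbol converted back to a string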
    
      
      # Terminations by year and variable in df
      term_test <- data %>%
        filter(year(DateofTermination) == year) %>%
        count(!!(colName)) %>%
        clean_names()
      
      # Start employees by var and year
      fun_year_job <- paste(year, "-01-01", sep = "")
      start_test <- data %>%
        select(DateofHire, DateofTermination, !!(colName)) %>%
        filter(
          DateofHire <= fun_year_job,
          DateofTermination > fun_year_job | is.na(DateofTermination)
        ) %>%
        count(!!(colName))
      
      # End employees by year and var
      year_pos <- year %>% as.character()
      year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
      fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
      
      end_test <- data %>%
        select(DateofHire, DateofTermination, !!(colName)) %>%
        filter(
          DateofHire <= fun_year2_pos,
          DateofTermination > fun_year2_pos | is.na(DateofTermination)
        ) %>%
        count(!!(colName))
      
      join_turnover_year <- full_join(start_test, end_test, 
                 by = colName_str) %>% # use the string
        full_join(y = term_test, by = colName_str) %>% # use the string
        setNames(c(colName_str, "Start_Headcount", "End_Headcount", 
                 "Terminations")) %>% # here as well
        group_by(!!colName) %>% # colName is already a symbol, so unquote it with !!
        summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
      
      return(join_turnover_year)
    }
    

    It is safer to use as_string than to take the input directly as a string. ensym accepts both unquoted and quoted values, so if an unquoted name is passed, grabbing the input directly doesn't work (it would need deparse(substitute(colName))). Instead, first convert to a symbol and then convert back to a string with as_string
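
    As a minimal sketch of the point above, the snippet below isolates the ensym/as_string round trip and contrasts it with str. The helper name col_as_string and the column name Department are made up for illustration and are not part of the original function.

```r
library(rlang)

# Hypothetical helper: returns the column name as a string,
# whether the caller passes it quoted or unquoted.
col_as_string <- function(colName) {
  colName <- ensym(colName)  # symbol from either Department or "Department"
  as_string(colName)         # back to a length-one character vector
}

unquoted <- col_as_string(Department)
quoted   <- col_as_string("Department")
identical(unquoted, quoted)

# str() prints a summary of the object's structure and returns NULL
# invisibly, so it cannot supply a column name to `by =`.
str_result <- str(sym("Department"))
is.null(str_result)
```

    Both calls to col_as_string give the same string, which is what lets it be passed to by = in full_join and to setNames regardless of how the caller wrote the column name.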