Search code examples
rfunctiondplyrtidyverseeval

custom function to handle different date formats from excel trying to use curly curly


I have a dataframe imported from excel with read_excel that looks like this:

The main task is to handle the different formats of dates:

I would like to implement it into a custom function (and I am not good at all in creating functions):

df <- structure(list(date = c("40574", "40861", "40870", "40990", "07.03.2022", 
"14.03.2022", "16.03.2022", "27.03.2022", "24.03.2022", "24.03.2022"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

# A tibble: 10 x 1
   date      
   <chr>     
 1 40574     
 2 40861     
 3 40870     
 4 40990     
 5 07.03.2022
 6 14.03.2022
 7 16.03.2022
 8 27.03.2022
 9 24.03.2022
10 24.03.2022

I solved this task with this code:

library(tidyverse)
library(janitor)
library(lubridate)

df %>% 
  mutate(new_col = excel_numeric_to_date(as.numeric(as.character(date)), date_system = "modern"), .before=1) %>% 
  mutate(date = ifelse(!str_detect(date, '\\.'), NA_character_, date)) %>% 
  mutate(date = dmy(date)) %>% 
  mutate(date = coalesce(date, new_col), .keep="unused")

From this code I would like to make a custom function with this code:

mixed_dateColumn_excel <- function(df, x) {
  x <- {{x}}
  df %>% 
    mutate(new_col = excel_numeric_to_date(as.numeric(as.character(x)), date_system = "modern"), .before=1) %>% 
    mutate(x = ifelse(!str_detect(x, '\\.'), NA_character_, x)) %>% 
    mutate(x = dmy(x)) %>% 
    mutate(x = coalesce(x, new_col), .keep="unused")
}

I would like to know why:

This works not:

mixed_dateColumn_excel(df, "date")

This works not also:

mixed_dateColumn_excel(df, date)

And this works:

mixed_dateColumn_excel(df, df$date)

Solution

  • You can't unquote in this way using the curly-curly operator TarJae. It has to be done directly inside the tidyverse functions.

    In your version, the line:

    x <- {{x}}
    

    doesn't really do anything - you would get the same results if you removed this line altogether. Your third version where you use df$date doesn't really work, since it creates a column called x, with the calculations inside the mutate functions simply working on the passed vector df$date. The version with "date" doesn't work because you are passing a literal character string for the calculations, and date doesn't work because without using data masking properly, R cannot find this variable.

    Perhaps the easiest way to do this, since you also want to use the unquoted x for the column name inside mutate is to use rlang::ensym, but you still need to unquote with the bang-bang operator, and when you are assigning columns you need the assignment operator, :=

    mixed_dateColumn_excel <- function(df, x) {
      
      x <- rlang::ensym(x)
      
      df %>% 
        mutate(new_col = suppressWarnings(janitor::excel_numeric_to_date(
                           as.numeric(as.character(!!x)), 
                           date_system = "modern")), .before = 1) %>% 
        mutate(!!x := ifelse(!str_detect(!!x, '\\.'), NA_character_, !!x)) %>% 
        mutate(!!x := lubridate::dmy(!!x)) %>% 
        mutate(!!x := coalesce(!!x, new_col), .keep="unused")
    }
    
    
    mixed_dateColumn_excel(df, date)
    #> # A tibble: 10 x 1
    #>    date      
    #>    <date>    
    #>  1 2011-01-31
    #>  2 2011-11-14
    #>  3 2011-11-23
    #>  4 2012-03-22
    #>  5 2022-03-07
    #>  6 2022-03-14
    #>  7 2022-03-16
    #>  8 2022-03-27
    #>  9 2022-03-24
    #> 10 2022-03-24
    

    Created on 2022-04-12 by the reprex package (v2.0.1)