
How to create a user generated function in R which converts all values in a column to date format?


Here is a very small subset of my dataset:

db_country <- tibble(country = c("Argentina", "Australia", "Austria"),
                     region = c("Americas", "Asia", "Europe"),
                     start_date = c(18487, 18487, 18487))

# A tibble: 3 x 3
  country   region   start_date
  <chr>     <chr>         <dbl>
1 Argentina Americas      18487
2 Australia Asia          18487
3 Austria   Europe        18487

As you can see, the start_date column values are stored as numbers counting days since the Unix epoch (1970-01-01). I want to change these to regular calendar dates. My actual dataset has many tables with many rows and columns that require conversion.

So rather than running multiple long lines of code, I want to create my own function in R which does the same thing but in fewer characters. Usually, I would do something like this:

db_country <- db_country %>% mutate(start_date = as_date(start_date))

Since I want to make a shortcut function I tried the following but they gave me errors:

(I did load the tidyverse and lubridate packages)

mydate1 <- function(dataset, column) {
  dataset <- dataset %>% mutate(column = as_date(column))
}

mydate1(db_country, start_date)

# Error: Problem with `mutate()` input `column`.
# x error in evaluating the argument 'x' in selecting a method for function 'as_date':
#  object 'start_date' not found
# i Input `column` is `as_date(column)`.

mydate2 <- function(dataset, column) {
  dataset$column <- as_date(dataset, dataset$column)
}

mydate2(db_country, start_date)

# Error in as.Date.default(x, ...) : 
#  do not know how to convert 'x' to class “Date” 

mydate3 <- function(dataset, column) {
  dataset$column <- as.Date.numeric(dataset, dataset$column)
}

mydate3(db_country, start_date)

# Error in as.Date(origin, ...) + x : 
#  non-numeric argument to binary operator
# In addition: Warning messages:
# 1: Unknown or uninitialised column: `column`. 
# 2: In as.Date.numeric(dataset, dataset$column) :
#   Incompatible methods ("+.Date", "Ops.data.frame") for "+"

I would really appreciate any help or advice with this :)


Solution

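  • You have to use non-standard evaluation (NSE) when referring to column names inside a function. Otherwise mutate() takes `column` literally instead of substituting the name you passed in.

    If you want to pass unquoted names to the function, use {{ }}: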

    library(dplyr)
    library(lubridate)
    library(rlang)
    
    mydate1 <- function(dataset, column) {
      dataset %>% mutate({{column}} := as_date({{column}}))
    }
    
    mydate1(db_country, start_date)
    # A tibble: 3 x 3
    #  country   region   start_date
    #  <chr>     <chr>    <date>    
    #1 Argentina Americas 2020-08-13
    #2 Australia Asia     2020-08-13
    #3 Austria   Europe   2020-08-13
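
The question mentions many columns per table. One possible extension, sketched here on the assumption that dplyr 1.0.0 or later is installed (for across()), embraces a column selection instead of a single name. The helper name mydate_cols and the end_date column are made up for illustration and are not part of the original data.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: convert every column matched by `columns` to Date.
# `columns` is an unquoted tidyselect expression, forwarded with {{ }}.
mydate_cols <- function(dataset, columns) {
  dataset %>% mutate(across({{ columns }}, as_date))
}

# Sample data from the question, plus an invented end_date column.
db_country <- tibble(country = c("Argentina", "Australia", "Austria"),
                     region = c("Americas", "Asia", "Europe"),
                     start_date = c(18487, 18487, 18487),
                     end_date = c(18500, 18500, 18500))

converted <- mydate_cols(db_country, c(start_date, end_date))
```

The same helper should also accept tidyselect helpers, for example mydate_cols(db_country, ends_with("_date")).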
    

    If you want to pass quoted names, change the function to:

    mydate1 <- function(dataset, column) {
      dataset %>% mutate(!!column := as_date(.data[[column]]))
    }
    
    mydate1(db_country, 'start_date')
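
For comparison, the mydate2 and mydate3 attempts in the question can be repaired without dplyr. This is a minimal sketch, assuming the column name is passed as a string; the name mydate_base is hypothetical. Two things change: `[[` replaces `$`, because `dataset$column` looks for a column literally called "column", and the function returns the modified data frame instead of ending on an assignment.

```r
library(lubridate)

# Hypothetical helper: `column` is a string naming the column to convert.
mydate_base <- function(dataset, column) {
  # `[[` looks up the column whose name is stored in `column`.
  dataset[[column]] <- as_date(dataset[[column]])
  # Return the modified copy so the caller can assign it.
  dataset
}

db_country <- data.frame(country = c("Argentina", "Australia", "Austria"),
                         region = c("Americas", "Asia", "Europe"),
                         start_date = c(18487, 18487, 18487))

db_country <- mydate_base(db_country, "start_date")
```

As with the dplyr versions, the result has to be assigned back (or to a new name), since R functions work on a copy of the data frame.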