Search code examples
rdatedplyrlubridate

A new data conversion issue in R


I have a data column with two kind of date-format values, the numeric one (es. "38169") and the string one (es. "01/03/2004", always in format "%d/%m/%Y"). I cannot transform them in the same date-format and reconvert them in the standard date-format "%Y-%m-%d".

Forom the example below, I would like to transform the variable date_first in the variable date_clean

Additional information:

The database is imported from Excel. The etherogeneous date format is the result of a string alteration happening in both excel and R

data <- data.frame(date_all=c(NA,"38169","37926","01/03/2004 --- 01/03/2004"),
                  date_first=c(NA,"38169","37926","01/03/2004"))
                  
desidered_data <- data.frame(date_all=c(NA,"38169","37926","01/03/2004 --- 01/03/2004"),
                  date_first=c(NA,"38169","37926","01/03/2004"),
                  date_clean=c(NA,2004-07-01,2003-11-01,2004-03-01))

> desidered_data
                   date_all date_first date_clean
1                      <NA>       <NA>       <NA>
2                     38169      38169 2004-07-01
3                     37926      37926 2003-11-01
4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01



Solution

  • A base R option -

    change_mix_date <- function(x) {
      #empty date vector to store the results
      new_date <- as.Date(NA)
      #Check for values that have only numbers in them (excel dates)
      inds <- grepl('^\\d+$', x)
      #Change excel date to date class
      new_date[inds] <- as.Date(as.numeric(x[inds]), origin = '1899-12-30')
      #Change remaining ones to date class using as.Date
      new_date[!inds] <- as.Date(x[!inds], '%d/%m/%Y')
      #Return output. 
      new_date
    }
    
    data$date_clean <- change_mix_date(data$date_first)
    
    #                   date_all date_first date_clean
    #1                      <NA>       <NA>       <NA>
    #2                     38169      38169 2004-07-01
    #3                     37926      37926 2003-11-01
    #4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01