Trimming and reformatting dates in R


I have a column of data with the following types of dates and number entries:

16-Jun
21-01A
7-04
Aug-99
5-09

I want to convert all of these into numbers by doing two things. First, where an entry has a number before the dash (as in 16-Jun, 21-01A, 7-04 and 5-09), I want to trim it from the dash onwards, so those entries would become 16, 21, 7 and 5.

Second, where the entry starts with a month abbreviation (e.g. Aug-99), I want to convert the month to its number and then trim it. In this example, Aug-99 would become 8-99 and then be trimmed to just 8.

How can I do this in R? When I use the grep, sub and match commands, as in the answer below, the results come back out of order:

[1] 16 21 7 5 8

What I am after is:

[1] 16 21 7 8 5


Solution

  • We use grepl to flag the elements that start with a letter ('i1') and sub to remove everything from the first - to the end of the string ('v2'). The entries of 'v2' not flagged by 'i1' are converted to numeric, while the flagged ones are matched against month.abb to get the month number. The two results are then concatenated. Note that this puts all the numeric entries first, so the original order is kept only when the month entries come last.

    i1 <- grepl("^[A-Z]", v1)
    v2 <- sub("-.*", "", v1)
    c(as.numeric(v2[!i1]), match(v2[i1], month.abb))
    #[1] 16 21  7  8
    

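    For the new dataset, where a numeric entry follows the month entry, we can use ifelse. It works element by element, so the original order is preserved.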

    i1 <- grepl("^[A-Z]", df1$v1)
    v2 <- sub("-.*", "", df1$v1)
    as.numeric(ifelse(i1, match(v2, month.abb), v2))
    #[1] 16 21  7  8  5
    

    data

    v1 <- c('16-Jun','21-01A','7-04','Aug-99') 
    df1 <- structure(list(v1 = c("16-Jun", "21-01A", "7-04", "Aug-99", "5-09"
    )), .Names = "v1", class = "data.frame", row.names = c(NA, -5L))
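
The same order-preserving idea can also be sketched with indexed assignment instead of ifelse. This is only an illustration: the helper name `leading_number` and the vector `x` are made up for this example and are not part of the original answer, and it assumes month abbreviations are written exactly as in month.abb (e.g. "Aug", not "aug" or "August").

```r
# Hypothetical helper; the name is illustrative, not from the original answer.
leading_number <- function(x) {
  # Keep only the part before the first dash.
  head <- sub("-.*", "", x)
  # Entries that begin with a letter are treated as month abbreviations.
  is_month <- grepl("^[A-Za-z]", head)
  # Fill a result vector position by position so the input order is kept.
  out <- integer(length(x))
  out[is_month]  <- match(head[is_month], month.abb)
  out[!is_month] <- as.integer(head[!is_month])
  out
}

x <- c("16-Jun", "21-01A", "7-04", "Aug-99", "5-09")
leading_number(x)
# [1] 16 21  7  8  5
```

Because each value is written back to the position it came from, the result lines up with the input regardless of where the month entries appear. A month name that does not match month.abb would come back as NA.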