r, strptime, as.date

as.Date returns an error when applied to a column


I have a dataset with about 20000 observations. I need to convert one of the columns to a different date format.

head(df$created_at)
[1] Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020
[3] Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020
[5] Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019

I can apply as.Date to an individual row:

as.Date(df$created_at[1], format = '%a %b %d %H:%M:%S %z %Y')

[1] "2020-03-31"

But when I try to use as.Date on the entire column, I get:

df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')

Error in strptime(x, format, tz = "GMT") : input string is too long

What am I doing wrong? Is there another command I'm missing here?


Solution

  • (Too long for a comment.)

    It works fine for the data you've shown us, so there must be something wrong later in your column. You could locate the problem by trying the command on subsets of your data, e.g. tmp <- as.Date(df[1:round(nrow(df)/2), "created_at"], ...), and then bisecting: if the problem doesn't occur in the first half of the data set, try rows 1:round(0.75*nrow(df)), and so on.
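
    The bisection described above could be automated roughly as follows. This is only a sketch: parse_fails and find_bad_row are hypothetical helper names, and it assumes the error is caused by individual bad elements, so that a subset fails exactly when it contains one of them.

    ```r
    # TRUE if converting the given strings throws an error, FALSE otherwise.
    # (Hypothetical helper; the format string is the one from the question.)
    parse_fails <- function(x, fmt = "%a %b %d %H:%M:%S %z %Y") {
      tryCatch({
        as.Date(x, format = fmt)
        FALSE
      }, error = function(e) TRUE)
    }

    # Bisect x and return the index of the first element that makes `fails`
    # return TRUE, or NA if the whole vector converts without an error.
    find_bad_row <- function(x, fails = parse_fails) {
      if (!fails(x)) return(NA_integer_)
      lo <- 1L
      hi <- length(x)
      while (lo < hi) {
        mid <- (lo + hi) %/% 2L
        if (fails(x[lo:mid])) {
          hi <- mid          # a bad element is in the left half
        } else {
          lo <- mid + 1L     # left half is clean, so look in the right half
        }
      }
      lo
    }
    ```

    With the question's data this would be called as bad <- find_bad_row(as.character(df$created_at)), after which df$created_at[bad] shows the offending value. Fixing or dropping that row and repeating the call would find any further ones.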

    You could also try plotting nchar(df$created_at) to see if anything pops out.
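
    A tabulation can stand in for the plot. The sketch below uses a made-up vector, created_at, in place of df$created_at; the over-long third element is invented for illustration and is not taken from the real data. It relies on the fact that the timestamps shown in the question are all 30 characters long.

    ```r
    # Hypothetical stand-in for df$created_at: two well-formed timestamps and
    # one deliberately over-long entry (the real culprit is unknown).
    created_at <- c(
      "Tue Mar 31 13:42:58 +0000 2020",
      "Sat Mar 14 05:15:56 +0000 2020",
      paste(rep("Sun Apr 05 14:02:10 +0000 2020", 50), collapse = " ")
    )

    # as.character() is a precaution: nchar() refuses factor input.
    len <- nchar(as.character(created_at))

    table(len)                    # distribution of string lengths
    suspect <- which(len != 30)   # rows whose length differs from the sample
    created_at[suspect]           # inspect the odd ones
    ```

    On the real column, any row reported in suspect would be worth looking at first, since the error message complains about the length of the input.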


    # Reproduce the six timestamps shown in the question, one per element
    df <- data.frame(created_at=c(
       "Tue Mar 31 13:42:58 +0000 2020", "Sat Mar 14 05:15:56 +0000 2020",
       "Sun Apr 05 14:02:10 +0000 2020", "Tue Mar 24 09:06:12 +0000 2020",
       "Tue Apr 28 01:14:28 +0000 2020", "Thu Oct 24 18:47:10 +0000 2019"))

    # Converts without error on this sample
    df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')