Using strptime on NA values


I need to use the strptime function to convert timestamps which look like the following:

Tue Feb 11 12:18:36 +0000 2014
Tue Feb 11 12:23:22 +0000 2014
Tue Feb 11 12:26:26 +0000 2014
Tue Feb 11 12:28:02 +0000 2014

As required, I have copied this into a csv file and read it into R:

timestamp_data <- read.table('timestamp_data.csv')

I then tried to convert it to recognized times using:

timestamp_data_formatted <- strptime(timestamp_data[, 1], format = "%a %b %d %H:%M:%S %z %Y")

However, I get only NA values when I view the formatted data in R. I think the problem is that when I view my imported csv data in R, it shows 0 instead of '+0000'. How can I fix this?


Solution

  • You're using read.table, not read.csv. read.table splits on whitespace by default, so each datetime is spread across six columns, and the '+0000' offset is read as the integer 0:

    df <- read.table(text = 'Tue Feb 11 12:18:36 +0000 2014
    Tue Feb 11 12:23:22 +0000 2014
    Tue Feb 11 12:26:26 +0000 2014
    Tue Feb 11 12:28:02 +0000 2014')
    
    df
    #>    V1  V2 V3       V4 V5   V6
    #> 1 Tue Feb 11 12:18:36  0 2014
    #> 2 Tue Feb 11 12:23:22  0 2014
    #> 3 Tue Feb 11 12:26:26  0 2014
    #> 4 Tue Feb 11 12:28:02  0 2014
    
    str(df)
    #> 'data.frame':    4 obs. of  6 variables:
    #>  $ V1: Factor w/ 1 level "Tue": 1 1 1 1
    #>  $ V2: Factor w/ 1 level "Feb": 1 1 1 1
    #>  $ V3: int  11 11 11 11
    #>  $ V4: Factor w/ 4 levels "12:18:36","12:23:22",..: 1 2 3 4
    #>  $ V5: int  0 0 0 0
    #>  $ V6: int  2014 2014 2014 2014
    

    If you use read.csv instead (with header = FALSE and stringsAsFactors = FALSE), each line stays in a single character column and the conversion works:

    df <- read.csv(text = 'Tue Feb 11 12:18:36 +0000 2014
    Tue Feb 11 12:23:22 +0000 2014
    Tue Feb 11 12:26:26 +0000 2014
    Tue Feb 11 12:28:02 +0000 2014', header = FALSE, stringsAsFactors = FALSE)
    
    df$datetime <- as.POSIXct(df$V1, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC')
    
    df
    #>                               V1            datetime
    #> 1 Tue Feb 11 12:18:36 +0000 2014 2014-02-11 12:18:36
    #> 2 Tue Feb 11 12:23:22 +0000 2014 2014-02-11 12:23:22
    #> 3 Tue Feb 11 12:26:26 +0000 2014 2014-02-11 12:26:26
    #> 4 Tue Feb 11 12:28:02 +0000 2014 2014-02-11 12:28:02
    
    str(df)
    #> 'data.frame':    4 obs. of  2 variables:
    #>  $ V1      : chr  "Tue Feb 11 12:18:36 +0000 2014" "Tue Feb 11 12:23:22 +0000 2014" "Tue Feb 11 12:26:26 +0000 2014" "Tue Feb 11 12:28:02 +0000 2014"
    #>  $ datetime: POSIXct, format: "2014-02-11 12:18:36" "2014-02-11 12:23:22" ...
    

    I'm using as.POSIXct here instead of strptime because a POSIXct vector is usually what you want in a data frame column (strptime returns POSIXlt, a list-based class). That said, strptime now works on this column, too.
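
To make the diagnosis concrete, here is a minimal sketch that reproduces the NA result and then parses the same lines with strptime. The names raw, fmt, split_df and whole_df are made up for illustration, and the sep = ',' variant is an assumption that only holds when the file has one timestamp per line and no commas. Note that %a and %b match day and month names in the current locale, so this assumes an English or C locale.

```r
# Two sample lines from the question; in practice these come from the file.
raw <- c('Tue Feb 11 12:18:36 +0000 2014',
         'Tue Feb 11 12:23:22 +0000 2014')
fmt <- '%a %b %d %H:%M:%S %z %Y'

# Default read.table splits on whitespace, so column 1 holds only the weekday.
split_df <- read.table(text = raw, stringsAsFactors = FALSE)
ncol(split_df)                       # six columns instead of one

# "Tue" on its own cannot match the full format, so every result is NA.
bad <- strptime(split_df[, 1], format = fmt, tz = 'UTC')
all(is.na(bad))

# Illustrative alternative: keep read.table but split on commas instead.
# The sample lines contain no commas, so each line stays in one column.
whole_df <- read.table(text = raw, sep = ',', header = FALSE,
                       colClasses = 'character')

parsed_lt <- strptime(whole_df$V1, format = fmt, tz = 'UTC')  # POSIXlt
parsed_ct <- as.POSIXct(parsed_lt)                            # POSIXct
parsed_ct
```

If the real file is read with read.csv as shown above, only the last two lines are needed, with df$V1 in place of whole_df$V1.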