Search code examples
rgoogle-docs-apidate-conversion

Google Sheets Date and Time Objects Imported to R


I used the RGoogleDocs package by OmegaHat to import my Google Sheets spreadsheet to R. Some of it comprises of Date data, Time data and date/time data.

Upon importing to R, RGoogleDocs converts dates and times to decimal numbers presumably as the number of seconds from starting reference 1900-1-1 (or 1899-12-30).

This is not a problem for dates as I can simply use as.Date() to convert it. However, time data is a huge nuisance.

I know that I can use the as.POSIXct() or as.POSIXlt() however this does not accurately convert the decimal numbers back to dates and times.

Observe:

Timestamp:
2/10/2014 19:22:40; 2/10/2014 19:52:07; 2/10/2014 20:14:47;

2/10/2014 21:08:03; 2/11/2014 11:05:39; 2/11/2014 12:21:40;

2/11/2014 12:44:33;  2/11/2014 16:08:54; 2/11/2014 16:31:44;

2/12/2014 10:54:31; 2/12/2014 11:38:15; 2/12/2014 11:56:18;

2/12/2014 12:31:53; 2/12/2014 14:42:46; 2/12/2014 15:03:36;

2/13/2014 14:38:17; 2/13/2014 15:08:45; 2/13/2014 16:39:27;

2/13/2014 17:18:45; 2/13/2014 19:14:23; 2/19/2014 15:01:16;

2/19/2014 15:05:43;

RGoogleDocs' Imported Version:

> mydata$"Timestamp"
 [1] 41680.81 41680.83 41680.84 41680.88 41681.46 41681.52 41681.53 41681.67
 [9] 41681.69 41682.45 41682.48 41682.50 41682.52 41682.61 41682.63 41683.61
[17] 41683.63 41683.69 41683.72 41683.80 41689.63 41689.63

After conversion:

    as.POSIXct(mydata$"Timestamp"*86400, origin = "1899-12-30")
 [1] "2014-02-11 00:22:40 PKT" "2014-02-11 00:52:07 PKT" "2014-02-11 01:14:47 PKT"
 [4] "2014-02-11 02:08:02 PKT" "2014-02-11 16:05:39 PKT" "2014-02-11 17:21:40 PKT"
 [7] "2014-02-11 17:44:33 PKT" "2014-02-11 21:08:54 PKT" "2014-02-11 21:31:44 PKT"
[10] "2014-02-12 15:54:31 PKT" "2014-02-12 16:38:15 PKT" "2014-02-12 16:56:18 PKT"
[13] "2014-02-12 17:31:53 PKT" "2014-02-12 19:42:45 PKT" "2014-02-12 20:03:36 PKT"
[16] "2014-02-13 19:38:17 PKT" "2014-02-13 20:08:45 PKT" "2014-02-13 21:39:27 PKT"
[19] "2014-02-13 22:18:45 PKT" "2014-02-14 00:14:22 PKT" "2014-02-19 20:01:16 PKT"
[22] "2014-02-19 20:05:42 PKT"

Not only are the times inaccurate, the error progressively increases down the list. Can anyone help?


Solution

  • Thx for posting the link. I made a copy (so you can un-share it if you want now :-). Despite the spreadsheet properties being GMT-8, it looks like RGoogleDocs is giving you the numeric bit in GMT (UTC). I'm not posting all the setup code for RGoogleDocs but I assume you did something like the following:

    docs <- getDocs(sheets.con)
    ts <- getWorksheets(docs$rrr, sheets.con)
    dat <- sheetAsMatrix(ts$`Form Responses`, header = TRUE, as.data.frame = TRUE, trim = TRUE)
    

    RGoogleDocs does, indeed, send back the number of days value vs the timestamp string (despite the CSV export using the string version):

    dat$Timestamp
    ##  [1] 41689.63 41683.72 41680.88 41680.81 41683.69 41681.67 41681.69 41682.45 41682.48 41682.50 41683.80
    ## [12] 41689.63 41680.83 41681.52 41681.46 41680.84 41682.52 41682.61 41683.61 41683.63 41681.53 41682.63
    

    But, I'm not seeing the conversion errors (perhaps a few seconds) that you are:

    as.POSIXct(dat$Timestamp*86400, origin="1899-12-30 00:00:00", tz="GMT")
    ##  [1] "2014-02-19 15:05:42 GMT" "2014-02-13 17:18:45 GMT" "2014-02-10 21:08:03 GMT"
    ##  [4] "2014-02-10 19:22:39 GMT" "2014-02-13 16:39:27 GMT" "2014-02-11 16:08:53 GMT"
    ##  [7] "2014-02-11 16:31:44 GMT" "2014-02-12 10:54:31 GMT" "2014-02-12 11:38:14 GMT"
    ## [10] "2014-02-12 11:56:18 GMT" "2014-02-13 19:14:23 GMT" "2014-02-19 15:01:15 GMT"
    ## [13] "2014-02-10 19:52:07 GMT" "2014-02-11 12:21:40 GMT" "2014-02-11 11:05:38 GMT"
    ## [16] "2014-02-10 20:14:46 GMT" "2014-02-12 12:31:52 GMT" "2014-02-12 14:42:45 GMT"
    ## [19] "2014-02-13 14:38:17 GMT" "2014-02-13 15:08:45 GMT" "2014-02-11 12:44:33 GMT"
    ## [22] "2014-02-12 15:03:36 GMT"
    

    (Pasting the CSV export from Google Docs vs use a screen cap)

    Timestamp,DOA,DOD,Length of Stay,Time of Event,Date of Event,Presentation Time,Presentation Date,Time to Presentation
    2/19/2014 15:05:43,9/6/2010,9/10/2010,4,,,0:30:00,9/6/2010,
    2/13/2014 17:18:45,12/28/2009,1/1/2010,4,16:13:00,,20:59:00,12/28/2009,
    2/10/2014 21:08:03,12/28/2009,1/2/2010,5,16:13:00,12/28/2009,17:51:00,12/28/2009,
    2/10/2014 19:22:40,12/28/2009,1/1/2010,4,16:00:00,12/28/2009,22:00:00,12/28/2009,
    2/13/2014 16:39:27,3/22/2013,3/29/2013,7,,,4:46:00,3/22/2013,
    2/11/2014 16:08:54,2/21/2013,4/3/2013,41,18:00:00,2/16/2013,2:49:00,2/21/2013,
    2/11/2014 16:31:44,2/21/2013,2/27/2013,6,18:00:00,,5:35:00,2/21/2013,
    2/12/2014 10:54:31,2/21/2013,3/5/2013,12,,,5:35:00,2/21/2013,
    2/12/2014 11:38:15,2/21/2013,3/12/2013,19,18:00:00,2/20/2013,3:14:00,2/21/2013,
    2/12/2014 11:56:18,2/21/2013,2/27/2013,6,,,3:16:00,2/21/2013,
    2/13/2014 19:14:23,3/4/2013,3/21/2013,17,19:00:00,,2:05:00,3/4/2013,
    2/19/2014 15:01:16,3/3/2013,3/19/2013,16,22:00:00,,0:00:00,3/4/2013,
    2/10/2014 19:52:07,3/3/2013,3/5/2013,2,,3/3/2013,22:16:00,3/3/2013,
    2/11/2014 12:21:40,4/24/2013,5/28/2013,34,19:00:00,4/23/2013,23:30:00,4/24/2013,
    2/11/2014 11:05:39,3/5/2013,3/13/2013,8,21:25:00,3/3/2013,4:17:00,3/5/2013,
    2/10/2014 20:14:47,5/5/2013,6/3/2013,29,21:45:00,5/5/2013,0:45:00,5/5/2013,
    2/12/2014 12:31:53,11/23/2013,11/29/2013,6,23:35:00,11/23/2013,1:01:00,11/23/2013,
    2/12/2014 14:42:46,11/26/2013,12/11/2013,15,9:20:00,,22:18:00,11/26/2013,
    2/13/2014 14:38:17,12/12/2013,12/24/2013,12,,,21:56:00,12/11/2013,
    2/13/2014 15:08:45,3/4/2013,3/14/2013,10,,3/3/2013,2:30:00,3/4/2013,
    2/11/2014 12:44:33,8/9/2013,8/20/2013,11,,8/8/2013,10:43:00,8/9/2013,
    2/12/2014 15:03:36,9/28/2013,9/30/2013,2,,,12:14:00,9/28/2013,
    

    Can you verify the time zone settings both for your Google Drive defaults and the spreadsheet itself? That might be the issue.