Search code examples
rsqlitedatetimestrftimesqldf

SQLite doesn't report correct weekday with dataframe in R


I am trying to count the number of times each weekday occurs in a dataframe. My dataframe looks like this when printed on screen (assume the columns are labeled "x")weekday problem

I am currently using

sqldf("SELECT x, strftime('%w', x) FROM maxkwpeakdates")

also, if I run the code

sqldf("SELECT strftime('%w', date('now'))")

I get the right weekday.

Knowing that strftime is from sqlite and that their datetime functions are 0 based, those numbers are off. Looking at the 1st line the result should be a 3 because April 6, 2011 occurred on a Wednesday. The 10th line should also result in a 3 because January 4th 2012 also occurred on a Wednesday.

My data is actually in a dataframe stored within a shiny app which is why I am not using something like RMySQL and am instead using sqldf. I want to be able to run SQL queries on my dataframe so that I can keep the data inside the app and so that I can calculate things like average number of specific weekdays in a time period. Here is an example of my dataframe which is called maxkwpeakdates:

dataframe

I did another test with my data and compared the results of using

sqldf("SELECT x, strftime('%w', x) FROM maxkwpeakdates")

and

weekdays(as.Date(maxkwpeakdates$x))

Then I stored these results in a dataframe so I could compare the results.

comparing methods

Notice that using as.Date reported the correct weekdays but using strftime was almost always off. Is this a problem with R talking to sqlite through strftime? Is there a way to fix this so I get the correct result with strftime?


Solution

  • 1) SQLite SQLite does not have any date/time type so all R does is send the internal representation, seconds since the Epoch, to SQLite and then it is up to the SQL statement to interpret it somehow. SQLite does have datetime functions but they don't assume R's internal representation and conversion is required like this:

    dd <- data.frame(now = Sys.time()) # test data
    dd
    ##                   now
    ## 1 2017-03-29 07:39:30
    
    format(dd$now, "%A") # check
    ## [1] "Wednesday"
    
    
    sqldf("select strftime('%w', now, 'unixepoch', 'localtime') dayno from dd")
    ##   dayno
    ## 1     3
    

    2) H2 An easier alternative is to use the H2 database back end to sqldf. If the RH2 package is loaded sqldf will default to it. H2 does have true date and datetime types making manpulation of such objects much easier.

    library(RH2)
    library(sqldf)
    
    sqldf("select day_of_week(now)-1 dayno, dayname(now) dayname from dd")
    ##   dayno   dayname
    ## 1     3 Wednesday