Search code examples
sqlrintervalsdata-conversion

Converting SQL-Interval-String to minutes using R


I am working with R where I have a variable '2 month 3 day 6 hour 70 minute' as string. The variable changes over time and therefore does not have the same length/structure. I need this variable to do a query on a PostgreSQL database by casting it to an interval. This works just fine.

Now I need this interval/string-variable as integer in minutes to do some mathematical calculations.

I thought of using sqldf the following:

library(sqldf)
my_interval = '2 month 3 day 6 hour 70 minute'
interval_minutes <- sqldf(paste("SELECT EXTRACT(EPOCH FROM '",my_interval,"'::INTERVAL)/60"))
interval_minutes_novar <- sqldf("SELECT EXTRACT(EPOCH FROM '2 month 3 day 6 hour 70 minute'::INTERVAL)/60")

but am getting Error: near "FROM": syntax error. From my research I know that sqldf uses SQLite, which does not support EXTRACT().

How can I convert a SQL-Interval to minutes using R?


Solution

  • 1) sqldf/gsubfn Using gsubfn replace each word in my_interval with *, the appropriate number of minutes and + . Remove any trailing + and spaces and then either parse and evaluate mins or substitute mins into the sql statement. There are 365.25 / 12 days in the average month over 4 calendar years, having one leap year, but if you want to get the same answer as PostgreSQL replace 365.25 / 12 with 30, as noted in the comments.

    library(sqldf)  # this also pulls in gsubfn
    
    # input
    my_interval = '2 month 3 day 6 hour 70 minute'
    
    L <- list(minute = " +", hour = "*60 +", day = "*60*24 +", 
           month = "*365.25 * 60 * 24 /12 +")
    mins <- my_interval |>
      gsubfn(pattern = "\\w+", replacement = L) |> 
      trimws(whitespace = "[+ ]")
    
    eval(parse(text = mins))
    ## [1] 92410
    
    fn$sqldf("select $mins mins")
    ##    mins
    ## 1 92410
    

    2) Base R This is a base R solution. Extract the numbers and words into separate vectors, translate the words to the appropriate factors and take their inner product. The discussion about 30 days months in (1) applies here too.

    v <- c(minute = 1, hour = 60, day = 60 * 24, month = 365.25 * 60 * 24 /12)
    nums <- my_interval |>
      gsub(pattern = "[a-z]", replacement = "") |>
      textConnection() |>
      scan(quiet = TRUE)
    words <- my_interval |>
      gsub(pattern = "\\d", replacement = "") |>
      textConnection() |>
      scan(what = "", quiet = TRUE)
    sum(v[words] * nums)
    ## [1] 92410
    

    3) lubridate lubridate duration objects can be used.

    library(lubridate)
    as.numeric(duration(my_interval), "minute")
    ## [1] 92410
    

    Although lubridate does not handle 30 day months (and Hadley says it is not planned) we can preprocess my_interval to get the effect.

    library(gsubfn)
    library(lubridate)
    
    my_interval |>
     gsubfn(pattern = "(\\d+) +month", replacement = ~paste(30*as.numeric(x),"day")) |>
     duration() |>
     as.numeric("minute")
    ## [1] 91150