Search code examples
rlubridatedifftime

Error "Origin must be supplied" when using difftime + subtraction operator not working for different minutes


I have a data frame of two columns "start" and "end" in HH:MM:SS format.

I wanted to calculate duration between start and end using difftime function

It always gives back this error: Error in as.POSIXct.numeric(time1) : 'origin' must be supplied

I read many posts but none seemed to work for me.

Loading packages

library(dplyr)
library(tidyverse)
library(lubridate)

I removed hours to deal with minutes and seconds only

get_time <- function(x){str_sub(x, start = -5) %>%  ms()} 
df <- df %>% mutate(start = get_time(start)) %>%
  mutate(end = get_time(end))

Class of objects

class(df$start)
gives: 
[1] "Period"
attr(,"package")
[1] "lubridate"
start                 end       

26M 22S               26M 23S        
26M 25S               26M 37S      
29M 47S               30M 13S

I calculated duration using difftime function

df$duration <- with(df, difftime(end, start, units="secs"))
gives error:
Error in as.POSIXct.numeric(time1) : 'origin' must be supplied

I used subtraction operator, it worked fine except for 3rd row when minutes are different, it gave wrong answer.

start                 end            duration

26M 22S               26M 23S        1S
26M 25S               26M 37S        12S
29M 47S               30M 13S        1M -34S

Amendment

The accepted response works perfectly fine, except that it returns an error: Error in mtx1[3, ] : incorrect number of dimensions whenever applied to the second two columns "start2" and "end2" that I have in the same data frame.

sample from my df

df <- structure(list(item = c("manatee", "manatee", "pile", "pile"), prestart = new("Period", .Data = c(22, 
25, 41, 49), year = c(0, 0, 0, 0), month = c(0, 
0, 0, 0), day = c(0, 0, 0, 0), hour = c(0, 0, 0, 
0), minute = c(26, 26, 26, 26)), preend = new("Period", 
    .Data = c(23, 37, 48, 50), year = c(0, 0, 0, 0), month = c(0, 0, 0, 0), day = c(0, 0, 0, 0
    ), hour = c(0, 0, 0, 0), minute = c(26, 26, 26, 26)), poststart = new("Period", .Data = c(23, 41, 50, 
54), year = c(0, 0, 0, 0), month = c(0, 0, 0, 0), day = c(0, 0, 0, 0), hour = c(0, 0, 0, 0), 
    minute = c(26, 26, 26, 26)), postend = new("Period", 
    .Data = c(37, 48, 52, 22), year = c(0, 0, 0, 0), month = c(0, 0, 0, 0), day = c(0, 0, 0, 0
    ), hour = c(0, 0, 0, 0), minute = c(26, 26, 26, 27))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", 
"data.frame"))

Organising data in minutes and seconds only (remove hours)


get_time <- function(x){str_sub(x, start = -5) %>%  ms()} 
df <- df %>% mutate(prestart = get_time(prestart)) %>%
  mutate(preend = get_time(preend)) %>% 
  mutate(poststart = get_time(poststart)) %>% 
  mutate(postend = get_time(postend))



Solution

  • Update 2: I'm keeping both previous answers for the record (in case somebody really does have data that contains strings like this). However, the data is actually derived from lubridate, so "26M 22S" is merely a representation of a numeric object.

    Ultimately, it is as direct as:

    lubridate::as.difftime(df$preend - df$prestart, units="secs")
    # Time differences in secs
    # [1]  1 12  7  1
    

    Update: your data format is nothing like I originally inferred. I'll keep the original answer below, but given this data structure it is not much help.

    You can always try to do "modulus subtraction", but I think the best way to go is to convert to decimal and back. First, I'll provide data in two ways that make it incredibly easier for users to know exactly what your data looks like. (Having this up-front would have precluded me providing the original less-helpful answer.) Please use something like this in the future, it means a lot!

    x <- data.frame(
      start = c("26M 22S", "26M 25S", "29M 47S"),
      end = c("26M 23S", "26M 37S", "30M 13S"),
      stringsAsFactors = FALSE
    )
    
    # if you don't want to generate a frame like that, then you can
    # provide the output from dput(head(x))
    structure(list(start = c("26M 22S", "26M 25S", "29M 47S"), end = c("26M 23S", 
    "26M 37S", "30M 13S")), class = "data.frame", row.names = c(NA, 
    -3L))
    

    From here, two helper functions to convert to/from decimal minutes. These both make the assumption that you only ever deal with minutes/seconds, never more. Similarly, conversion back to character assumes that you are always using integral seconds, which is perhaps hasty. If this is not the case, you can remove the round and accept fractional components, perhaps using sprintf("%dM %02.3f", ...) instead, controlling the decimal component.

    decimal_minutes <- function(s) {
      nums <- strsplit(gsub("[^0-9 ]", "", s), "\\s+")
      mtx <- sapply(nums, as.integer)
      mtx[1,] + mtx[2,] / 60
    }
    minutes_seconds <- function(num, keep0 = TRUE) {
      out <- sprintf("%dM %02dS", as.integer(num), as.integer(round(60 * (num %% 1), 0)))
      if (!keep0) out <- gsub("^0M ", "", out)
      out
    }
    

    From here, you can always preserve the numeric version if you want to use them elsewhere:

    x[,c("startnum", "endnum")] <- lapply(x[,c("start", "end")], decimal_minutes)
    x
    #     start     end startnum   endnum
    # 1 26M 22S 26M 23S 26.36667 26.38333
    # 2 26M 25S 26M 37S 26.41667 26.61667
    # 3 29M 47S 30M 13S 29.78333 30.21667
    x$endnum - x$startnum
    # [1] 0.01666667 0.20000000 0.43333333
    minutes_seconds(x$endnum - x$startnum)
    # [1] "0M 01S" "0M 12S" "0M 26S"
    minutes_seconds(x$endnum - x$startnum, keep0 = FALSE)
    # [1] "01S" "12S" "26S"
    

    But if all you want is the one-time subtraction, you can wrap it up in one call:

    x$duration <- minutes_seconds(
      decimal_minutes(x$end) - decimal_minutes(x$start),
      keep0 = TRUE
    )
    x
    #     start     end duration
    # 1 26M 22S 26M 23S   0M 01S
    # 2 26M 25S 26M 37S   0M 12S
    # 3 29M 47S 30M 13S   0M 26S
    x$duration <- minutes_seconds(
      decimal_minutes(x$end) - decimal_minutes(x$start),
      keep0 = FALSE
    )
    x
    #     start     end duration
    # 1 26M 22S 26M 23S      01S
    # 2 26M 25S 26M 37S      12S
    # 3 29M 47S 30M 13S      26S
    

    Ideally, this can and should be generalized to accept more (such as hours, as in "1H 23M 11S"). An easy step would be to update decimal_minutes to look for and deal with longer formats. I wonder if lubridate would be a good fit for you, though I doubt it'll take "26M 22S" as a native format, so you'll still need to do some data mangling to start using it.


    Discussions about origin= with regards to R's POSIXt means that it is likely converting from a number to a time/date. A common reason to do this is when using epoch seconds (common in unix-y stuff) as a numeric depiction of a timestamp. It's common to assume that the "0 seconds" of this format (1970-01-01 00:00:00) is absolute, but it is not universal (excel is different), and it is feasible and even realistic/desirable to have a different "0" time. So it assumes nothing, forcing you to be explicit.

    as.POSIXct(100, origin="1970-01-01 00:00:00")
    # [1] "1969-12-31 16:01:40 PST"
    as.POSIXct(100, origin="1970-01-01 00:00:00", tz="UTC")
    # [1] "1970-01-01 00:01:40 UTC"
    ### or even just 
    as.POSIXct(100, origin="1970-01-01")
    

    So to use difftime on numbers, you first need to convert those numbers with something like as.POSIXct(..., origin="1970-01-01") before doing difftime.

    However, since you want seconds, and numeric epoch is already in seconds, you could just do

    end - start
    

    If you really need it to be tagged as "seconds", then do

    `units<-`(end - start, "secs")
    ### such as
    `units<-`(100-90, "secs")
    # [1] 10
    # attr(,"units")
    # [1] "secs"