I have a data frame of two columns "start" and "end" in HH:MM:SS format.
I wanted to calculate the duration between start and end using the difftime function.
It always gives back this error: Error in as.POSIXct.numeric(time1) : 'origin' must be supplied
I read many posts but none seemed to work for me.
Loading packages
library(dplyr)
library(tidyverse)
library(lubridate)
I removed hours to deal with minutes and seconds only
get_time <- function(x){str_sub(x, start = -5) %>% ms()}
df <- df %>% mutate(start = get_time(start)) %>%
mutate(end = get_time(end))
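Here is a quick way to see what get_time() produces. It assumes the raw columns are character strings in HH:MM:SS form; the two values below are invented for illustration. The result is a lubridate Period rather than a date-time, which is likely what difftime() trips over later:

```r
library(stringr)
library(lubridate)

# Same logic as get_time() above, written without the pipe
get_time <- function(x) ms(str_sub(x, start = -5))

raw <- c("00:26:22", "00:29:47")   # made-up HH:MM:SS values for illustration
str_sub(raw, start = -5)           # keeps only "MM:SS"; the hour part is dropped
p <- get_time(raw)
is.period(p)                       # lubridate Period objects, not date-times
period_to_seconds(p)               # total seconds of each Period
# hms(raw) would parse all three parts and keep any non-zero hours
```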
Class of the converted columns:
class(df$start)
gives:
[1] "Period"
attr(,"package")
[1] "lubridate"
The data then look like this:
start end
26M 22S 26M 23S
26M 25S 26M 37S
29M 47S 30M 13S
I calculated the duration using the difftime function:
df$duration <- with(df, difftime(end, start, units="secs"))
which gives the error:
Error in as.POSIXct.numeric(time1) : 'origin' must be supplied
I then used the subtraction operator instead. It worked fine except for the 3rd row, where the minutes differ: it gave 1M -34S rather than the 26S I expected.
start end duration
26M 22S 26M 23S 1S
26M 25S 26M 37S 12S
29M 47S 30M 13S 1M -34S
Amendment
The accepted answer works perfectly, except that it returns the error Error in mtx1[3, ] : incorrect number of dimensions whenever I apply it to the other two columns, "start2" and "end2", that I have in the same data frame.
A sample from my df:
# Period columns need lubridate loaded; the sample has 4 rows
df <- structure(list(
  item = c("manatee", "manatee", "pile", "pile"),
  prestart = new("Period", .Data = c(22, 25, 41, 49), year = c(0, 0, 0, 0),
    month = c(0, 0, 0, 0), day = c(0, 0, 0, 0), hour = c(0, 0, 0, 0),
    minute = c(26, 26, 26, 26)),
  preend = new("Period", .Data = c(23, 37, 48, 50), year = c(0, 0, 0, 0),
    month = c(0, 0, 0, 0), day = c(0, 0, 0, 0), hour = c(0, 0, 0, 0),
    minute = c(26, 26, 26, 26)),
  poststart = new("Period", .Data = c(23, 41, 50, 54), year = c(0, 0, 0, 0),
    month = c(0, 0, 0, 0), day = c(0, 0, 0, 0), hour = c(0, 0, 0, 0),
    minute = c(26, 26, 26, 26)),
  postend = new("Period", .Data = c(37, 48, 52, 22), year = c(0, 0, 0, 0),
    month = c(0, 0, 0, 0), day = c(0, 0, 0, 0), hour = c(0, 0, 0, 0),
    minute = c(26, 26, 26, 27))),
  row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
Organising data in minutes and seconds only (remove hours)
get_time <- function(x){str_sub(x, start = -5) %>% ms()}
df <- df %>% mutate(prestart = get_time(prestart)) %>%
mutate(preend = get_time(preend)) %>%
mutate(poststart = get_time(poststart)) %>%
mutate(postend = get_time(postend))
Update 2: I'm keeping both previous answers for the record (in case somebody really does have data that contains strings like this). However, the data is actually derived from lubridate, so "26M 22S" is merely the printed representation of a numeric Period object.
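To see that a Period is numeric underneath, you can inspect its slots directly; they are the same .Data and minute fields that appear in the dput() output above. A small sketch, building a single value with lubridate::ms() rather than using the full data frame:

```r
library(lubridate)

p <- ms("26:22")        # prints as "26M 22S"
p@.Data                 # seconds component (22)
p@minute                # minutes component (26)
period_to_seconds(p)    # total length in seconds (26 * 60 + 22)
```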
Ultimately, it is as direct as:
lubridate::as.difftime(df$preend - df$prestart, units="secs")
# Time differences in secs
# [1] 1 12 7 1
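The same approach covers both column pairs from the amendment, including rows that cross a minute boundary (such as 26M 54S to 27M 22S). A sketch using lubridate's period_to_seconds() and seconds_to_period(); the vectors are rebuilt from the posted sample with ms() instead of the full data frame:

```r
library(lubridate)

# Rebuilt from the posted sample (MM:SS only)
prestart  <- ms(c("26:22", "26:25", "26:41", "26:49"))
preend    <- ms(c("26:23", "26:37", "26:48", "26:50"))
poststart <- ms(c("26:23", "26:41", "26:50", "26:54"))
postend   <- ms(c("26:37", "26:48", "26:52", "27:22"))

# Period subtraction works slot by slot (27M 22S - 26M 54S gives 1M -32S),
# so convert to total seconds before comparing or printing.
pre_secs  <- period_to_seconds(preend - prestart)
post_secs <- period_to_seconds(postend - poststart)
pre_secs    # 1 12 7 1
post_secs   # 14 7 2 28

# Back to a normalized Period if that display is preferred
seconds_to_period(post_secs)
```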
Update: your data format is nothing like what I originally inferred. I'll keep the original answer below, but given this data structure it is not much help.
You can always try "modulus subtraction", but I think the best way to go is to convert to decimal and back. First, I'll provide the data in two ways that make it much easier for users to know exactly what your data looks like. (Having this up front would have saved me from giving the original, less helpful answer.) Please use something like this in the future; it helps a lot!
x <- data.frame(
start = c("26M 22S", "26M 25S", "29M 47S"),
end = c("26M 23S", "26M 37S", "30M 13S"),
stringsAsFactors = FALSE
)
# if you don't want to generate a frame like that, then you can
# provide the output from dput(head(x))
structure(list(start = c("26M 22S", "26M 25S", "29M 47S"), end = c("26M 23S",
"26M 37S", "30M 13S")), class = "data.frame", row.names = c(NA,
-3L))
From here, two helper functions convert to and from decimal minutes. Both assume that you only ever deal with minutes and seconds, never more. Similarly, the conversion back to character assumes that you are always using integral seconds, which is perhaps hasty. If this is not the case, you can remove the round and accept fractional components, perhaps using sprintf("%dM %06.3fS", ...) instead to control the decimal component.
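decimal_minutes <- function(s) {
  # "26M 22S" -> "26 22" -> c("26", "22")
  nums <- strsplit(gsub("[^0-9 ]", "", s), "\\s+")
  # 2 x n integer matrix: row 1 = minutes, row 2 = seconds
  # (assumes every string contains exactly two numbers)
  mtx <- sapply(nums, as.integer)
  mtx[1,] + mtx[2,] / 60
}
minutes_seconds <- function(num, keep0 = TRUE) {
  # work in whole seconds so floating-point error cannot yield "0M 60S"
  secs <- as.integer(round(60 * num))
  out <- sprintf("%dM %02dS", secs %/% 60L, secs %% 60L)
  if (!keep0) out <- gsub("^0M ", "", out)
  out
}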
From here, you can always keep the numeric versions if you want to use them elsewhere:
x[,c("startnum", "endnum")] <- lapply(x[,c("start", "end")], decimal_minutes)
x
# start end startnum endnum
# 1 26M 22S 26M 23S 26.36667 26.38333
# 2 26M 25S 26M 37S 26.41667 26.61667
# 3 29M 47S 30M 13S 29.78333 30.21667
x$endnum - x$startnum
# [1] 0.01666667 0.20000000 0.43333333
minutes_seconds(x$endnum - x$startnum)
# [1] "0M 01S" "0M 12S" "0M 26S"
minutes_seconds(x$endnum - x$startnum, keep0 = FALSE)
# [1] "01S" "12S" "26S"
But if all you want is the one-time subtraction, you can wrap it up in one call:
x$duration <- minutes_seconds(
decimal_minutes(x$end) - decimal_minutes(x$start),
keep0 = TRUE
)
x
# start end duration
# 1 26M 22S 26M 23S 0M 01S
# 2 26M 25S 26M 37S 0M 12S
# 3 29M 47S 30M 13S 0M 26S
x$duration <- minutes_seconds(
decimal_minutes(x$end) - decimal_minutes(x$start),
keep0 = FALSE
)
x
# start end duration
# 1 26M 22S 26M 23S 01S
# 2 26M 25S 26M 37S 12S
# 3 29M 47S 30M 13S 26S
Ideally, this can and should be generalized to accept more units (such as hours, as in "1H 23M 11S"). An easy step would be to update decimal_minutes to look for and deal with longer formats. I wonder if lubridate would be a good fit for you, though I doubt it'll take "26M 22S" as a native format, so you'll still need to do some data mangling to start using it.
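One possible generalization along those lines is to pull out each unit with a regular expression and work in whole seconds. The base-R sketch below is only an illustration: the name hms_to_seconds() is hypothetical, and it assumes each of H, M and S appears at most once as a non-negative integer immediately followed by its letter. Missing units count as 0, so unlike the matrix approach it does not require every string to contain the same number of parts:

```r
# Hypothetical helper: total seconds from strings like "1H 23M 11S", "26M 22S" or "22S"
hms_to_seconds <- function(s) {
  grab <- function(unit) {
    # regexec() + regmatches() give c(full_match, digits) per string, or character(0)
    m <- regmatches(s, regexec(paste0("([0-9]+)", unit), s))
    vapply(m, function(v) if (length(v) == 2) as.numeric(v[2]) else 0, numeric(1))
  }
  3600 * grab("H") + 60 * grab("M") + grab("S")
}

hms_to_seconds(c("1H 23M 11S", "26M 22S", "22S"))
# [1] 4991 1582   22
hms_to_seconds("30M 13S") - hms_to_seconds("29M 47S")
# [1] 26
```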
An error about origin= from R's POSIXt functions means that a number is being converted to a time/date (here, difftime() calls as.POSIXct() on its inputs). A common reason to do this is when using epoch seconds (common in Unix-y stuff) as a numeric depiction of a timestamp. It's common to assume that the "0 seconds" of this format (1970-01-01 00:00:00) is absolute, but it is not universal (Excel is different), and it is feasible and even realistic/desirable to have a different "0" time. So R assumes nothing, forcing you to be explicit.
as.POSIXct(100, origin="1970-01-01 00:00:00")
# [1] "1969-12-31 16:01:40 PST"   (shown in a US/Pacific local time zone)
as.POSIXct(100, origin="1970-01-01 00:00:00", tz="UTC")
# [1] "1970-01-01 00:01:40 UTC"
### or even just
as.POSIXct(100, origin="1970-01-01")
So to use difftime on numbers, you first need to convert those numbers with something like as.POSIXct(..., origin="1970-01-01") before calling difftime.
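For example, a short sketch of that conversion, treating 29M 47S and 30M 13S from the question as plain second counts (the values are only for illustration):

```r
# Illustrative numeric values: 29M 47S and 30M 13S expressed as seconds
start <- 29 * 60 + 47
end   <- 30 * 60 + 13

# Convert to POSIXct with an explicit origin (and time zone), then take the difference
t0 <- as.POSIXct(start, origin = "1970-01-01", tz = "UTC")
t1 <- as.POSIXct(end,   origin = "1970-01-01", tz = "UTC")
d <- difftime(t1, t0, units = "secs")
d
# Time difference of 26 secs
```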
However, since you want seconds, and numeric epoch times are already in seconds, you could just do:
end - start
If you really need it to be tagged as "seconds", then do
`units<-`(end - start, "secs")
### such as
`units<-`(100-90, "secs")
# [1] 10
# attr(,"units")
# [1] "secs"
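If you want an actual difftime object rather than a number carrying a "units" attribute, base R's as.difftime() accepts a numeric value together with explicit units. A short sketch reusing the same 100 - 90 example:

```r
d <- as.difftime(100 - 90, units = "secs")
d
# Time difference of 10 secs

units(d) <- "mins"   # converts the stored value, not just the label
as.numeric(d)        # 10 / 60 minutes
```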