I have a dataset on malfunction times. The start of the malfunction is the Begin
column and the end is the End
column.
In addition there are 3 malfunction categories, which can also overlap in time.
My actual goal is to calculate the sum of the time period per day when there is no malfunction. (The category actually does not matter, I do not need a breakdown of the category).
For better illustration, here is an example data set of the malfunction times over 3 days:
> df_time
# A tibble: 8 × 3
Category Begin End
<chr> <dttm> <dttm>
1 A 2023-07-15 01:40:11 2023-07-15 13:43:15
2 A 2023-07-16 05:54:44 2023-07-16 10:50:45
3 B 2023-08-16 07:43:09 2023-08-16 16:42:12
4 C 2023-08-16 12:00:00 2023-08-16 13:11:13
5 A 2023-08-16 18:00:00 2023-08-16 19:30:00
6 A 2023-08-17 08:00:00 2023-08-17 13:00:00
7 C 2023-08-17 11:12:45 2023-08-17 19:58:22
8 A 2023-08-17 19:01:45 2023-08-17 23:59:59
I have now tried to visualize the malfunction times graphically:
Now I want to calculate a summary per calendar day, which shows how many minutes (or hours or seconds) no single malfunction occurred.
This is the sum of all green time intervals per day:
If there would be only one malfunction category, one could simply calculate the 24 hours minus the malfunction interval.
However, I do not get further in dealing with the different overlapping malfunction intervals.
Can anyone help me in calculating the malfunction free periods and/or also the daily malfunction periods.
I have imagined such an evaluation to be somewhat: (Where the results are probably not exactly the same as the example data set above)
Date OK_times malfunction_times
<date> <chr> <chr>
1 2023-07-15 17 hours 7 hours
2 2023-07-16 5 Hours 19 hours
3 2023-07-17 3 hours 21 hours
Here is the code for creating the example data frame:
library(tidyverse)
df_time <- tibble(
Category = c("A", "A", "B", "C", "A", "A", "C", "A"),
Begin = as_datetime(c(
"2023-07-15 01:40:11",
"2023-07-16 05:54:44",
"2023-08-16 07:43:09",
"2023-08-16 12:00:00",
"2023-08-16 18:00:00",
"2023-08-17 08:00:00",
"2023-08-17 11:12:45",
"2023-08-17 19:01:45"
)),
End = as_datetime(c(
"2023-07-15 13:43:15",
"2023-07-16 10:50:45",
"2023-08-16 16:42:12",
"2023-08-16 13:11:13",
"2023-08-16 19:30:00",
"2023-08-17 13:00:00",
"2023-08-17 19:58:22",
"2023-08-17 23:59:59"
))
)
interval_intersects <- function(i1, i2) {
# check if there's an overlap between the two intervals
between(i1$Begin, i2$Begin, i2$End) | between(i1$End, i2$Begin, i2$End) | between(i2$Begin, i1$Begin, i1$End) | between(i2$End, i1$Begin, i1$End)
}
tidy_intervals <- function(df) {
out <- df[0,] # empty tibble with the columns of the input df
while (nrow(df) > 0) {
matched = FALSE
if (nrow(out) != 0) { # this is so janky but R will try to loop through the dataframe, even when it has no rows, even using seq_along
# for each row in out, check if it intersects with the current row
for (j in 1:nrow(out)) {
# if it does, update the current output row to be the minimum of the two begin times and the maximum of the two end times
if (interval_intersects(df[1, ], out[j, ])) {
matched = TRUE
out[j, ] <- tibble(
Begin = min(df[1,]$Begin, out[j,]$Begin),
End = max(df[1,]$End, out[j,]$End)
)
break
}
}
}
# if the current row didn't intersect with any of the output rows, append it to the output
if (!matched) {
out <- out |> add_row(
Begin = df[1,]$Begin,
End = df[1,]$End)
}
# remove the current row from the input
df <- df[-1, ]
}
return(out)
}
tidy_intervals(df_time |> select(-Category)) |>
##### everything from here until later is from the old answer https://stackoverflow.com/a/76905774/4145280 #####
mutate(b = as.Date(Begin), e = as.Date(End),
# create a sequence of dates between begin and end
days = map2(b, e, ~ seq.Date(.x, .y, by = "1 day"))) |>
# unnest the days column into many rows
unnest(days) |>
# if the beginning date is the same as the date in `days`, then use the original Begin column
# else, use `days` as a datetime
mutate(Begin = if_else(b == days, Begin, as_datetime(days)),
# same with End, but subtracting one minute
End = if_else(e == days, End, as_datetime(days) + days(1) - seconds(1)), .keep = "unused") |>
#### new stuff starts here ####
mutate(Date = as.Date(Begin),
malfunction_times = End - Begin) |>
reframe(malfunction_times = round(sum(malfunction_times)),
OK_times = 24 - malfunction_times, .by = Date)
Output:
# A tibble: 4 × 3
Date malfunction_times OK_times
<date> <drtn> <drtn>
1 2023-07-15 12 hours 12 hours
2 2023-07-16 5 hours 19 hours
3 2023-08-16 10 hours 14 hours
4 2023-08-17 16 hours 8 hours