I have a data frame similar to the following (although with 80,000 rows), where the first column is "Date.Time" and the remaining columns are variables that contain some NA values. As a reproducible example:
df <- data.frame(
  Date = c("2020-01-01 09:50:00", "2020-01-01 09:51:30", "2020-01-01 09:53:00", "2020-01-01 09:54:00",
           "2020-01-01 09:55:00", "2020-01-01 09:57:30", "2020-01-01 09:59:00", "2020-01-01 10:01:00"),
  Variable1 = c(10, 15, NA, 25, 22, 10, 11, NA),
  Variable2 = c(1, NA, 2, 5, 8, 6, 8, NA))
What I need is the longest time interval spanned by consecutive rows without NA. In the example above, for Variable1 that is Date[7,1]-Date[4,1] (Date[2,1]-Date[1,1] is a shorter interval), while for Variable2 it is Date[7,1]-Date[3,1].
I've been trying the rle() function, which gives, for each variable, the runs of NA and non-NA values:
is.na.rle222 <- rle(is.na(df[, "Variable1"]))
But this only gives me the length of the longest run, with no link to the dates.
Hope my question is clear.
Thanks in advance
Using the logic from @GKi with dplyr and trying to be more explicit:
library(dplyr)

df <- df %>%
  # Number the consecutive runs of NA / non-NA values for each variable
  mutate(Var1_interval_grp = cumsum(c(1, abs(diff(is.na(Variable1))))),
         Var2_interval_grp = cumsum(c(1, abs(diff(is.na(Variable2)))))) %>%
  # Time between the first and last row of each run
  group_by(Var1_interval_grp) %>%
  mutate(Range_Var1 = diff(range(as.POSIXct(Date)))) %>%
  ungroup() %>%
  group_by(Var2_interval_grp) %>%
  mutate(Range_Var2 = diff(range(as.POSIXct(Date)))) %>%
  ungroup() %>%
  select(!contains("grp"))
The output is now:
> df
# A tibble: 8 x 5
  Date                Variable1 Variable2 Range_Var1 Range_Var2
  <chr>                   <dbl>     <dbl> <drtn>     <drtn>
1 2020-01-01 09:50:00        10         1  90 secs     0 secs
2 2020-01-01 09:51:30        15        NA  90 secs     0 secs
3 2020-01-01 09:53:00        NA         2   0 secs   360 secs
4 2020-01-01 09:54:00        25         5 300 secs   360 secs
5 2020-01-01 09:55:00        22         8 300 secs   360 secs
6 2020-01-01 09:57:30        10         6 300 secs   360 secs
7 2020-01-01 09:59:00        11         8 300 secs   360 secs
8 2020-01-01 10:01:00        NA        NA   0 secs     0 secs
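The grouping columns are dropped by the final select(), so as a quick illustration of what the cumsum(...) step produces, here is a minimal sketch that applies it to Variable1 on its own (the names x, na_flag and run_id are only for this illustration):

```r
# Variable1 from the example data
x <- c(10, 15, NA, 25, 22, 10, 11, NA)

# TRUE where the value is missing
na_flag <- is.na(x)

# diff() on a logical vector gives -1, 0 or 1; abs() turns every switch
# between NA and non-NA into a 1, and cumsum() then numbers the runs
run_id <- cumsum(c(1, abs(diff(na_flag))))
run_id
# [1] 1 1 2 3 3 3 3 4
```

Rows 4 to 7 share run 3, which is why they all get the same range of 300 secs in the table.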
It is then easy to get the dates of the rows where the maximum is reached:
df %>%
  filter(Range_Var1 == max(Range_Var1)) %>%
  pull(Date)
which produces:
[1] "2020-01-01 09:54:00" "2020-01-01 09:55:00" "2020-01-01 09:57:30"
[4] "2020-01-01 09:59:00"
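One caveat: the grouping above also assigns a range to runs of consecutive NA values, so in data where several NA rows follow each other, an NA run could end up being the maximum. If only the start, end and duration of the longest run without NA are needed, a base R sketch built on the rle() idea from the question might look like the following. The helper name longest_non_na_span and its arguments are hypothetical, and treating the timestamps as UTC is an assumption:

```r
df <- data.frame(
  Date = c("2020-01-01 09:50:00", "2020-01-01 09:51:30", "2020-01-01 09:53:00", "2020-01-01 09:54:00",
           "2020-01-01 09:55:00", "2020-01-01 09:57:30", "2020-01-01 09:59:00", "2020-01-01 10:01:00"),
  Variable1 = c(10, 15, NA, 25, 22, 10, 11, NA),
  Variable2 = c(1, NA, 2, 5, 8, 6, 8, NA))

# Hypothetical helper, for illustration only
longest_non_na_span <- function(time, x) {
  time <- as.POSIXct(time, tz = "UTC")  # assumption: timestamps are UTC
  runs <- rle(!is.na(x))                # TRUE runs are stretches without NA
  end <- cumsum(runs$lengths)           # last row index of each run
  start <- end - runs$lengths + 1       # first row index of each run
  keep <- runs$values                   # drop the NA runs explicitly
  if (!any(keep)) return(NULL)          # every value is NA
  start <- start[keep]
  end <- end[keep]
  span <- difftime(time[end], time[start], units = "secs")
  best <- which.max(as.numeric(span))   # first run with the longest duration
  data.frame(start = time[start[best]], end = time[end[best]], span = span[best])
}

res1 <- longest_non_na_span(df$Date, df$Variable1)
res2 <- longest_non_na_span(df$Date, df$Variable2)
```

For the example data, res1 covers 09:54:00 to 09:59:00 (300 secs) and res2 covers 09:53:00 to 09:59:00 (360 secs), matching the intervals asked for in the question.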