I have a dataset, df:
Ultimately I would like to be able to group the data into 'chunks' where the Folder column contains the string 'Out', making sure to consider the DATE and the empty Message values that it is associated with. Is there a way to create a chunk for each instance that 'Out' and an empty Message row occurs, while computing its duration.
Folder DATE Message
Outdata 9/9/2019 5:46:00
Outdata 9/9/2019 5:46:01
Outdata 9/9/2019 5:46:02
In 9/9/2019 5:46:03 hello
In 9/9/2019 5:46:04 hello
Outdata 9/10/2019 6:00:01
Outdata 9/10/2019 6:00:02
In 9/11/2019 7:50:00 hello
In 9/11/2019 7:50:01 hello
I would like this output:
New Variable Duration Message
Outdata1 2 sec
Outdata2 1 sec
I have included the dput:
dput(sample)
structure(list(Folder = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L), .Label = c("In", "Outdata"), class = "factor"), Date = structure(c(5L,
6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L), .Label = c("9/10/2019 6:00:01 AM",
"9/10/2019 6:00:02 AM", "9/11/2019 7:50:00 AM", "9/11/2019 7:50:01 AM",
"9/9/2019 5:46:00 AM", "9/9/2019 5:46:01 AM", "9/9/2019 5:46:02 AM",
"9/9/2019 5:46:03 AM", "9/9/2019 5:46:04 AM"), class = "factor"),
Message = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("",
"hello"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))
This is what I have tried, which works well, I just need to account for the condition of the empty Message values as well.
library(dplyr)
df %>%
mutate(DATE = as.POSIXct(DATE, format = "%m/%d/%Y %I:%M:%S %p"),
gr = cumsum(Folder != lag(Folder, default = TRUE))) %>%
filter(Folder == "Out") %>%
arrange(gr, DATE) %>%
group_by(gr) %>%
summarise(Duration = difftime(last(DATE), first(DATE), units = "secs")) %>%
mutate(gr = paste0('Out', row_number()))
The above code works fine, but I am not sure how to satisfy the condition where row == ""
Probably, just paste
the Message
together in one string.
library(dplyr)
sample %>%
mutate(DATE = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p"),
gr = cumsum(Folder != lag(Folder, default = TRUE))) %>%
filter(Folder == "Outdata") %>%
arrange(gr, DATE) %>%
group_by(gr) %>%
summarise(Duration = difftime(last(DATE), first(DATE), units = "secs"),
Message = paste0(Message, collapse = "")) %>%
mutate(gr = paste0('Out', row_number()))