I am running into a possible aggregate/summarise problem with this time series dataset.
Below is a dataset containing unique instances of a chat from a internet chatting system.
chatId agentId beginning_timestamp ending_timestamp answer
1 22 4/07/2016 9:00 4/07/2016 9:30 1
2 22 4/07/2016 9:26 4/07/2016 9:35 5
3 22 4/07/2016 9:15 4/07/2016 9:19 5
4 10 4/07/2016 11:17 4/07/2016 12:13 2
5 10 4/07/2016 11:29 4/07/2016 11:50 1
I would like to use dplyr (or anything else) to summarise the following dataset in this manner:
Here is a resulting dataset with an additional column(multiple_chats = a discrete variable will do) that I would like to make:
chatId agentId beginning_timestamp ending_timestamp answer multiple_chats
1 22 4/07/2016 9:00 4/07/2016 9:30 1 1
2 22 4/07/2016 9:26 4/07/2016 9:35 5 0
3 22 4/07/2016 9:15 4/07/2016 9:19 5 1
4 10 4/07/2016 11:17 4/07/2016 12:13 2 1
5 10 4/07/2016 11:29 4/07/2016 11:50 1 1
Any answers are appreciated.
Looks like some of your overlaps may be different than what you show in your "resulting dataset", but here is a stab at it with some input from here
df <- data.frame(chatID = 1:5, agentID = c(22,22,22,10,10),
beginning_timestamp = c('4/07/2016 9:00', '4/07/2016 9:26', '4/07/2016 9:15', '4/07/2016 11:17', '4/07/2016 11:29'),
ending_timestamp = c('4/07/2016 9:30', '4/07/2016 9:35', '4/07/2016 9:19', '4/07/2016 12:13', '4/07/2016 11:50'),
answer = c(1,5,5,2,1))
l
ibrary(tidyverse)
library(lubridate)
df %>%
mutate(beginning_timestamp = mdy_hm(beginning_timestamp),
ending_timestamp = mdy_hm(ending_timestamp),
int = interval(beginning_timestamp + minutes(5),
ending_timestamp - minutes(5))) -> df
df$multiple_chats = unlist(tapply(df$int, df$agentID,
function(x) as.numeric(rowSums(outer(x, x, int_overlaps))>1)))