I'm an R noob, and trying to perform a summary on a dataset which totals the number of event types for each ID that occurred between events of type 'B' for that ID. Here is a sample to illustrate:
id <- c('1', '1', '1', '2', '2', '2', '3', '3', '3', '3')
type <- c('A', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C', 'B')
datestamp <- as.Date(c('2016-06-20','2016-07-16','2016-08-14','2016-07-17'
,'2016-07-18','2016-07-19','2016-07-16','2016-07-19'
, '2016-07-21','2016-08-20'))
df <- data.frame(id, type, datestamp)
which produces:
> df
id type datestamp
1 1 A 2016-06-20
2 1 A 2016-07-16
3 1 B 2016-08-14
4 2 A 2016-07-17
5 2 B 2016-07-18
6 2 C 2016-07-19
7 3 A 2016-07-16
8 3 B 2016-07-19
9 3 C 2016-07-21
10 3 B 2016-08-20
Any time an event 'B' occurs, I want to know the number of each event type that occurred before that B event, but after any other B events for that ID. What I would like to end up with is a table like this:
id type B_instance count
1 1 A 1 2
2 2 A 1 1
3 3 A 1 1
4 3 C 2 1
In researching, this question came the closest: summarizing a field based on the value of another field in dplyr
I've been trying to make this work:
df2 <- df %>%
group_by(id, type) %>%
summarize(count = count(id[which(datestamp < datestamp[type =='B'])])) %>%
filter(type != 'B')
But it errors out (also, even if it worked, it doesn't account for 2 'B' events in the same ID, like with id=3)
You can use cumsum
to create the new group variable B_instance
by doing cumsum(type == "B")
and then filter out types that fall behind the last B as well as type B itself, as they will not be counted. Then use count
to count the occurrence with the group by id
, B_instance
and type
.
df %>%
group_by(id) %>%
# create B_instance using cumsum on the type == "B" condition
mutate(B_instance = cumsum(type == "B") + 1) %>%
# filter out rows with type behind the last B and all B types
filter(B_instance < max(B_instance), type != "B") %>%
# count the occurrences of type grouped by id and B_instance
count(id, type, B_instance)
# Source: local data frame [4 x 4]
# Groups: id, type [?]
# id type B_instance n
# <fctr> <fctr> <dbl> <int>
# 1 1 A 1 2
# 2 2 A 1 1
# 3 3 A 1 1
# 4 3 C 2 1