I have a data frame with thousands of rows but a sample is given below:
userid event
1 123 view
2 123 view
3 123 order
4 345 view
5 345 view
6 345 view
7 345 order
8 111 view
9 111 order
10 111 view
11 111 view
12 111 view
13 333 view
14 333 view
15 333 view
dput(data)
structure(list(userid = c(123, 123, 123, 345, 345, 345, 345,
111, 111, 111, 111, 111, 333, 333, 333), eventaction = structure(c(2L,
2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("order",
"view"), class = "factor")), .Names = c("userid", "event"
), row.names = c(NA, -15L), class = "data.frame")
What I am looking at doing is to extract all the rows of userid's that contains the word "order" under event. The result will contain all the rows for userid's excluding userid = 333 as the eventaction does not contain an order entry.
The second task is to count the number of occurences for "view" before the order entry. I will appreciate help and pointers.
Thanks.
We can try with data.table
. Convert the 'data.frame' to 'data.table' (setDT(data)
), grouped by 'userid', if
there are any
'event' that is an 'order' in the 'userid', return the Subset of Data.table' (.SD
)
library(data.table)
setDT(data)[,if(any(event=="order")) .SD , by = userid]
Or using dplyr
, we filter
for any
'order' in the 'event' after grouping by 'userid'.
library(dplyr)
data %>%
group_by(userid) %>%
filter(any(event == "order"))