I have a data set (named desktop) with chronologically ordered records from a web tracker: one column holds the URLs visited by different users and another holds the user ID. For a search engine analysis, I'm trying to keep all rows whose URL shows that a user submitted a search query to Google, which I can do with the following line of code:
data_google <- dplyr::filter(desktop, grepl('\\bgoogle\\.com/search\\b', url, ignore.case = TRUE))
This works fine. However, I'm interested not only in the URL that contains the search query but also in the web page the user visited after submitting the query, in other words the link on the Google results page that the user actually clicked on.
Is it possible to keep not only the row where a URL matches the pattern but also the row right after it?
Any help would be appreciated, thank you.
Using the iris dataset as an example, I pull all rows whose species contains 'set' and then also take the row right after each of them. This is a pretty simple example, but the same approach should accomplish your goal.
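# Row indices whose Species contains "set"
vec1 <- which(grepl("set", iris$Species))
# The row right after each match
vec2 <- vec1 + 1
# Combine, de-duplicate, and drop any index past the last row
vec3 <- sort(unique(c(vec1, vec2)))
vec3 <- vec3[vec3 <= nrow(iris)]
iris[vec3, ]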
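Applied to your data, the same idea might look like the sketch below. The toy `desktop` data frame, its values, and the `user_id` column name are made up for illustration; only the `url` column and the search pattern come from your question. Note that this version ignores user boundaries, so if a search is the last row for one user, the "next" row will belong to a different user.

```r
# Toy stand-in for the real tracker data (values are invented for illustration)
desktop <- data.frame(
  user_id = c(1, 1, 1, 2, 2),
  url = c("https://www.google.com/search?q=r+dplyr",
          "https://dplyr.tidyverse.org/",
          "https://www.example.com/",
          "https://www.example.org/",
          "https://www.google.com/search?q=lag"),
  stringsAsFactors = FALSE
)

# Row indices whose URL is a Google search
hits <- grep("\\bgoogle\\.com/search\\b", desktop$url, ignore.case = TRUE)

# Add the row after each hit, then drop indices past the end of the data
keep <- sort(unique(c(hits, hits + 1)))
keep <- keep[keep <= nrow(desktop)]

data_google <- desktop[keep, ]
```

Here the last row is a search with nothing after it, which is why the out-of-range index has to be dropped before subsetting.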
EDIT: If you need this within groups, the solution below should work. Using the diamonds dataset, I sort by cut to mimic your ordering, group by cut, and flag the rows where color contains 'E'. Applying lag() to that flag marks the row right after each match, and because of group_by() the lag never crosses from one group into the next.
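library(dplyr)
library(ggplot2)  # provides the diamonds dataset
diamonds2 <- diamonds %>%
  arrange(cut) %>%
  group_by(cut) %>%
  mutate(
    fl  = grepl("E", color),         # TRUE where color contains "E"
    fl2 = lag(fl, default = FALSE)   # TRUE for the row right after a match, within each cut
  ) %>%
  filter(fl | fl2) %>%
  ungroup()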
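For your tracker data, the grouped version could be sketched as follows. I'm assuming the user ID column is called `user_id` (a hypothetical name, since the question doesn't give it) and that the rows are already in chronological order within each user; the toy data is invented for illustration.

```r
library(dplyr)

# Toy stand-in for the real tracker data (values are invented for illustration)
desktop <- data.frame(
  user_id = c(1, 1, 1, 2, 2, 2),
  url = c("https://www.google.com/search?q=a",
          "https://www.example.com/a",
          "https://www.google.com/search?q=b",
          "https://www.example.org/home",
          "https://www.google.com/search?q=c",
          "https://www.example.org/c"),
  stringsAsFactors = FALSE
)

data_google <- desktop %>%
  group_by(user_id) %>%   # user_id is an assumed column name
  mutate(
    # TRUE for rows that are Google searches
    is_search    = grepl("\\bgoogle\\.com/search\\b", url, ignore.case = TRUE),
    # TRUE for the row right after a search by the same user
    after_search = lag(is_search, default = FALSE)
  ) %>%
  filter(is_search | after_search) %>%
  ungroup()
```

In this toy data user 1's last row is a search, and the following row (user 2's first page) is not picked up, because lag() restarts within each group.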