Tags: r, dplyr, lag

Conditional lag function


The data comes from a Kaggle dataset and has been loaded into R.

It has the following structure:

Index   VisitorId           VisitId     Visit# Hit# pagePath
0       000722514342430295  1470093727  1      1    /home
1       000722514342430295  1470093727  1      3    /google+redesign/apparel
2       000722514342430295  1470093727  1      4    /asearch.html
3       000722514342430295  1470093727  1      5    /asearch.html
4       0014659935183303341 1470037282  1      1    /home
5       0015694432801235877 1470043732  1      1    /home
6       0015694432801235877 1470043732  1      2    /google+redesign/electronics
7       0015694432801235877 1470043732  1      3    /google+redesign/apparel/men++s/men++s+t+shirts
8       0015694432801235877 1470043732  1      4    /google+redesign/apparel/kid+s/kid+s+infant
9       0015694432801235877 1470043732  1      5    /google+redesign/apparel/kid+s/kid+s+infant/quickview

I'm trying to write a mutate() call using lag() that returns the previous pagePath within a given visit by a given visitor.

For example, a new column prev_path would be specific to each VisitorId and VisitId and would lag by one Hit#, returning <NA> when the previous hit is not available, as with Hit 2 of Visit 1 for the first visitor.


Solution

  • Is this what you're trying to do?

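    library(dplyr)

    df %>%
      # lag() is applied separately within each visitor/visit pair
      group_by(VisitorId, VisitId) %>%
      # keep the previous path only when the previous row is exactly one hit earlier;
      # the first row of each group and rows after a skipped hit get NA
      mutate(prev_path = ifelse(lag(`Hit#`) == `Hit#` - 1, lag(pagePath), NA))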
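To check the behaviour on the rows shown in the question, here is a small reproducible sketch. The data frame is retyped by hand from the first eight rows of the question's table (IDs stored as character so the leading zeros survive), and the arrange() and ungroup() steps are additions not in the answer above: they assume the rows may not already be sorted by Hit# within each visit.

```r
library(dplyr)

# Hand-typed sample mirroring the first eight rows of the question's table.
df <- tibble(
  VisitorId = c(rep("000722514342430295", 4),
                "0014659935183303341",
                rep("0015694432801235877", 3)),
  VisitId   = c(rep(1470093727, 4), 1470037282, rep(1470043732, 3)),
  `Visit#`  = rep(1, 8),
  `Hit#`    = c(1, 3, 4, 5, 1, 1, 2, 3),
  pagePath  = c("/home",
                "/google+redesign/apparel",
                "/asearch.html",
                "/asearch.html",
                "/home",
                "/home",
                "/google+redesign/electronics",
                "/google+redesign/apparel/men++s/men++s+t+shirts")
)

result <- df %>%
  group_by(VisitorId, VisitId) %>%
  # added step: sort hits within each group so lag() sees them in order
  arrange(`Hit#`, .by_group = TRUE) %>%
  mutate(prev_path = ifelse(lag(`Hit#`) == `Hit#` - 1, lag(pagePath), NA)) %>%
  # added step: drop the grouping so later operations are not per-visit
  ungroup()

print(result[, c("VisitorId", "Hit#", "pagePath", "prev_path")])
```

For the first visitor, the Hit 3 row gets NA because Hit 2 is missing, while the Hit 4 and Hit 5 rows pick up the path of the row before them. The first hit of every visit is NA as well, since lag() has nothing to look back to.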