Count number of unique activities


I have a dataframe (simplified version) as follows:

df <- data.frame(ID = rep('A', 7),
                 Activity = c('Login','Login','cat1','Login','cat2','Login','Login'))

  ID Activity
1  A    Login
2  A    Login
3  A     cat1
4  A    Login
5  A     cat2
6  A    Login
7  A    Login

Here is what I wish to do:

  • Start from the first row.

  • Initialize session = 0.

  • Create a dataframe to hold each path and its count.

  • If the Activity is Login, set session = 1, then check the next row's Activity and record it. The recorded activities form one path, which runs until the next Login.

  • Continue until the next Login, then set session = 2, and so on.

  • The final outcome for this example would be:

           Path          count
           Login           3
           Login, cat1     1
           Login, cat2     1
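
For reference, the steps described above can be written out literally as a loop. This is only a sketch of the intended procedure, assuming every path starts at a Login and that anything before the first Login is ignored; the names `session`, `paths`, and `result` are illustrative.

```r
# Example data from the question.
df <- data.frame(
  ID = rep("A", 7),
  Activity = c("Login", "Login", "cat1", "Login", "cat2", "Login", "Login"),
  stringsAsFactors = FALSE
)

session <- 0            # session counter, incremented at every Login
paths <- character(0)   # paths[k] holds the path recorded for session k

for (activity in df$Activity) {
  if (activity == "Login") {
    # A Login opens a new session and starts a new path.
    session <- session + 1
    paths[session] <- activity
  } else if (session > 0) {
    # Any other activity is appended to the current session's path.
    paths[session] <- paste(paths[session], activity, sep = ", ")
  }
}

# Count how often each distinct path occurs.
counts <- table(paths)
result <- data.frame(Path = names(counts), count = as.vector(counts))
result
```

The loop mirrors the description step by step, but it grows `paths` element by element, so it is mainly useful for checking the expected result on small inputs.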
    

Solution

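  • Start a new group at every "Login" with cumsum(df$Activity == "Login"), split Activity by that group id, paste each group into a single string, and finally count the resulting strings with table: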

    data.frame(
      table(
        sapply(split(df$Activity, cumsum(df$Activity == "Login")), function(i){
          paste(i, collapse = ",")
        })))
    #         Var1 Freq
    # 1      Login    3
    # 2 Login,cat1    1
    # 3 Login,cat2    1
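
To see what each step contributes, the same idea can be unrolled into intermediate variables. This is a sketch rather than a different method: it uses `vapply` in place of `sapply`, a `", "` separator, and the column names `Path` and `count` from the question, all of which are choices made here for illustration.

```r
# Example data from the question.
df <- data.frame(
  ID = rep("A", 7),
  Activity = c("Login", "Login", "cat1", "Login", "cat2", "Login", "Login"),
  stringsAsFactors = FALSE
)

# Step 1: TRUE at each Login; cumsum turns that into a running session id.
session <- cumsum(df$Activity == "Login")
# session is 1 2 2 3 3 4 5

# Step 2: split the activities by session and paste each session into one path.
paths <- vapply(split(df$Activity, session), paste, character(1), collapse = ", ")

# Step 3: count identical paths and rename the columns to match the question.
result <- setNames(as.data.frame(table(paths), stringsAsFactors = FALSE),
                   c("Path", "count"))
result
```

Any rows that appear before the first Login would end up in a group with id 0, so they would be counted as a path that does not start with Login.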