Tags: r, transformation, dplyr, logfile

Calculate the difference between observations depending on the value of another variable


I have an app usage log and want to calculate the time between two specific events.

My data look like this:

    appdata <- data.frame(userid   = c(1, 1, 1, 1, 1),
                          dayid    = c(32, 32, 32, 32, 32),
                          activity = c("appstart", "levelup", "appclose", "appstart", "appclose"),
                          datesec  = c(2670, 2726, 2755, 2787, 4161))

    appdata
      userid dayid activity datesec
    1      1    32 appstart    2670
    2      1    32  levelup    2726
    3      1    32 appclose    2755
    4      1    32 appstart    2787
    5      1    32 appclose    4161

I want to know how long the user was active on a given day. So I have to calculate the difference between each appstart and the following appclose and then sum them, so here: (2755 - 2670) + (4161 - 2787) = 85 + 1374 = 1459.
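A quick base-R check of that arithmetic on the sample data, assuming the rows are already in time order and every `appstart` has a matching `appclose` (the names `starts`, `closes` and `session_lengths` are only illustrative):

```r
# Sample data from the question
appdata <- data.frame(userid   = c(1, 1, 1, 1, 1),
                      dayid    = c(32, 32, 32, 32, 32),
                      activity = c("appstart", "levelup", "appclose", "appstart", "appclose"),
                      datesec  = c(2670, 2726, 2755, 2787, 4161))

# Timestamps of session starts and ends, in time order
starts <- appdata$datesec[appdata$activity == "appstart"]  # 2670 2787
closes <- appdata$datesec[appdata$activity == "appclose"]  # 2755 4161

# Assumes the k-th appclose ends the k-th appstart
session_lengths <- closes - starts  # 85 1374
sum(session_lengths)                # 1459
```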

The new dataset should look like this:

    appdata2 <- data.frame(user = c(1), dayid = c(32), usagetime_in_sec = c(1459))

    appdata2
      user dayid usagetime_in_sec
    1    1    32             1459

Here is my basic approach, but I don't know how to tell R to always calculate the difference between an appstart and the next appclose event:

    appdata2 <- appdata %>% 
      group_by(userid, dayid) %>%
      summarise(usagetime_in_sec = sum(datesec(type == "appclose") - datesec(type == "appstart")))

Solution

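  • You were very close. Index `datesec` with square brackets instead of calling it like a function, and filter on the `activity` column (there is no `type` column):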

    library(dplyr)
    
    appdata %>%
      group_by(userid, dayid) %>%
      summarise(usagetime_in_sec = sum(datesec[activity == "appclose"] - 
                                       datesec[activity == "appstart"]))
    
    
    #   userid dayid usagetime_in_sec
    #    <dbl> <dbl>            <dbl>
    #1      1    32             1459
    

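    However, make sure you have an equal number of "appstart" and "appclose" events, with each appstart followed by its own appclose. Otherwise the two vectors are paired up wrongly (R recycles the shorter one, and only warns if the lengths are not multiples of each other), so the total will be wrong.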
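If the counts can differ (for example, the app crashed before logging an `appclose`), one way to follow the "next appclose" rule from the question is to number the sessions with `cumsum()` and pair start and close within each session. This is a sketch, assuming dplyr 1.0 or later (for the `.groups` argument) and that `datesec` orders the events within a day; `usage_time()` is a hypothetical helper name, and rows before the first `appstart` or sessions without an `appclose` are simply ignored:

```r
library(dplyr)

# Sample data from the question
appdata <- data.frame(userid   = c(1, 1, 1, 1, 1),
                      dayid    = c(32, 32, 32, 32, 32),
                      activity = c("appstart", "levelup", "appclose", "appstart", "appclose"),
                      datesec  = c(2670, 2726, 2755, 2787, 4161))

# Hypothetical helper: total usage per user and day, pairing each "appstart"
# with the first "appclose" that follows it
usage_time <- function(df) {
  df %>%
    arrange(userid, dayid, datesec) %>%
    group_by(userid, dayid) %>%
    # Every "appstart" opens a new session; rows before the first one get session 0
    mutate(session = cumsum(activity == "appstart")) %>%
    group_by(userid, dayid, session) %>%
    # [1] on an empty vector gives NA, so a session missing its start or close yields NA
    summarise(len = datesec[activity == "appclose"][1] -
                    datesec[activity == "appstart"][1],
              .groups = "drop_last") %>%
    # Sum the complete sessions per user and day; NA (incomplete) sessions are dropped
    summarise(usagetime_in_sec = sum(len, na.rm = TRUE), .groups = "drop")
}

usage_time(appdata)  # usagetime_in_sec = 1459 for user 1, day 32
```

Because each session is matched on its own, an extra unmatched `appstart` no longer shifts the pairing of the other sessions.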