Search code examples
rdplyrtime-seriesplyrrstudio-server

How to count the number of unique labels belonging to one particular column with respect to timestamp by interval of x minutes?


My dataset is like this:

Dataset

Let me explain my data frame. I have two column named "timeStamp" and "label"."label" column contains the unique number of values occurred with respect to "timeStamp" column.

I am able to find the number of occurrences of unique variables in label column throughout the entire time using aggregate and count function available in R.

But now I want to count the number of occurrences of unique variables in label column with respect to timestamp by an interval of 2 minutes.

To be precise, this is what I am looking for in my output:

OUTPUT

You can find the data frame here using dput in R.

x <- data.frame(timeStamp = c("20:12:14","20:12:14","20:13:02","20:13:02","20:13:55","20:13:55","20:14:14","20:14:14","20:14:25","20:14:26","20:14:26","20:14:26","20:15:26","20:15:28","20:15:36","20:15:37","20:16:41","20:16:49","20:17:20","20:17:21"), label = c("003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","001_T09_Submit Payment","001_T09_Submit Payment","001_T09_Submit Payment","001_T09_Submit Payment","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login"
))
dput(x)

Solution

  • Here is a tidyverse solution:

    # Create 2 min breakpoints by which we group times
    hm <- function(x) as.POSIXct(x, format = "%H:%M")
    breaks <- seq(min(hm(x$timeStamp)), max(hm(x$timeStamp)) + 120, by = '2 min');
    
    library(tidyverse);
    x %>%
        mutate(
            timeStamp = cut(hm(timeStamp), breaks = breaks)) %>%
        count(timeStamp, label) %>%
        spread(label, n)
    ## A tibble: 3 x 4
    #  timeStamp           `001_T09_Submit Pa… `002_T05_SearchPat… `003_T04_Ward Lo…
    #  <fct>                             <int>               <int>             <int>
    #1 2018-04-13 20:12:00                  NA                   2                 4
    #2 2018-04-13 20:14:00                   4                   4                 2
    #3 2018-04-13 20:16:00                  NA                   2                 2
    

    Explanation: We create 2 min breakpoints by which we cut the hour+minute component of timeStamp; then count by 2 min-grouped times and label, and spread from long to wide.


    Sample data

    x <- data.frame(
        timeStamp = c("20:12:14","20:12:14","20:13:02","20:13:02","20:13:55","20:13:55","20:14:14","20:14:14","20:14:25","20:14:26","20:14:26","20:14:26","20:15:26","20:15:28","20:15:36","20:15:37","20:16:41","20:16:49","20:17:20","20:17:21"), 
        label = c("003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","001_T09_Submit Payment","001_T09_Submit Payment","001_T09_Submit Payment","001_T09_Submit Payment","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login" ))