Tags: r, datetime, markov-chains

Check whether an event occurred in each 30-second interval


I have a data set with an event ID and the timestamp at which each event happened, for example 9/2/2019 17:06. I want to build a Markov chain model with two states, noevent and event. To avoid building a continuous-time Markov chain, I want to split the period into 30-second intervals and check whether an event happened in each interval. Could someone help me do this in R? Thank you!

So far I have only parsed the timestamps and calculated the time between consecutive events, as well as how many 30-second intervals without an event lie between them.

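# The format string must match the raw text; a value such as "9/2/2019 17:06"
# (no seconds) would need "%m/%d/%Y %H:%M" instead.
data$timestamp <- as.POSIXct(data$timestamp, format = "%m/%d/%Y %H:%M:%S")

# Seconds since the previous event (NA for the first row)
data$diff <- c(NA, as.numeric(diff(data$timestamp), units = "secs"))
# Approximate number of 30-second intervals between consecutive events
data$NUm <- round(data$diff / 30)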

Solution

    tidyverse solution

    Use lubridate::floor_date() to round to 30-second intervals and tidyr::complete() to fill in intervals with no events:

    library(dplyr)
    library(tidyr)
    library(lubridate)
    
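    data %>%
      # Snap each event to the start of its 30-second interval
      mutate(timestamp = floor_date(timestamp, "30 seconds")) %>%
      # Add a row (with NA id) for every interval that had no event
      complete(timestamp = full_seq(timestamp, 30)) %>%
      # Label each interval, then drop the id column used to build the label
      mutate(
        event = ifelse(!is.na(id), "yes", "no"),
        .keep = "unused"
      )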
    
    # A tibble: 8 × 2
      timestamp           event
      <dttm>              <chr>
    1 2023-02-19 10:01:00 yes  
    2 2023-02-19 10:01:30 no   
    3 2023-02-19 10:02:00 yes  
    4 2023-02-19 10:02:30 no   
    5 2023-02-19 10:03:00 no   
    6 2023-02-19 10:03:30 no   
    7 2023-02-19 10:04:00 no   
    8 2023-02-19 10:04:30 yes
    

    Base R solution

    Similar logic as above, using base functions:

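    # Snap each event to the start of its 30-second interval
    times <- as.POSIXlt(data$timestamp)
    times$sec <- ifelse(times$sec < 30, 0, 30)
    # Every interval start between the first and last event
    intervals <- seq(min(times), max(times), by = 30)
    # An interval is "yes" if at least one snapped event falls on it
    data.frame(
      intervals,
      event = ifelse(intervals %in% as.POSIXct(times), "yes", "no")
    )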
    
                intervals event
    1 2023-02-19 10:01:00   yes
    2 2023-02-19 10:01:30    no
    3 2023-02-19 10:02:00   yes
    4 2023-02-19 10:02:30    no
    5 2023-02-19 10:03:00    no
    6 2023-02-19 10:03:30    no
    7 2023-02-19 10:04:00    no
    8 2023-02-19 10:04:30   yes
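
If you later want an interval width other than 30 seconds, the same logic can be written as arithmetic on epoch seconds instead of editing the seconds field. The following is only a sketch: flag_intervals and its width argument are hypothetical names (not from base R or any package), and tz = "UTC" in the example is an assumption made so the result does not depend on the local time zone. Flooring happens on epoch seconds, so widths longer than a minute may not line up with local clock boundaries in every time zone.

```r
# Hypothetical helper (not from any package): flag fixed-width intervals.
# `width` is the interval length in seconds.
flag_intervals <- function(timestamps, width = 30) {
  secs <- as.numeric(timestamps)
  # Start of the interval each event falls into, in seconds since the epoch
  starts <- floor(secs / width) * width
  # Every interval start between the first and last event
  grid <- seq(min(starts), max(starts), by = width)
  # Reuse the time zone of the input so the printed times look the same
  tz <- attr(timestamps, "tzone")
  if (is.null(tz)) tz <- ""
  data.frame(
    intervals = as.POSIXct(grid, origin = "1970-01-01", tz = tz),
    event = ifelse(grid %in% starts, "yes", "no")
  )
}

# Same timestamps as the example data; tz = "UTC" is an assumption.
events <- as.POSIXct(
  c("2023-02-19 10:01:23", "2023-02-19 10:02:01", "2023-02-19 10:04:45"),
  tz = "UTC"
)
result <- flag_intervals(events, width = 30)
print(result)
```

With width = 30 this should give the same eight intervals as the two solutions above. Several events in one interval still produce a single "yes" row, because the grid is built from interval starts and not from the events themselves.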
    

    Example data

    In the future, it’s best if you include example data in your question. See How to make a great R reproducible example. For these solutions, I used:

    data <- data.frame(
      id = 1:3,
      timestamp = as.POSIXct(c(
        "2023-02-19 10:01:23",
        "2023-02-19 10:02:01",
        "2023-02-19 10:04:45"
      ))
    )
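
Once each interval is labelled, the two-state chain from the question can be estimated by counting transitions between consecutive intervals. This is a minimal sketch, not the only way to do it: the event vector is copied from the printed output above, the variable names are my own, and dividing each row of the count table by its total is the usual maximum-likelihood estimate for a discrete-time chain.

```r
# Interval labels, copied from the output of the solutions above
event <- c("yes", "no", "yes", "no", "no", "no", "no", "yes")

# Fix the state order so both states appear even if one is never observed
states <- c("no", "yes")
from <- factor(head(event, -1), levels = states)  # state in interval t
to   <- factor(tail(event, -1), levels = states)  # state in interval t + 1

# Transition counts: rows are the current state, columns the next state
counts <- table(from, to)

# Row-normalise the counts to get estimated transition probabilities
P <- prop.table(counts, margin = 1)
print(P)
```

A state that never occurs as a starting state gives a row of NaN, so check the counts before using P. With only three events, as in this example data, the estimates are of course very rough.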