Tags: r, for-loop, histogram, boolean-logic

Calculating the duration of each run of a boolean column, using a timestamp column as reference


I have two relevant columns. One holds True/False values for a series of data, and the dataset also has a timestamp column. I want code that detects when the boolean column changes to true, calculates the time (from the timestamp column) until it changes back to false, repeats this for the entire series, and bins the resulting durations in a data frame for a histogram. Apologies for the poor attempt below; I really don't know where to start. Note that the running column is stored as characters ("t"/"f"). Perhaps I need to convert it to boolean for this to work?

running  <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)

datafinal <- data.frame()    
for (i in dataset){
   if running == f,
   result <- sum(i:n)
datafinal <- c(datafinal, result)
}

Solution

  • Converting the running column to boolean and using a for-loop is one way. You can also operate on the data frame directly; you already have one! Below is a solution using the tidyverse, with date handling from the lubridate package. I encourage you to learn these libraries for this kind of problem.
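Since the question asks whether the character column needs converting, here is a minimal base R sketch of that step. Comparing against "t" gives a logical vector directly. Note that as.logical() on its own does not help here: it only recognises strings such as "T", "TRUE", "true" and "F", "FALSE", "false", so lowercase "t"/"f" become NA. The name running_lgl is just an illustrative choice.

```r
running <- c("t","t","f","f","t","f","t","t")

# "t" -> TRUE, anything else -> FALSE
running_lgl <- running == "t"
print(running_lgl)
# [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE

# as.logical() does not recognise lowercase "t"/"f" ...
print(as.logical(c("t", "f")))
# [1] NA NA

# ... but it does recognise uppercase "T"/"F"
print(as.logical(toupper(c("t", "f"))))
# [1]  TRUE FALSE
```

Either form works for the run detection below, since rleid() and rle() only care about where consecutive values change.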

    The rleid() function from the data.table package generates a run-length group ID: it increases by 1 every time the value in the target column (running) changes.
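If you would rather not depend on data.table, base R's rle() can produce the same kind of run ID for a single vector. This is a sketch of an equivalent result, not of how rleid() is implemented internally; grp is an illustrative name.

```r
running <- c("t","t","f","f","t","f","t","t")

# rle() splits the vector into runs of identical consecutive values
r <- rle(running)
print(r$lengths)
# [1] 2 2 1 1 2

# number the runs 1, 2, 3, ... and repeat each number for the length of its run
grp <- rep(seq_along(r$lengths), times = r$lengths)
print(grp)
# [1] 1 1 2 2 3 4 5 5
```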

    library(tidyverse)  # dplyr verbs and the %>% pipe
    
    running  <- c("t","t","f","f","t","f","t","t")
    time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
    dataset <- data.frame(time, running)
    
    # convert the time strings to date-time (POSIXct) objects
    dataset$time = lubridate::ymd_hms(dataset$time, tz="UTC")
    
    solution = dataset %>% 
      mutate(Grp = data.table::rleid(running)) %>% # consecutive rows in the same state get the same ID
      group_by(Grp) %>% # rows of the same run are grouped together
      slice(1) %>%      # keep only the first row (start time) of each run
      ungroup() %>%     # grouping is no longer needed
      # difference between each run start and the previous one (lag(n = 1)),
      # i.e. the length of the run that ended where this row's run begins
      mutate(timeLength = difftime(time, lag(time), units="secs"))
    

    Output:

    # A tibble: 5 x 4
      time                running   Grp timeLength
      <dttm>              <chr>   <int> <drtn>    
    1 2022-01-01 00:00:10 t           1 NA secs   
    2 2022-01-01 00:00:30 f           2 20 secs   
    3 2022-01-01 00:00:50 t           3 20 secs   
    4 2022-01-01 00:01:00 f           4 10 secs   
    5 2022-01-01 00:01:10 t           5 10 secs   
    

    If you want to drop the first row (its timeLength is NA), add %>% filter(!is.na(timeLength)) to the end of the pipeline.
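Note that in the table above each timeLength belongs to the run that just ended: the 20 secs on the f row at 00:00:30 is the length of the preceding t run. For the histogram in the question you probably want the durations of the true runs only. Below is a minimal base R sketch that attaches each duration to its own run, assuming a run lasts from its first timestamp until the first timestamp of the next run (the same convention as above) and dropping the final run because its end is not observed. The names starts, run_secs, runs and true_secs are illustrative.

```r
running <- c("t","t","f","f","t","f","t","t")
time <- as.POSIXct(c("2022-01-01 00:00:10", "2022-01-01 00:00:20",
                     "2022-01-01 00:00:30", "2022-01-01 00:00:40",
                     "2022-01-01 00:00:50", "2022-01-01 00:01:00",
                     "2022-01-01 00:01:10", "2022-01-01 00:01:20"), tz = "UTC")

# one entry per run of identical consecutive states
r <- rle(running == "t")
# row index where each run starts
starts <- cumsum(c(1, head(r$lengths, -1)))
run_start <- time[starts]

# run length = start of the next run - start of this run;
# the last run has no observed end, so it gets NA
run_secs <- c(as.numeric(diff(run_start), units = "secs"), NA)
runs <- data.frame(running = r$values, start = run_start, secs = run_secs)
print(runs)

# durations of the completed TRUE runs only
true_secs <- runs$secs[runs$running & !is.na(runs$secs)]
print(true_secs)
# [1] 20 10
```

The same idea in the tidyverse pipeline is to use lead(time) instead of lag(time), so that the difference lands on the row of the run it measures, and then filter for running == "t".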

    Update: here is how you could do it with a for-loop and nested if-else. Note that the code is longer and harder to follow.

    dataset$time = lubridate::ymd_hms(dataset$time, tz="UTC")
    # vector tracking the running state of each row seen so far
    current = c()
    # empty data frame to collect the results
    datafinal = data.frame()
    # iterate over the row indices
    for (i in seq_len(nrow(dataset))){
      # append the current value of running
      current = c(current, dataset[i,]$running)
      if (i > 1){ # the first row has no previous row to compare with
        if (current[i] == current[i-1]){
          next # same state as the previous row (still true or still false): skip to the next row
        }
        else { # state changed: the previous run lasted from previous_time until now
          result = difftime(dataset[i,]$time, previous_time, units="secs")
        }
        # (note: when 'next' fires, everything below it in the loop body is skipped,
        # including the previous_time update at the end)
        
        # build the output row for this change
        new_row = cbind(dataset[i,], result)
        # append it to the final data frame
        datafinal = rbind(datafinal, new_row)
      }
      # remember the start time of the new run (also set on the first row)
      previous_time = dataset[i,]$time
    }
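The last step in the question is binning the durations in a data frame for a histogram. A sketch using base R's hist() with plot = FALSE, which returns the bin breaks and counts without drawing anything. The 10-second bin width and the durations vector (the true-run lengths of the example data, 20 and 10 seconds) are assumptions for illustration; bins is a hypothetical name.

```r
# lengths (in seconds) of the TRUE runs in the example data; with the loop above
# they could be taken as as.numeric(datafinal$result[datafinal$running == "f"]),
# since each row of datafinal records the run that ended at that row
durations <- c(20, 10)

# bin into 10-second intervals without plotting
h <- hist(durations, breaks = seq(0, 30, by = 10), plot = FALSE)

# one row per bin: lower edge, upper edge, number of runs
bins <- data.frame(from = head(h$breaks, -1),
                   to = tail(h$breaks, -1),
                   count = h$counts)
print(bins)
```

By default hist() uses right-closed bins with the lowest edge included, so a 10-second run falls in the first bin [0, 10]. Calling plot(h) draws the histogram from the same object.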