Tags: r, tidyverse, intervals

Summarizing within Conditions Repeated over Time


I am trying to summarize a data set in which each condition recurs over time, in runs of varying length. I would like the mean and standard deviation of X within each time interval, where an interval is one consecutive run of a single condition.

In my real data I don't know how many intervals of each condition there will be. I thought I could mark the end of an interval by a change in Condition from one row to the next, but I don't know how to code that.

library(tidyverse)

df <- data.frame(Condition = c(rep("A", 50), 
                               rep("B", 60), 
                               rep("C", 50),
                               rep("A", 60), 
                               rep("B", 50), 
                               rep("C", 50)),
                 Time = c(seq(160, 190, length.out = 50), 
                          seq(190.05, 230, length.out = 60), 
                          seq(230.05, 260, length.out = 50),
                          seq(260.05, 293, length.out = 60), 
                          seq(293.05, 321, length.out = 50), 
                          seq(321.05, 352, length.out = 50))
) %>%
        rowwise() %>%
        # Note: rnorm(1.4, 0.3) is read as n = 1, mean = 0.3, sd = 1 (the default)
        mutate(X = rnorm(1.4, 0.3))

I'm trying to calculate mean(X) and sd(X) for each interval of Condition (the numbers below are made up):

Condition   interval        mean(X)   sd(X)
A            [160,190]       1.4      0.32
B            [190.05,230]    1.46     0.36
C            [230.05,260]    1.32     0.26
A            [260.05,293]    1.5      0.40
B            [293.05,321]    1.25     0.34
C            [321.05,352]    1.43     0.41

I've tried this, but it doesn't do what I need:

df %>%  
        group_by(Condition) %>%
        mutate(interval = cut(Time,
                              breaks = c(floor(min(Time)), ceiling(max(Time))),
                              include.lowest = FALSE, 
                              right = FALSE)) %>%
        group_by(Condition, interval) %>% 
        summarise( mean.X = mean(X),
                   sd.X = sd(X))

This doesn't give me the second intervals for each Condition:

  Condition interval  mean.X   sd.X
  <chr>     <fct>      <dbl>  <dbl>
1 A         [160,293)  0.231  0.991
2 A         NA         1.61  NA    
3 B         [190,321)  0.421  0.893
4 B         NA         0.249 NA    
5 C         [230,352)  0.193  0.898
6 C         NA         0.427 NA   

Any suggestions?


Solution

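  • We can use rle() (run-length encoding) to give each consecutive run of Condition its own group number, then summarize by that number instead of by Condition.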

    library(dplyr)
    
    df %>% 
      ungroup() %>%  # drop the rowwise grouping left over from creating df
      # rle() finds runs of identical consecutive values; number each run 1, 2, ...
      mutate(group = with(rle(Condition), rep(seq_along(lengths), lengths))) %>% 
      group_by(group) %>% 
      summarize(Condition = first(Condition),
                interval = paste0("[", min(Time), ",", max(Time), "]"), 
                mean_X = mean(X), 
                sd_X = sd(X))
    
    # A tibble: 6 × 5
      group Condition interval     mean_X  sd_X
      <int> <chr>     <chr>         <dbl> <dbl>
    1     1 A         [160,190]    0.160  0.926
    2     2 B         [190.05,230] 0.0258 0.990
    3     3 C         [230.05,260] 0.296  1.03 
    4     4 A         [260.05,293] 0.472  1.08 
    5     5 B         [293.05,321] 0.0363 1.08 
    6     6 C         [321.05,352] 0.361  1.10
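
The idea from the question, ending an interval wherever Condition changes from one row to the next, can also be coded directly. The following is a minimal base-R sketch of that idea, not part of the original answer. The helper name run_id and the small toy data frame are hypothetical, and the sketch assumes Condition contains no NA values.

```r
# Hypothetical helper (not from the original answer): number each
# consecutive run of identical values, starting at 1.
# Assumes x is an atomic vector without NA values.
run_id <- function(x) {
  if (length(x) == 0) return(integer(0))
  # TRUE wherever a value differs from the one before it; the first
  # element always starts a new run.
  changed <- c(TRUE, x[-1] != x[-length(x)])
  cumsum(changed)
}

# Small made-up example: the conditions repeat, so grouping by
# Condition alone would merge the two "A" runs.
toy <- data.frame(
  Condition = c("A", "A", "B", "B", "B", "A", "A"),
  Time      = c(1, 2, 3, 4, 5, 6, 7),
  X         = c(10, 12, 20, 22, 24, 30, 34)
)
toy$group <- run_id(toy$Condition)

# One row per run: condition label, time range, mean and sd of X.
runs <- split(toy, toy$group)
result <- do.call(rbind, lapply(runs, function(d) {
  data.frame(
    group     = d$group[1],
    Condition = d$Condition[1],
    interval  = paste0("[", min(d$Time), ",", max(d$Time), "]"),
    mean_X    = mean(d$X),
    sd_X      = sd(d$X)
  )
}))
print(result)
```

In the dplyr pipeline above, the same helper could stand in for the rle() expression, as in mutate(group = run_id(Condition)). With dplyr 1.1.0 or later, consecutive_id(Condition) should give the same run numbering without a custom helper.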