I am trying to summarize data within time intervals using a data set with conditions repeated over time at varying intervals. I would like to get means and standard deviations within time intervals for each of the conditions.
However, in my real data I don't know how many intervals of each condition there will be. I thought perhaps I could indicate the end of an interval by a change in Condition from one row to the next row. But I don't know how to code that.
library(tidyverse)

df <- data.frame(Condition = c(rep("A", 50),
                               rep("B", 60),
                               rep("C", 50),
                               rep("A", 60),
                               rep("B", 50),
                               rep("C", 50)),
                 Time = c(seq(160, 190, length.out = 50),
                          seq(190.05, 230, length.out = 60),
                          seq(230.05, 260, length.out = 50),
                          seq(260.05, 293, length.out = 60),
                          seq(293.05, 321, length.out = 50),
                          seq(321.05, 352, length.out = 50))
) %>%
  rowwise() %>%
  # note: rnorm(1.4, 0.3) draws one value per row (n is truncated to 1) with mean 0.3 and sd 1
  mutate(X = rnorm(1.4, 0.3))
I'm trying to calculate mean(X) and sd(X) for each interval of Condition (made up numbers):
Condition  interval      mean(X)  sd(X)
A          [160,190]     1.4      0.32
B          [190.05,230]  1.46     0.36
C          [230.05,260]  1.32     0.26
A          [260.05,293]  1.5      0.40
B          [293.05,321]  1.25     0.34
C          [321.05,352]  1.43     0.41
I've tried this, but it doesn't do what I need:
df %>%
  group_by(Condition) %>%
  mutate(interval = cut(Time,
                        breaks = c(floor(min(Time)), ceiling(max(Time))),
                        include.lowest = FALSE,
                        right = FALSE)) %>%
  group_by(Condition, interval) %>%
  summarise(mean.X = mean(X),
            sd.X = sd(X))
This doesn't give me the second intervals for each Condition:
  Condition interval   mean.X   sd.X
  <chr>     <fct>       <dbl>  <dbl>
1 A         [160,293)   0.231  0.991
2 A         NA          1.61  NA
3 B         [190,321)   0.421  0.893
4 B         NA          0.249 NA
5 C         [230,352)   0.193  0.898
6 C         NA          0.427 NA
Any suggestions?
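We can use rle() (run-length encoding) to define "groups" from the consecutive runs of your Condition, so each repeat of a condition gets its own group.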
library(dplyr)

df %>%
  ungroup() %>%  # drop the rowwise grouping left over from building df
  # number each consecutive run of Condition: 1, 1, ..., 2, 2, ...
  mutate(group = rep(seq_along(rle(Condition)$lengths), rle(Condition)$lengths)) %>%
  group_by(group) %>%
  summarize(Condition = unique(Condition),
            interval = paste0("[", min(Time), ",", max(Time), "]"),
            mean_X = mean(X),
            sd_X = sd(X))
# A tibble: 6 × 5
  group Condition interval     mean_X  sd_X
  <int> <chr>     <chr>         <dbl> <dbl>
1     1 A         [160,190]    0.160  0.926
2     2 B         [190.05,230] 0.0258 0.990
3     3 C         [230.05,260] 0.296  1.03
4     4 A         [260.05,293] 0.472  1.08
5     5 B         [293.05,321] 0.0363 1.08
6     6 C         [321.05,352] 0.361  1.10
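
To see what rle() is doing here, and how it relates to the idea in the question of marking an interval boundary wherever Condition changes from one row to the next, here is a small sketch. The toy data frame and the names toy, runs, is_start and result are made up for illustration; they are not part of the original data.

```r
library(dplyr)

# Toy data (hypothetical): Condition "A" appears in two separate runs
toy <- data.frame(
  Condition = c("A", "A", "B", "B", "B", "A", "A"),
  Time      = c(1, 2, 3, 4, 5, 6, 7),
  X         = c(10, 12, 20, 22, 24, 30, 34)
)

# rle() stores the length of each consecutive run and the value in that run
runs <- rle(toy$Condition)
runs$lengths  # 2 3 2
runs$values   # "A" "B" "A"

# A run starts wherever Condition differs from the previous row;
# the cumulative sum of those start flags numbers the runs 1, 2, 3, ...
is_start <- c(TRUE, toy$Condition[-1] != toy$Condition[-nrow(toy)])
toy$group <- cumsum(is_start)

# Summarise within each run rather than within each Condition
result <- toy %>%
  group_by(group) %>%
  summarize(Condition = first(Condition),
            interval  = paste0("[", min(Time), ",", max(Time), "]"),
            mean_X    = mean(X),
            sd_X      = sd(X))
```

Both approaches give the same group numbers, so which one to use is mostly a matter of taste. If your dplyr is version 1.1.0 or later, consecutive_id(Condition) inside mutate() should produce the same grouping in one call.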