Here is the example dataframe:
data.frame(sample = c('A','A','A','A','A','B','B','B','B','B'),
measure = c(20,30,40,60,60,20,60,50,40,10),
time = c(1,2,3,4,5,3,4,5,6,7),
start = c(1,1,1,1,1,3,3,3,3,3),
end = c(4,4,4,4,4,6,6,6,6,6))
sample measure time start end
1 A 20 1 1 4
2 A 30 2 1 4
3 A 40 3 1 4
4 A 60 4 1 4
5 A 60 5 1 4
6 B 20 3 3 6
7 B 60 4 3 6
8 B 50 5 3 6
9 B 40 6 3 6
10 B 10 7 3 6
What I want to do is create a new column called "calc" that calculates the measure for time=end / time=start
I'm guessing group_by sample first but I'm having trouble figuring out how to reference the start and end times. Here's what the final data should look like
sample measure time start end calc
1 A 20 1 1 4 3
2 A 30 2 1 4 3
3 A 40 3 1 4 3
4 A 60 4 1 4 3
5 A 60 5 1 4 3
6 B 20 3 3 6 2
7 B 60 4 3 6 2
8 B 50 5 3 6 2
9 B 40 6 3 6 2
10 B 10 7 3 6 2
what about (d
being your dataframe):
library(dplyr)
d |>
arrange(sample, time) |>
rowwise() |>
mutate(is_endpoint = any(time == start, time == end)) |>
ungroup() |>
mutate(calc = tail(measure[is_endpoint], 1) /
head(measure[is_endpoint], 1),
.by = sample
) |>
select(-is_endpoint)
output:
# A tibble: 10 x 6
sample measure time start end calc
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 20 1 1 4 3
2 A 30 2 1 4 3
3 A 40 3 1 4 3
4 A 60 4 1 4 3
5 A 60 5 1 4 3
6 B 20 3 3 6 2
7 B 60 4 3 6 2
8 B 50 5 3 6 2
9 B 40 6 3 6 2
10 B 10 7 3 6 2