I'm working in RStudio with my dataset that contains the daily milk production of cows reared in different locations. Also, I have 5 columns with the temperature in the corresponding 5 previous days to the milking control day.
I would like to count the number of consecutive days ion which the temperature overcome a certain threshold (ex. 30ºC) independently of the position of these days (3 consecutive days could occur during the 3,4,5 days previous to milking control for example). Also, in case that during this 5 days an event with for example, 1 days, and another event with 3 consecutive days occur, I need to take into account the higher number.
Here is a toy dataset that schematically reflects my dataset. How can I count in R the consecutive days that satisfy my conditions?
data <- data.frame(cow=1:5, milk=c(35,36,36,35,34),
day1ago=c(27,28,20,24,33),
day2ago=c(25,25,32,31,28),
day3ago=c(22,31,25,31,29),
day4ago=c(28,33,32,33,28),
day5ago=c(29,28,33,34,31))
To these toy dataset, I would expect to obtain a vector like this:
data$consecutive_days = c(0,2,2,4,1)
You could use rle
on a row-wise data frame:
library(dplyr)
data |>
rowwise() |>
mutate(rle = list(rle(c_across(starts_with("day")) > 30)),
consecutive_days = with(rle, ifelse(any(values), max(lengths[values]), 0))) |>
ungroup() |>
select(-rle)
rle
computes runs of equal values. In this case runs of TRUE
/FALSE
if the value is above or below the threshold. We use this to find the longest (max
) length (lengths
) of TRUE
values.
Output
cow milk day1ago day2ago day3ago day4ago day5ago consecutive_days
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 35 27 25 22 28 29 0
2 2 36 28 25 31 33 28 2
3 3 36 20 32 25 32 33 2
4 4 35 24 31 31 33 34 4
5 5 34 33 28 29 28 31 1