Tags: r, dataframe, dplyr, group-by, rle

How to use the rle function in a data frame


I have a dataframe (df) like this.

    df <- data.frame(prox)

        year month   day Tmean
       <dbl> <dbl> <dbl> <dbl>
     1  1956     1     1  13.5
     2  1956     1     2  11.9
     3  1956     1     3  9.71
     4  1956     1     4  8.65
     5  1956     1     5  4.51
     6  1956     1     6  4.64
     7  1956     1     7  6.66
     8  1956     1     8  7.48
     9  1956     1     9  5.56
    10  1956     1    10  7.51

I want to find the maximum number of consecutive days with a decrease in temperature. For a single year I did this (with the help of @Andre Wildberg), and it works:

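    # runs of TRUE (temperature fell) and FALSE (it did not) between consecutive days
    y <- rle(diff(df$Tmean) < 0)
    # keep the lengths of the TRUE runs only and take the longest
    max(y$lengths[y$values], na.rm = TRUE)
    # [1] 6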
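To see what each step produces, here is a small sketch that runs the same pipeline on only the ten rows printed above. It uses just those rows, not the full year, so its result need not match the 6 above; the variable names are illustrative.

```r
# Tmean values copied from the ten rows printed above
tmean <- c(13.5, 11.9, 9.71, 8.65, 4.51, 4.64, 6.66, 7.48, 5.56, 7.51)

d <- diff(tmean)      # day-to-day change, length 9
is_drop <- d < 0      # TRUE where the temperature fell
runs <- rle(is_drop)  # lengths and values of consecutive TRUE/FALSE runs
runs$lengths          # 4 3 1 1
runs$values           # TRUE FALSE TRUE FALSE

# keep only the runs of drops, then take the longest
longest_drop <- max(runs$lengths[runs$values], na.rm = TRUE)
longest_drop          # 4
```

Note that the count is in day-to-day decreases: a run of 4 decreases spans 5 calendar days (here day 1 to day 5).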

But now I want to find this for each year (1956, 1957, ...). I'm trying to do it with group_by(), but there is a problem because rle() returns a list (an object of class "rle"). Is it possible to do this somehow, or do I have to find another way?

    df %>%
      group_by(year) %>%
      summarise(x = list(rle(diff(df$Tmean) < 0)))

        year x
       <dbl> <list>
     1  1956 <rle>
     2  1957 <rle>
     3  1958 <rle>
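The <rle> entries appear because rle() returns a list of class "rle" with two components, lengths and values, rather than a single number; each group needs that object reduced to one value, as the single-year code does with max(). Note also that df$Tmean inside summarise() refers to the whole Tmean column of df, not the rows of the current group, so every year would get the same result; inside a grouped pipeline the bare column name Tmean is what picks up the group's rows. The base R sketch below shows the object's structure and one way to reduce it per year with split() and vapply(); the helper longest_drop_run() and the two-year data frame toy are hypothetical, made up for illustration.

```r
# rle() returns a list of class "rle", not a single number
r <- rle(c(TRUE, TRUE, FALSE, TRUE))
class(r)  # "rle"
names(r)  # "lengths" "values"

# hypothetical helper: reduce the rle object to one number,
# the longest run of day-to-day decreases in x
longest_drop_run <- function(x) {
  runs <- rle(diff(x) < 0)
  max(runs$lengths[runs$values], na.rm = TRUE)
}

# hypothetical two-year toy data, not taken from the question
toy <- data.frame(
  year  = c(1956, 1956, 1956, 1956, 1957, 1957, 1957),
  Tmean = c(10, 9, 8, 9, 5, 4, 6)
)

# split Tmean by year and apply the helper to each piece separately
per_year <- vapply(split(toy$Tmean, toy$year), longest_drop_run, numeric(1))
per_year  # 1956 -> 2, 1957 -> 1
```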

Solution

  • After grouping by year, we can use with() on the rle() result to subset lengths by values (keeping only the runs where the difference is negative) and take the max of those lengths.

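    library(dplyr)

    df %>%
      group_by(year) %>%
      # Tmean here is the current year's column, not the whole df$Tmean
      summarise(x = with(rle(diff(Tmean) < 0),
                         # longest run of consecutive decreases in this year
                         max(lengths[values], na.rm = TRUE)))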
    

Or, using base R:

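    # longest run of consecutive day-to-day decreases in a numeric vector
    f1 <- function(x) {
      y <- rle(diff(x) < 0)                   # TRUE where the value fell
      max(y$lengths[y$values], na.rm = TRUE)  # longest run of TRUE
    }
    aggregate(Tmean ~ year, df, f1)           # apply f1 to Tmean within each year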
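One caveat with both versions: if a year has no day-to-day decrease at all, or has only one row so that diff() returns an empty vector, lengths[values] is empty and max() returns -Inf with a warning. A minimal sketch of a guarded helper that returns 0 in that case instead; the name f2 and the choice of 0 as the fallback are assumptions, not part of the answer above.

```r
# hypothetical variant of f1: returns 0 instead of -Inf when there is no decrease
f2 <- function(x) {
  runs <- rle(diff(x) < 0)
  # lengths of the runs of decreases only; NA values are treated as "not a decrease"
  drops <- runs$lengths[runs$values & !is.na(runs$values)]
  if (length(drops) == 0) 0L else max(drops)
}

f2(c(1, 2, 3))        # 0: temperature never falls
f2(5)                 # 0: a single day has no differences
f2(c(3, 2, 1, 4, 3))  # 2
```

It can be used in place of f1 in the same call, e.g. aggregate(Tmean ~ year, df, f2).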