Search code examples
rdplyrmutate

good idea to use function that uses a for-loop in dplyr's mutate


So I have a function whos idea it is to operate on a vector of numbers. E.g. a vector of temperatures. I want to compute heatwaves (in a very simplified way...). Lets say a heatwave starts with three consecutive days of above 30 °C.

So I would need a back-reference to store how long the current heatwave already is. I wrote a function that uses a for-loop internally. In pseudo-code it kind of looks like this:

is_heatwave = function(vals){
  
  length_heatwave = 0
  
  # returns a vector with the length of the input vals
  day_in_heatwave = vector(length=length(vals))
  days_in_current_heatwave =c()
  
  for(i in 1:length(vals)){
    val = vals[[i]]
    
    if(val > 30){
      length_heatwave = length_heatwave + 1
      days_in_current_heatwave = c(days_in_current_heatwave, i)
    }else{
      length_heatwave = 0
    }
    
    ... some more code
  }
  
  return(day_in_heatwave)
    
}

This code might be wrong. But the idea is that the function takes as input a vector with the length as the data.frame has rows. And returns a vector of the same length.

my idea is to have a function that I can use like this:

df = data.frame(
  temps = c(30,30,32,30,24)
)

df %>% mutate(is_heatwave = is_heatwave(temps))

I just wanted to ask if this generally is a good idea or are there any better ideas?


Solution

  • Already good answers, so let's add some nuances.

    This solution gives an unique streak_id that may or may not be a heat_wave. hot_days_acc is the number of hot days accumulated on a streak.

    The code:

    # library(tidyverse)
    
    # -------------------     
    # Number of days in a heat wave
    heat_wave_days <- 3
    
    # Temperature threshold 
    hot_day <- 30
    
    # Some toy data
    set.seed(100)
    aux_df <- tibble(temp = sample(-2:2 + hot_day, 50, replace = TRUE))
    
    #
    aux_df <- aux_df %>% 
      mutate(
        hot_days_acc = if_else(temp >= hot_day, TRUE, FALSE),
        streak_id = consecutive_id(hot_days_acc)) %>% 
      
      add_count(streak_id, name = "heat_wave") %>% 
    
      mutate(
        .by = streak_id, 
        heat_wave = if_else(
          all(hot_days_acc == TRUE) & heat_wave >= heat_wave_days, 
          TRUE, FALSE)) %>% 
      
      mutate(streak_id = consecutive_id(heat_wave)) %>% 
      mutate(.by = streak_id, hot_days_acc = cumsum(hot_days_acc)) %>% 
      
      relocate(temp, streak_id, heat_wave, hot_days_acc)
    

    The output:

    > print(aux_df, n = nrow(aux_df))
    # A tibble: 50 × 4
        temp streak_id heat_wave hot_days_acc
       <dbl>     <int> <lgl>            <int>
     1    29         1 FALSE                0
     2    30         1 FALSE                1
     3    28         1 FALSE                1
     4    29         1 FALSE                1
     5    31         1 FALSE                2
     6    31         1 FALSE                3
     7    29         1 FALSE                3
     8    30         1 FALSE                4
     9    29         1 FALSE                4
    10    32         2 TRUE                 1
    11    31         2 TRUE                 2
    12    30         2 TRUE                 3
    13    30         2 TRUE                 4
    14    29         3 FALSE                0
    15    28         3 FALSE                0
    16    29         3 FALSE                0
    17    30         4 TRUE                 1
    18    31         4 TRUE                 2
    19    31         4 TRUE                 3
    20    31         4 TRUE                 4
    21    32         4 TRUE                 5
    22    30         4 TRUE                 6
    23    28         5 FALSE                0
    24    30         5 FALSE                1
    25    31         5 FALSE                2
    26    29         5 FALSE                2
    27    32         6 TRUE                 1
    28    32         6 TRUE                 2
    29    32         6 TRUE                 3
    30    28         7 FALSE                0
    31    32         8 TRUE                 1
    32    31         8 TRUE                 2
    33    30         8 TRUE                 3
    34    28         9 FALSE                0
    35    28         9 FALSE                0
    36    28         9 FALSE                0
    37    30         9 FALSE                1
    38    28         9 FALSE                1
    39    28         9 FALSE                1
    40    31        10 TRUE                 1
    41    30        10 TRUE                 2
    42    32        10 TRUE                 3
    43    30        10 TRUE                 4
    44    31        10 TRUE                 5
    45    30        10 TRUE                 6
    46    30        10 TRUE                 7
    47    30        10 TRUE                 8
    48    31        10 TRUE                 9
    49    30        10 TRUE                10
    50    32        10 TRUE                11