
Error in rle-based function for drought frequency and duration (at least two consecutive months)


I have data recording drought conditions over several months. I want to calculate the total duration of droughts and the average duration of droughts, with the condition that a drought starts when the SPI value is <= -1 for at least two consecutive months and ends when the SPI returns to a non-negative value (>= 0).

Here's an example of SPI data:

data <- c(-1,-1,-1,-0.5,0,-0.5,-1,0,-1,-1,-0.8,0,0,0)
data[1:2]: condition met (two consecutive months <= -1), drought starts
data[1:4]: drought continues
data[5]: value >= 0, drought stops
One drought event, lasting 4 months
data[5:8]: condition not met (no two consecutive months <= -1)

data[9:10]: condition met, drought starts
data[9:11]: drought continues
data[12]: value >= 0, drought stops
One drought event, lasting 3 months
Total drought events: 2

I have attempted to create a function as follows:

d.frequency <- function(x, na.rm= TRUE) {
  rle_result <- rle(x <= -1)
  drought_runs <- rle_result$values
  drought_lengths <- rle_result$lengths
  # Find the indices where a drought starts
  drought_start_indices <- which(drought_runs & c(0, head(drought_runs, -1)) == FALSE)
  # Find the indices where a drought ends
  drought_end_indices <- which(drought_runs & c(tail(drought_runs, -1), 0) == FALSE)
  # Identify runs that last at least 2 months
  valid_droughts <- which(drought_lengths >= 2)
  
  # Keep only the drought events that meet the minimum duration
  valid_drought_start <- drought_start_indices[drought_start_indices %in% valid_droughts]
  valid_drought_end <- drought_end_indices[drought_end_indices %in% valid_droughts]
  
  # Compute the number and duration of droughts
  num_droughts <- length(valid_drought_start)
  drought_durations <- drought_end_indices[valid_drought_end] - drought_start_indices[valid_drought_start] + 1
  #return(num_droughts)
  # Store the drought durations for further analysis
  durations_list <- list(num_droughts = num_droughts, durations = sum(drought_durations, na.rm = TRUE))
  
  return(durations_list)
}

And here is the result:

d.frequency(data)
$num_droughts
[1] 2

$durations
[1] 1

However, the result is not as expected. I want to get a total drought duration of 7 months and an average drought duration of 7/2 = 3.5 months (total duration divided by the number of events).

How can I modify this function to meet these requirements?

Thank you very much!


Solution
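
Before the fix, it may help to see what rle(x <= -1) returns for the example data. The sketch below is only an illustration; the variable name spi is an assumption, used in place of data to avoid masking utils::data. The indices produced by which() in the attempt are positions of runs, not of months, so subtracting them does not give a duration in months. In addition, months with SPI between -1 and 0 (such as -0.5 and -0.8) break a TRUE run even though the drought is still ongoing by the stated rule, so run lengths alone add up to 5 rather than 7.

```r
# Example SPI series from the question (named spi here, an assumed name)
spi <- c(-1, -1, -1, -0.5, 0, -0.5, -1, 0, -1, -1, -0.8, 0, 0, 0)

# Run-length encoding of the threshold test
runs <- rle(spi <= -1)
runs$lengths  # 3 3 1 1 2 4
runs$values   # TRUE FALSE TRUE FALSE TRUE FALSE

# Months covered by TRUE runs of length >= 2: this misses the months
# at -0.5 and -0.8 that still belong to a drought
months_in_long_runs <- sum(runs$lengths[runs$values & runs$lengths >= 2])
months_in_long_runs  # 5
```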

  • We assume the data collection starts during a non-drought period. Otherwise, call the function with drought_on = TRUE.

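    foo <- function(x, drought_on = FALSE) {
      stopifnot(is.numeric(x))
      n <- length(x)
      drought <- logical(n)  # TRUE for every month that belongs to a drought
      # seq_len() avoids the backwards sequence that seq(1, n-1) gives when n == 1
      for (t in seq_len(n - 1)) {
        if (drought_on) {
          # An ongoing drought continues while SPI stays negative
          drought_on <- x[t] < 0
        } else {
          # A new drought needs this month and the next to be <= -1
          drought_on <- all(x[t:(t+1)] <= -1)
        }
        drought[t] <- drought_on
      }
      # The last month cannot start a drought; it can only continue one
      drought[n] <- drought_on && x[n] < 0
      total <- sum(drought)
      list(
        "Total duration" = total,
        # Each TRUE run in rle(drought) is one drought event
        "Average length" = total / sum(rle(drought)$values)
      )
    }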
    
    foo(data)
    
    # $`Total duration`
    # [1] 7
    # 
    # $`Average length`
    # [1] 3.5
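
If the individual events are also of interest, the same month-by-month rule can be combined with rle() to list each drought's start, end, and duration. This is a minimal sketch rather than part of the answer above: the function name drought_events, the variable spi, and the data frame layout are assumptions made for illustration, and it assumes the input has no missing values.

```r
# Hypothetical helper: flags each month with the same rule as foo(),
# then summarises every drought event with rle().
drought_events <- function(x, drought_on = FALSE) {
  stopifnot(is.numeric(x), !anyNA(x))
  n <- length(x)
  drought <- logical(n)
  for (t in seq_len(n)) {
    if (drought_on) {
      # An ongoing drought continues while SPI stays negative
      drought_on <- x[t] < 0
    } else {
      # A new drought needs this month and the next to be <= -1;
      # the last month has no successor, so it cannot start one
      drought_on <- t < n && all(x[t:(t + 1)] <= -1)
    }
    drought[t] <- drought_on
  }
  runs <- rle(drought)
  ends <- cumsum(runs$lengths)       # last month of every run
  starts <- ends - runs$lengths + 1  # first month of every run
  data.frame(
    start    = starts[runs$values],
    end      = ends[runs$values],
    duration = runs$lengths[runs$values]
  )
}

spi <- c(-1, -1, -1, -0.5, 0, -0.5, -1, 0, -1, -1, -0.8, 0, 0, 0)
events <- drought_events(spi)
events
#   start end duration
# 1     1   4        4
# 2     9  11        3

sum(events$duration)   # 7
mean(events$duration)  # 3.5
```

Returning one row per event keeps the totals easy to derive (sum and mean of the duration column) while also making it possible to check each event against the raw series.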