Search code examples
rdplyrmutate

Create a custom function to avoid extreme rate


I'd like to create a custom function to avoid an extreme rate, in my case, lower until 2%, the opposite (increase the rate) is not a problem.

For this, I have a dataset:

# Dataset
area <- c("A","A","A","A","B","B","B","B","C","C","C","C")
dates <- c("05-01-2024","10-01-2024","18-01-2024","01-02-2024",
"05-01-2024","10-01-2024","18-01-2024","01-02-2024","05-01-2024",
"10-01-2024","18-01-2024", "01-02-2024")
response <- c(63,68,82,85,77,71,60,50,79,78,77,76)
my.ds <- data.frame(area = area, dates = dates, response = response)

I need a custom function that, if the last date is lower more than 2%, then the new value decreases only 2%, if the value increases or lower until 2%, there is nothing to do. I try to something like:

mutate(my.ds, response = ifelse(c(0, abs(diff(response))) < abs(response * .02), lag(response),  response),  .by = area)

but doesn't work.

My desirable new.my.ds, must be:

# my.ds.new
#  area      dates response
#1    A 2024-01-05       63
#2    A 2024-01-10       68
#3    A 2024-01-18       82
#4    A 2024-02-01       85
#5    B 2024-01-05       77
#6    B 2024-01-10       76
#7    B 2024-01-18       75
#8    B 2024-02-01       74
#9    C 2024-01-05       79
#10   C 2024-01-10       78
#11   C 2024-01-18       77
#12   C 2024-02-01       76

Please, any help with it?


Solution

  • try this:

    Please note, that ceiling() rounds up the response value. If you don't want that, then you should change it to if (change < -threshold) capped_response[i] <- round(capped_response[i - 1]- capped_response[i - 1]* threshold,0)

    library(dplyr)
    area <- c("A","A","A","A","B","B","B","B","C","C","C","C")
    dates <- c("05-01-2024","10-01-2024","18-01-2024","01-02-2024",
               "05-01-2024","10-01-2024","18-01-2024","01-02-2024","05-01-2024",
               "10-01-2024","18-01-2024", "01-02-2024")
    response <- c(63,68,82,85,77,71,60,50,79,78,77,76)
    my.ds <- data.frame(area = area, dates = dates, response = response)
    
    # Custom function to cap changes in response
    cap_response <- function(response, threshold = 0.02) {
      capped_response <- response
      for (i in 2:length(response)) {
        change <- (capped_response[i] - capped_response[i - 1])/capped_response[i - 1]
        if (change < -threshold) capped_response[i] <- ceiling(capped_response[i - 1]- capped_response[i - 1]* threshold)
        # if (change < -threshold) capped_response[i] <- round(capped_response[i - 1]- capped_response[i - 1]* threshold,0)
      }
      return(capped_response)
    }
    
    # Apply the function using mutate and group_by
    new.my.ds <- my.ds %>%
      group_by(area) %>%
      mutate(response = cap_response(response)) %>%
      ungroup()
    

    which finally prints your required result:

    # A tibble: 12 × 3
       area  dates      response
       <chr> <chr>         <dbl>
     1 A     05-01-2024       63
     2 A     10-01-2024       68
     3 A     18-01-2024       82
     4 A     01-02-2024       85
     5 B     05-01-2024       77
     6 B     10-01-2024       76
     7 B     18-01-2024       75
     8 B     01-02-2024       74
     9 C     05-01-2024       79
    10 C     10-01-2024       78
    11 C     18-01-2024       77
    12 C     01-02-2024       76