Search code examples
rmax

How to count maximum value for given time period in R?


I got the data from MySQL and I'm trying to visualize it and uncover some answers. Using R for the statistic.

The final product is % discount for each price change (=row).

Here is an example of my dataset.

     itemId pricehis           timestamp
1  69295477     1290 2022-04-12 04:42:53
2  69295624     1145 2022-04-12 04:42:53
3  69296136     3609 2022-04-12 04:42:54
4  69296607      855 2022-04-12 04:42:53
5  69295291     1000 2022-04-12 04:42:50
6  69295475     4188 2022-04-12 04:42:52
7  69295614     1145 2022-04-12 04:42:51
8  69295622     1290 2022-04-12 04:42:50
9  69295692     3609 2022-04-12 04:42:49
10 69295917     1725 2022-04-12 04:42:48
11 69296090     2449 2022-04-12 04:42:53
12 69296653     1145 2022-04-12 04:42:51
13 69296657     5638 2022-04-12 04:42:48
14 69296661     1725 2022-04-12 04:42:51
15 69296696      710 2022-04-12 04:42:51

I've been stuck at one part of the calculation - maximum value for each productId in 6 months.

In the dataset, there are rows for specific productId with different pricehis values and different timestamps. I need to find the max value for a given row no older than 6 months.

The formula for calculating the desired discount is:

Discount grouped by itemId = 1 - pricehis / max(pricehis in the last 6 months)

At this moment I'm unable to solve the second part - pricehis in the last 6 months. - I need a new column with maximum 'pricehis' in the last 6 months for the itemId. Also could be known as interval maximum.

I can group it by the itemId, but I can't figure out how to add the condition on 6 months max.

Any tips on how to get this?


Solution

  • I like slider::slide_index_dbl for this sort of thing. Here's some fake data chosen to demonstrate the 6mo window:

    data.frame(itemId = rep(1:2, each = 6),
               price = floor(100*cos(0:11)^2),
               timestamp = as.Date("2000-01-01") + 50*(0:11)) -> df
    

    We can start with df, group it by itemId, and then calula and then apply the window function. (Note that slider requires the data to be sorted by date within each group.)

    library(dplyr).  
    library(lubridate) # for `%m-%`, to get sliding months (harder than it sounds!)
    df %>%     
      group_by(itemId) %>%
      mutate(max_6mo = slider::slide_index_dbl(.x = price,     # based on price...
                                               .i = timestamp, # and timestamp...
                                               .f = max,       # what's the max...
                                               .before = ~.x %m-% months(6))) %>% # over the last 6mo
      mutate(discount = 1 - price / max_6mo) %>%               # use that to calc discount
      ungroup()
    

    Result

    # A tibble: 12 × 5
       itemId price timestamp  max_6mo discount
        <int> <dbl> <date>       <dbl>    <dbl>
     1      1   100 2000-01-01     100   0     
     2      1    29 2000-02-20     100   0.71  
     3      1    17 2000-04-10     100   0.83  
     4      1    98 2000-05-30     100   0.0200
     5      1    42 2000-07-19      98   0.571    # new max since >6mo since 100
     6      1     8 2000-09-07      98   0.918 
     7      2    92 2000-10-27      92   0     
     8      2    56 2000-12-16      92   0.391 
     9      2     2 2001-02-04      92   0.978 
    10      2    83 2001-03-26      92   0.0978
    11      2    70 2001-05-15      83   0.157    # new max since >6mo since 92
    12      2     0 2001-07-04      83   1