Search code examples
rdplyrzoo

Find maximum value irrespective of whether is a positive/negative number in the previous nth rows?


Hi I have a dataframe as such.

    #Data
df2 <- structure(list(Date = structure(c(18502, 18503, 18504, 18505, 
                                        18506, 18507, 18508, 18509, 18510, 18511, 18512, 18513, 18514, 
                                        18515, 18516, 18517, 18518, 18519, 18520, 18521, 18522, 18523, 
                                        18524, 18525, 18526, 18527, 18528, 18529, 18530, 18531, 18532, 
                                        18533, 18534, 18535, 18536, 18537, 18538, 18539, 18540), class = "Date"), 
                     Price = c(1490, 3604, 2003, -4004, 4247, -189008, 2506, 4044, 
                               2604, 2204, -4316, 2190, 3137, 2694, 711, 4075, 1315, 454, 
                               1660, 4306, 4032, 3201, 2980, 4474, 3044, 3267, 2573, 2784, 
                               1497, 897, 4342, 4086, 3192, 3634, 380, 2293, 3478, 1190, 
                               1619)), class = "data.frame", row.names = c(NA, -39L))

what I want is to find the absolute-value max and not just the max. For example, if the previous nth row is set to 4 then row 7 would be -189008. I have tried two functions but both do not have the option to return the absolute value.

df2 %>%
  mutate(abshigh = slide_dbl(Price, max, .before = 4, .complete = TRUE))

zoo::rollmaxr (df2$Price,k=4,fill=NA)

above function outputs

        Date   Price abshigh
1  2020-08-28    1490      NA
2  2020-08-29    3604      NA
3  2020-08-30    2003      NA
4  2020-08-31   -4004      NA
5  2020-09-01    4247    4247
6  2020-09-02 -189008    4247
7  2020-09-03    2506    4247
8  2020-09-04    4044    4247
9  2020-09-05    2604    4247
10 2020-09-06    2204    4044
11 2020-09-07   -4316    4044
12 2020-09-08    2190    4044
13 2020-09-09    3137    3137
14 2020-09-10    2694    3137
15 2020-09-11     711    3137
16 2020-09-12    4075    4075
17 2020-09-13    1315    4075
18 2020-09-14     454    4075
19 2020-09-15    1660    4075
20 2020-09-16    4306    4306
21 2020-09-17    4032    4306
22 2020-09-18    3201    4306
23 2020-09-19    2980    4306
24 2020-09-20    4474    4474
25 2020-09-21    3044    4474
26 2020-09-22    3267    4474
27 2020-09-23    2573    4474
28 2020-09-24    2784    4474
29 2020-09-25    1497    3267
30 2020-09-26     897    3267
31 2020-09-27    4342    4342
32 2020-09-28    4086    4342
33 2020-09-29    3192    4342
34 2020-09-30    3634    4342
35 2020-10-01     380    4342
36 2020-10-02    2293    4086
37 2020-10-03    3478    3634
38 2020-10-04    1190    3634
39 2020-10-05    1619    3478

this is incorrect since row seven should return -189008 instead of 4247 thanks.


Solution

  • One option is to get the slide max on the absolute 'Price' and then change the sign by matching the absolute values of 'Price' (potential bug alert!)

    library(dplyr)
    library(slider)
    df2 %>% 
        mutate(abshigh = slide_dbl(abs(Price), max, .before = 4, .complete = TRUE), 
        abshigh = abshigh * sign(Price[match(abshigh, abs(Price))]))
    

    -output

              Date   Price abshigh
    1  2020-08-28    1490      NA
    2  2020-08-29    3604      NA
    3  2020-08-30    2003      NA
    4  2020-08-31   -4004      NA
    5  2020-09-01    4247    4247
    6  2020-09-02 -189008 -189008
    7  2020-09-03    2506 -189008
    8  2020-09-04    4044 -189008
    9  2020-09-05    2604 -189008
    10 2020-09-06    2204 -189008
    11 2020-09-07   -4316   -4316
    12 2020-09-08    2190   -4316
    13 2020-09-09    3137   -4316
    14 2020-09-10    2694   -4316
    15 2020-09-11     711   -4316
    16 2020-09-12    4075    4075
    17 2020-09-13    1315    4075
    18 2020-09-14     454    4075
    19 2020-09-15    1660    4075
    20 2020-09-16    4306    4306
    21 2020-09-17    4032    4306
    22 2020-09-18    3201    4306
    23 2020-09-19    2980    4306
    24 2020-09-20    4474    4474
    25 2020-09-21    3044    4474
    26 2020-09-22    3267    4474
    27 2020-09-23    2573    4474
    28 2020-09-24    2784    4474
    29 2020-09-25    1497    3267
    30 2020-09-26     897    3267
    31 2020-09-27    4342    4342
    32 2020-09-28    4086    4342
    33 2020-09-29    3192    4342
    34 2020-09-30    3634    4342
    35 2020-10-01     380    4342
    36 2020-10-02    2293    4086
    37 2020-10-03    3478    3634
    38 2020-10-04    1190    3634
    39 2020-10-05    1619    3478
    

    Or another option is to use row_number() i.e. sequence as input to slide_dbl and extract the values of 'Price' based on the sequence, get the max element index of absolute values with which.max to subset the 'Price' subset (This would be more correct compared to the previous solution as match can have undesirable consequence on the absolute values if there are both negative and positive elements with same value

    df2 %>% 
      mutate(abshigh = slide_dbl(row_number(), 
       .f = ~ Price[.x][which.max(abs(Price[.x]))], .before = 4, .complete = TRUE))
    

    -output

         Date   Price abshigh
    1  2020-08-28    1490      NA
    2  2020-08-29    3604      NA
    3  2020-08-30    2003      NA
    4  2020-08-31   -4004      NA
    5  2020-09-01    4247    4247
    6  2020-09-02 -189008 -189008
    7  2020-09-03    2506 -189008
    8  2020-09-04    4044 -189008
    9  2020-09-05    2604 -189008
    10 2020-09-06    2204 -189008
    11 2020-09-07   -4316   -4316
    12 2020-09-08    2190   -4316
    13 2020-09-09    3137   -4316
    14 2020-09-10    2694   -4316
    15 2020-09-11     711   -4316
    16 2020-09-12    4075    4075
    17 2020-09-13    1315    4075
    18 2020-09-14     454    4075
    19 2020-09-15    1660    4075
    20 2020-09-16    4306    4306
    21 2020-09-17    4032    4306
    22 2020-09-18    3201    4306
    23 2020-09-19    2980    4306
    24 2020-09-20    4474    4474
    25 2020-09-21    3044    4474
    26 2020-09-22    3267    4474
    27 2020-09-23    2573    4474
    28 2020-09-24    2784    4474
    29 2020-09-25    1497    3267
    30 2020-09-26     897    3267
    31 2020-09-27    4342    4342
    32 2020-09-28    4086    4342
    33 2020-09-29    3192    4342
    34 2020-09-30    3634    4342
    35 2020-10-01     380    4342
    36 2020-10-02    2293    4086
    37 2020-10-03    3478    3634
    38 2020-10-04    1190    3634
    39 2020-10-05    1619    3478