Search code examples
rdata.tablequantmod

Filter unchanging price portions preceding and/or following {quantmod} series


I have many stock price series from {BatchGetSymbols}, a {quantmod} wrapper I think, and would like to filter out when there is a section of unchanging price for an extended period at the start and/or end of the series. Not all of the tickers will have these unchanging sections. In the example below, I would like to remove the unchanging price of $10 at the beginning and $5 at the end, keeping the only the last $10, first $5 and the varying prices in between.

The price of the unchanging parts cannot be known in advance and will change from case to case. The first date in the series will have the price of the first unchanging price, and the last date will have the second unchanging price. I have almost 5 million rows would like a data.table solution.

``` r
# Data
library(data.table)
data <- 
  data.table::data.table(
    ticker = "stockA",
    date = seq.Date(
      from = as.Date("2017-6-30"),
      to = as.Date("2017-7-19"),
      by = 1
      ),
    price =  c(rep(10, 5), rnorm(10, 8, 1), rep(5, 5))
  )

# Plot showing unchanging portion at start and end
plot(data$price)
```

enter image description here

DESIRED RESULT:

``` r
new_data <- 
  rbind(
    data[!price %in% c(10, 5)],
    data.table(
      ticker = "stockA",
      date = c(as.Date("2017-06-30"), as.Date("2017-08-29")),
      price = c(10, 5)
    ))[order(date)]

new_data
#>     ticker       date     price
#>  1: stockA 2017-06-30 10.000000
#>  2: stockA 2017-07-05  6.890370
#>  3: stockA 2017-07-06  8.137852
#>  4: stockA 2017-07-07  7.759324
#>  5: stockA 2017-07-08  8.861941
#>  6: stockA 2017-07-09  8.250837
#>  7: stockA 2017-07-10  8.570328
#>  8: stockA 2017-07-11  8.826646
#>  9: stockA 2017-07-12  7.872192
#> 10: stockA 2017-07-13  7.755318
#> 11: stockA 2017-07-14  9.731524
#> 12: stockA 2017-08-29  5.000000
```

Created on 2021-07-22 by the reprex package (v2.0.0)


Solution

  • You can use rleid to create an id of consecutive similar values and then remove rows of first and last price.

    library(data.table)
    
    data[, id := rleid(price)]
    data[!(price == first(price) & id == 1 | price == last(price) & id == max(id))]
    
    #    ticker       date     price id
    # 1: stockA 2017-07-05 9.1267303  2
    # 2: stockA 2017-07-06 7.8969750  3
    # 3: stockA 2017-07-07 6.4109158  4
    # 4: stockA 2017-07-08 7.1900800  5
    # 5: stockA 2017-07-09 9.6342601  6
    # 6: stockA 2017-07-10 9.4615477  7
    # 7: stockA 2017-07-11 9.4091043  8
    # 8: stockA 2017-07-12 8.5279983  9
    # 9: stockA 2017-07-13 7.7585034 10
    #10: stockA 2017-07-14 8.1477831 11
    

    To include the last and first price use -

    data[!duplicated(id, fromLast = TRUE) & id == 1 | 
         !duplicated(id) & id == max(id) | between(id, 2, max(id) - 1)]
    
    #    ticker       date      price id
    # 1: stockA 2017-07-04 10.0000000  1
    # 2: stockA 2017-07-05  9.1267303  2
    # 3: stockA 2017-07-06  7.8969750  3
    # 4: stockA 2017-07-07  6.4109158  4
    # 5: stockA 2017-07-08  7.1900800  5
    # 6: stockA 2017-07-09  9.6342601  6
    # 7: stockA 2017-07-10  9.4615477  7
    # 8: stockA 2017-07-11  9.4091043  8
    # 9: stockA 2017-07-12  8.5279983  9
    #10: stockA 2017-07-13  7.7585034 10
    #11: stockA 2017-07-14  8.1477831 11
    #12: stockA 2017-07-15  5.0000000 12