I have many stock price series from {BatchGetSymbols}, a {quantmod} wrapper I think, and would like to filter out when there is a section of unchanging price for an extended period at the start and/or end of the series. Not all of the tickers will have these unchanging sections. In the example below, I would like to remove the unchanging price of $10 at the beginning and $5 at the end, keeping the only the last $10, first $5 and the varying prices in between.
The price of the unchanging parts cannot be known in advance and will change from case to case. The first date in the series will have the price of the first unchanging price, and the last date will have the second unchanging price. I have almost 5 million rows would like a data.table solution.
``` r
# Data
library(data.table)
data <-
data.table::data.table(
ticker = "stockA",
date = seq.Date(
from = as.Date("2017-6-30"),
to = as.Date("2017-7-19"),
by = 1
),
price = c(rep(10, 5), rnorm(10, 8, 1), rep(5, 5))
)
# Plot showing unchanging portion at start and end
plot(data$price)
```
DESIRED RESULT:
``` r
new_data <-
rbind(
data[!price %in% c(10, 5)],
data.table(
ticker = "stockA",
date = c(as.Date("2017-06-30"), as.Date("2017-08-29")),
price = c(10, 5)
))[order(date)]
new_data
#> ticker date price
#> 1: stockA 2017-06-30 10.000000
#> 2: stockA 2017-07-05 6.890370
#> 3: stockA 2017-07-06 8.137852
#> 4: stockA 2017-07-07 7.759324
#> 5: stockA 2017-07-08 8.861941
#> 6: stockA 2017-07-09 8.250837
#> 7: stockA 2017-07-10 8.570328
#> 8: stockA 2017-07-11 8.826646
#> 9: stockA 2017-07-12 7.872192
#> 10: stockA 2017-07-13 7.755318
#> 11: stockA 2017-07-14 9.731524
#> 12: stockA 2017-08-29 5.000000
```
Created on 2021-07-22 by the reprex package (v2.0.0)
You can use rleid
to create an id of consecutive similar values and then remove rows of first and last price.
library(data.table)
data[, id := rleid(price)]
data[!(price == first(price) & id == 1 | price == last(price) & id == max(id))]
# ticker date price id
# 1: stockA 2017-07-05 9.1267303 2
# 2: stockA 2017-07-06 7.8969750 3
# 3: stockA 2017-07-07 6.4109158 4
# 4: stockA 2017-07-08 7.1900800 5
# 5: stockA 2017-07-09 9.6342601 6
# 6: stockA 2017-07-10 9.4615477 7
# 7: stockA 2017-07-11 9.4091043 8
# 8: stockA 2017-07-12 8.5279983 9
# 9: stockA 2017-07-13 7.7585034 10
#10: stockA 2017-07-14 8.1477831 11
To include the last and first price use -
data[!duplicated(id, fromLast = TRUE) & id == 1 |
!duplicated(id) & id == max(id) | between(id, 2, max(id) - 1)]
# ticker date price id
# 1: stockA 2017-07-04 10.0000000 1
# 2: stockA 2017-07-05 9.1267303 2
# 3: stockA 2017-07-06 7.8969750 3
# 4: stockA 2017-07-07 6.4109158 4
# 5: stockA 2017-07-08 7.1900800 5
# 6: stockA 2017-07-09 9.6342601 6
# 7: stockA 2017-07-10 9.4615477 7
# 8: stockA 2017-07-11 9.4091043 8
# 9: stockA 2017-07-12 8.5279983 9
#10: stockA 2017-07-13 7.7585034 10
#11: stockA 2017-07-14 8.1477831 11
#12: stockA 2017-07-15 5.0000000 12