I need some help in R: I'm trying to identify gaps in a sequential series of two variables. Currently, I have a list that looks like this.
data <- fetch(rs, n=-1)
names(data) <- c("~Open", "~Close")
Browse[2]> typeof(data)
[1] "list"
~Open ~Close
10000 10019
10020 10039
10040 10051 -> Gap from 10052->10060 : I need 10040-10060
10060 10079
10100 10119 -> Gap from 10080->10099 : I need 10060-10099 or 10080-10099
10160 10179 -> Gap from 10120->10159 : I need 10120-10159 or 10100-10159
My result should look like a list with missing records (Start,Stop). For example:
Open Close
10040 10060
10080 10099
10100 10159
or
Open Close
10040 10099
10120 10159
Could someone please point me in the right direction?
Thanks in advance.
Update: I tried the following:
gaps <- data %>%
  mutate(lead_start = lead(Open) - 1) %>%
  filter(Close != lead_start) %>%
  transmute(Open = Close + 1, Close = lead_start)
I get the following error message:
Error in mutate_impl(.data, dots) :
Evaluation error: object 'Open' not found.
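One possible cause, assuming the columns are still named "~Open" and "~Close" as in the names() call above: dplyr then finds no column called Open. The sketch below uses a small made-up data frame in place of the fetch() result and strips the leading "~" in base R.

```r
# Hypothetical stand-in for the data frame returned by fetch(rs, n = -1)
data <- data.frame(c(10000, 10020), c(10019, 10039))
names(data) <- c("~Open", "~Close")

# The column is literally called "~Open", so a lookup for "Open" fails
has_open <- "Open" %in% names(data)

# Strip the leading "~" so that Open and Close can be referenced directly
names(data) <- sub("^~", "", names(data))
names(data)
```

With the columns renamed this way, mutate(lead_start = lead(Open) - 1) should be able to find Open.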
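I actually just needed to do this:
library(tidyverse)  # provides as_tibble(), add_row(), mutate(), lead(), ...
data <- fetch(rs, n = -1)
# Take the last value of the second column and round it down to a multiple of 20
lastOpen <- data[nrow(data), 2]
lastOpen <- lastOpen - lastOpen %% 20
gaps <- as_tibble(data) %>%
  mutate(lead_start = lead(Open) - 1) %>%
  filter(Close != lead_start) %>%
  # Round each gap start down to a multiple of 20
  transmute(Open = (Close + 1) - ((Close + 1) %% 20), Close = lead_start) %>%
  # Append a final range that runs up to the current time
  add_row(Open = lastOpen, Close = Sys.time())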
Thanks to mkeskisa!
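To illustrate the rounding used in the transmute() step, here is a small base R sketch. The helper name floor_to_block is made up for this example, and the block width of 20 is taken from the %% 20 in the code above; the input values are the Close values that precede each gap in the sample data.

```r
# Hypothetical helper: snap a value down to the nearest multiple of `width`
floor_to_block <- function(x, width = 20) x - x %% width

# Last Close value before each gap in the sample data
close_before_gap <- c(10051, 10079, 10119)

# First missing value, snapped down to the start of its block of 20
gap_start <- floor_to_block(close_before_gap + 1)
gap_start
# 10040 10080 10120
```

Only the first gap start moves (from 10052 to 10040), because 10080 and 10120 are already multiples of 20.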
I'm not sure I fully understand what you are trying to achieve, but I think this will help you find the missing gaps. You say that you want the missing ranges, but the ranges you list would overlap with your existing records: for example, 10040 to 10060 overlaps with the record 10060 to 10079 at 10060. In general, you can probably achieve what you want using lag() and/or lead().
library(tidyverse)
df <- tibble::tribble(
  ~Start, ~Stop,
  10000L, 10019L,
  10020L, 10039L,
  10040L, 10051L,
  10060L, 10079L,
  10100L, 10119L,
  10160L, 10179L
)
gaps <- df %>%
  mutate(lead_start = lead(Start) - 1) %>%
  filter(Stop != lead_start) %>%
  transmute(start = Stop + 1, stop = lead_start)
gaps
# A tibble: 3 x 2
start stop
<dbl> <dbl>
1 10052 10059
2 10080 10099
3 10120 10159
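If you would rather not depend on the tidyverse, the same idea can be sketched in base R by pairing each Stop with the Start of the next row. This is only a minimal sketch: it assumes the rows are sorted by Start and do not overlap, and unlike the filter() above it reports only true gaps (next Start more than one past the current Stop).

```r
# Same sample data as above, as a plain data frame
df <- data.frame(
  Start = c(10000, 10020, 10040, 10060, 10100, 10160),
  Stop  = c(10019, 10039, 10051, 10079, 10119, 10179)
)

# Pair each Stop with the Start of the following row
stop_here  <- head(df$Stop, -1)
next_start <- tail(df$Start, -1)

# A gap exists wherever the next record does not begin right after this one ends
has_gap <- next_start - stop_here > 1

gaps <- data.frame(
  start = stop_here[has_gap] + 1,
  stop  = next_start[has_gap] - 1
)
gaps
```

For the sample data this gives the same three ranges as the dplyr version: 10052 to 10059, 10080 to 10099 and 10120 to 10159.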