
R: Lags in a sequential series


I need some help in R: I'm trying to identify gaps in a sequential series of two variables. Currently, I have a list that looks like this:

data <- fetch(rs, n=-1)
names(data) <- c("~Open", "~Close")

Browse[2]> typeof(data)
[1] "list"

~Open    ~Close
10000     10019
10020     10039
10040     10051  -> Gap from 10052->10060 : I need 10040-10060
10060     10079
10100     10119  -> Gap from 10080->10099 : I need 10060-10099 or 10080-10099
10160     10179  -> Gap from 10120->10159 : I need 10120-10159 or 10100-10159

My result should look like a list with missing records (Start,Stop). For example:

Open        Close
10040       10060
10080       10099
10100       10159

or

Open        Close
10040       10099
10120       10159

Could someone please point me in the right direction?

Thanks in advance.

Update: I tried the following:

gaps <- data %>% 
  mutate(lead_start = lead(Open) - 1) %>% 
  filter(Close != lead_start) %>% 
  transmute(Open = Close + 1, Close = lead_start)

I get the following error message:

Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'Open' not found.
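A likely cause of this error is the names() call shown earlier: it sets the column names to the literal strings "~Open" and "~Close", so the data has no column called Open. The tilde is only header syntax inside tribble(); in names() it becomes part of the name. (typeof() reports "list" for any data frame, so that output is not a problem in itself.) Here is a minimal sketch of the fix, using a made-up data frame as a stand-in for the fetch(rs, n=-1) result:

```r
library(dplyr)

# Made-up stand-in for the fetch(rs, n = -1) result (assumption)
data <- data.frame(a = c(10000, 10020, 10040, 10060),
                   b = c(10019, 10039, 10051, 10079))

# The tilde becomes part of the name, so no column is called `Open`
names(data) <- c("~Open", "~Close")
has_open <- "Open" %in% names(data)   # FALSE

# Plain names make `Open` and `Close` visible to mutate() and filter()
names(data) <- c("Open", "Close")

gaps <- data %>%
  mutate(lead_start = lead(Open) - 1) %>%
  filter(Close != lead_start) %>%
  transmute(Open = Close + 1, Close = lead_start)

gaps
```

With these four rows the only gap reported is 10052 to 10059.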

I actually just needed to do this:

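library(tidyverse)

data <- fetch(rs, n=-1)
# Last value of column 2, rounded down to a multiple of 20
lastOpen <- data[nrow(data), 2]
lastOpen <- lastOpen - lastOpen %% 20
gaps <- as_tibble(data) %>%
  mutate(lead_start = lead(Open) - 1) %>%
  filter(Close != lead_start) %>%
  # Align each gap start to the beginning of its block of 20
  transmute(Open = (Close + 1) - ((Close + 1) %% 20), Close = lead_start) %>%
  # Append a final open-ended record that runs up to the current time
  add_row(Open = lastOpen, Close = Sys.time())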
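The `x - x %% 20` expressions in the code above round a value down to the start of its block of 20, which is what turns the raw gap start 10052 into the 10040 asked for in the question. A small sketch of that arithmetic follows; `floor_to_block` is a hypothetical helper name, and the block width of 20 is taken from the code above:

```r
# Hypothetical helper: round x down to the nearest multiple of `width`
floor_to_block <- function(x, width = 20) {
  x - x %% width
}

# Close values of the rows that are followed by a gap in the sample data
close_before_gap <- c(10051, 10079, 10119)

# Gap starts aligned to the beginning of their block of 20
aligned_open <- floor_to_block(close_before_gap + 1)
aligned_open
# [1] 10040 10080 10120
```

Values that already sit on a block boundary, such as 10080 and 10120, are left unchanged.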

Thanks to mkeskisa!


Solution

  • I'm not sure I fully understand what you are trying to achieve, but I think this will help you find the missing gaps. You say you want the missing records, but the ranges you list would overlap with existing records: for example, 10040 to 10060 overlaps with the record 10060 to 10079 at 10060. In general, you can probably achieve what you want with lag and/or lead.

    library(tidyverse)
    df <- tibble::tribble(
      ~Start,  ~Stop,
      10000L, 10019L,
      10020L, 10039L,
      10040L, 10051L,
      10060L, 10079L,
      10100L, 10119L,
      10160L, 10179L
      )
    
    gaps <- df %>% 
      mutate(lead_start = lead(Start) - 1) %>% 
      filter(Stop != lead_start) %>% 
      transmute(start = Stop + 1, stop = lead_start)
    
    gaps
    # A tibble: 3 x 2
      start  stop
      <dbl> <dbl>
    1 10052 10059
    2 10080 10099
    3 10120 10159
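
The same lead-style comparison can also be written without dplyr. The following is a minimal base R sketch, assuming the rows are sorted by Start and the ranges do not overlap; `find_gaps` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: return the ranges missing between consecutive rows
find_gaps <- function(start, stop) {
  n <- length(start)
  if (n < 2) {
    return(data.frame(start = numeric(0), stop = numeric(0)))
  }
  prev_stop  <- stop[-n]    # Stop of every row except the last
  next_start <- start[-1]   # Start of the row that follows it
  has_gap <- next_start - prev_stop > 1
  data.frame(start = prev_stop[has_gap] + 1,
             stop  = next_start[has_gap] - 1)
}

gaps <- find_gaps(
  start = c(10000, 10020, 10040, 10060, 10100, 10160),
  stop  = c(10019, 10039, 10051, 10079, 10119, 10179)
)
gaps
```

On the sample data this gives the same three ranges as the dplyr version: 10052 to 10059, 10080 to 10099, and 10120 to 10159.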