Search code examples
rpostgresqltidyversesliding-windowdbplyr

R slide window through tibble


I got a simple question that I cannot figure out solutions. Also, I didn't find an answer that I understand.

Imagine I got this data frame

(ts <- tibble(
+   a = LETTERS[1:10],
+   b = c(rep(1, 5), rep(2,5))
+ ))
# A tibble: 10 x 2
   a         b
   <chr> <dbl>
 1 A         1
 2 B         1
 3 C         1
 4 D         1
 5 E         1
 6 F         2
 7 G         2
 8 H         2
 9 I         2
10 J         2

What I want is simple. I want to build a df with the column b indexing a sliding window which sizes n f the column a.

The output can be something like this:

# A tibble: 8 x 2
      b a    
  <dbl> <chr>
1     1 A B  
2     1 B C  
3     1 C D  
4     1 D E  
5     2 F G  
6     2 G H  
7     2 H I  
8     2 I J  

I don't care if the column a contains an array (nest values). I just need a new data frame based on the sliding window.

Since this operation will run in a relational database I'd like a function compatible with DBI-PostgresSQL.

Any help is appreciated.

Thanks in advance


Solution

  • We can group by 'b', create the new column based on the lead of 'a', remove the NA rows with na.omit

    library(dplyr)
    ts %>% 
        group_by(b) %>% 
        mutate(a2 = lead(a)) %>% 
        ungroup %>%
        na.omit %>%
        select(b, everything())
    # A tibble: 8 x 3
    #      b a     a2   
    #  <dbl> <chr> <chr>
    #1     1 A     B    
    #2     1 B     C    
    #3     1 C     D    
    #4     1 D     E    
    #5     2 F     G    
    #6     2 G     H    
    #7     2 H     I    
    #8     2 I     J    
    

    If lead doesn't works, then just remove the first element, append NA at the end in the mutate step

    ts %>%  
         group_by(b) %>% 
         mutate(a2 = c(a[-1], NA)) %>%
         ungroup %>% 
         na.omit %>% 
         select(b, everything())