Search code examples
r

Forward search next occurance in parallel column


I must find the next occurance of one column in a parallel column. The search should only be performed forward/downward. An example is this:

library(tidyverse)

d <- tribble(~from, ~to, ~type, ~want,
         "a", "b", 9, 8,
         "b", "c", 8, 0,
         "a", "x", 6, 7,
         "c", "d", 0, 3,
         "c", "d", 5, 3,
         "x", "y", 7, NA,
         "d", "e", 3, 4,
         "e", "f", 4, 5, 
         "e", "f", 2, 5,
         "e", "f", 8, 5,
         "f", "g", 5, NA)

The rule is as follows: find the value of column type according to column to using the lookup column from. For example, to = "b" in row 1, the next occurance of value b in column from is 8, which gives the value for column want (want is the desired result and unknown initially), etc. The lead is not always 1, there can be rows between. Is there a simple way to do this?

Second example:

d2 <- tribble(~from, ~to, ~type,
          "a", "b", 0,
          "b", "a", 1,
          "a", "c", 2)

Solution

  • You could use non-equi join:

    rowid_to_column(d) %>%
      left_join(y = {.}, join_by(to == from,rowid < rowid), multiple = 'first') %>%
      select(from, to, type=type.x, want = type.y)
    
    # A tibble: 11 × 4
       from  to     type  want
       <chr> <chr> <dbl> <dbl>
     1 a     b         9     8
     2 b     c         8     0
     3 a     x         6     7
     4 c     d         0     3
     5 c     d         5     3
     6 x     y         7    NA
     7 d     e         3     4
     8 e     f         4     5
     9 e     f         2     5
    10 e     f         8     5
    11 f     g         5    NA
    
    
    rowid_to_column(d2) %>%
      left_join(y = {.}, join_by(to == from,rowid < rowid), multiple = 'first') %>%
      select(from, to, type=type.x, want = type.y)
    
    # A tibble: 3 × 4
      from  to     type  want
      <chr> <chr> <dbl> <dbl>
    1 a     b         0     1
    2 b     a         1     2
    3 a     c         2    NA
    

    If you want speed/efficiency, use data.table:

    library(data.table)
    setDT(d)[, n := .I][d, .(from, to = i.to, type = i.type, want = type), 
                      on = .(from == to, n > n), mult = 'first']
    
          from     to  type  want
        <char> <char> <num> <num>
     1:      a      b     9     8
     2:      b      c     8     0
     3:      a      x     6     7
     4:      c      d     0     3
     5:      c      d     5     3
     6:      x      y     7    NA
     7:      d      e     3     4
     8:      e      f     4     5
     9:      e      f     2     5
    10:      e      f     8     5
    11:      f      g     5    NA