I must find the next occurance of one column in a parallel column. The search should only be performed forward/downward. An example is this:
library(tidyverse)
d <- tribble(~from, ~to, ~type, ~want,
"a", "b", 9, 8,
"b", "c", 8, 0,
"a", "x", 6, 7,
"c", "d", 0, 3,
"c", "d", 5, 3,
"x", "y", 7, NA,
"d", "e", 3, 4,
"e", "f", 4, 5,
"e", "f", 2, 5,
"e", "f", 8, 5,
"f", "g", 5, NA)
The rule is as follows: find the value of column type
according to column to
using the lookup column from
. For example, to = "b"
in row 1, the next occurance of value b in column from
is 8
, which gives the value for column want
(want
is the desired result and unknown initially), etc. The lead is not always 1, there can be rows between. Is there a simple way to do this?
Second example:
d2 <- tribble(~from, ~to, ~type,
"a", "b", 0,
"b", "a", 1,
"a", "c", 2)
You could use non-equi
join:
rowid_to_column(d) %>%
left_join(y = {.}, join_by(to == from,rowid < rowid), multiple = 'first') %>%
select(from, to, type=type.x, want = type.y)
# A tibble: 11 × 4
from to type want
<chr> <chr> <dbl> <dbl>
1 a b 9 8
2 b c 8 0
3 a x 6 7
4 c d 0 3
5 c d 5 3
6 x y 7 NA
7 d e 3 4
8 e f 4 5
9 e f 2 5
10 e f 8 5
11 f g 5 NA
rowid_to_column(d2) %>%
left_join(y = {.}, join_by(to == from,rowid < rowid), multiple = 'first') %>%
select(from, to, type=type.x, want = type.y)
# A tibble: 3 × 4
from to type want
<chr> <chr> <dbl> <dbl>
1 a b 0 1
2 b a 1 2
3 a c 2 NA
If you want speed/efficiency, use data.table
:
library(data.table)
setDT(d)[, n := .I][d, .(from, to = i.to, type = i.type, want = type),
on = .(from == to, n > n), mult = 'first']
from to type want
<char> <char> <num> <num>
1: a b 9 8
2: b c 8 0
3: a x 6 7
4: c d 0 3
5: c d 5 3
6: x y 7 NA
7: d e 3 4
8: e f 4 5
9: e f 2 5
10: e f 8 5
11: f g 5 NA