I have following example data:
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
id2 <- c(-1, -1, -1, 1, 1, 1, -1, -1, -1, 3, 3, 3)
x <- c(4, 5, 5, 1, 1, 1, 7, NA, 9, 2, 2, 2)
dat1 <- data.frame(id, id2, x)
dat1
> dat1
id id2 x
1 1 -1 4
2 1 -1 5
3 1 -1 5
4 2 1 1
5 2 1 1
6 2 1 1
7 3 -1 7
8 3 -1 NA
9 3 -1 9
10 4 3 2
11 4 3 2
12 4 3 2
"id" refers to three different persons in three points of time. "id2" is a link between the different persons; person 1 (id == 1) has no link (id2 == -1), person 2 (id == 2) refers to person 1 (id2 == 1), person 3 (id == 3) has no link (id3 == -1) and person 4 refers to person 3 (id2 == 3).
Now I want to create a new variable "y" with this rule: For a person (id 1, 2 or 3), "y" should always be the value for "x" of another person, which can be identified via "id2". That means: "y" is c(4, 5, 5) for the person with "id" == 2 and "y" is c(7, NA, 9) for the person with "id" == 4. Since there is no match for the persons with "id" == 1 and 3, "y" will be "-1" in this cases (it could also be "NA", redundant). Afterwards, dat1 should look like this:
id id2 x y
1 1 -1 4 -1
2 1 -1 5 -1
3 1 -1 5 -1
4 2 1 1 4
5 2 1 1 5
6 2 1 1 5
7 3 -1 7 -1
8 3 -1 NA-1
9 3 -1 9 -1
10 4 3 2 7
11 4 3 2 NA
12 4 3 2 9
How to do this with dplyr?
You can use a self-left_join
plus some little adjustments.
library(dplyr)
# add row specification for each id
dat1 <- dat1 %>% group_by(id) %>% mutate(row = row_number()) %>% ungroup()
dat1 %>%
left_join(select(dat1, row, id2 = id, y = x), by = c("row", "id2")) %>%
select(-row) %>%
mutate(y = if_else(id2 == -1, -1, y))
#> # A tibble: 12 x 4
#> id id2 x y
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -1 4 -1
#> 2 1 -1 5 -1
#> 3 1 -1 5 -1
#> 4 2 1 1 4
#> 5 2 1 1 5
#> 6 2 1 1 5
#> 7 3 -1 7 -1
#> 8 3 -1 NA -1
#> 9 3 -1 9 -1
#> 10 4 3 2 7
#> 11 4 3 2 NA
#> 12 4 3 2 9