Search code examples
rdplyrlongitudinal

New variables using id's linking different cases in longidutinal data


I have following example data:

id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
id2 <- c(-1, -1, -1, 1, 1, 1, -1, -1, -1, 3, 3, 3)
x <- c(4, 5, 5, 1, 1, 1, 7, NA, 9, 2, 2, 2)
dat1 <- data.frame(id, id2, x)
dat1

> dat1
   id id2 x
1   1  -1 4
2   1  -1 5
3   1  -1 5
4   2   1 1
5   2   1 1
6   2   1 1
7   3  -1 7
8   3  -1 NA
9   3  -1 9
10  4   3 2
11  4   3 2
12  4   3 2

"id" refers to three different persons in three points of time. "id2" is a link between the different persons; person 1 (id == 1) has no link (id2 == -1), person 2 (id == 2) refers to person 1 (id2 == 1), person 3 (id == 3) has no link (id3 == -1) and person 4 refers to person 3 (id2 == 3).

Now I want to create a new variable "y" with this rule: For a person (id 1, 2 or 3), "y" should always be the value for "x" of another person, which can be identified via "id2". That means: "y" is c(4, 5, 5) for the person with "id" == 2 and "y" is c(7, NA, 9) for the person with "id" == 4. Since there is no match for the persons with "id" == 1 and 3, "y" will be "-1" in this cases (it could also be "NA", redundant). Afterwards, dat1 should look like this:

   id id2 x  y
1   1  -1 4 -1
2   1  -1 5 -1
3   1  -1 5 -1
4   2   1 1  4
5   2   1 1  5
6   2   1 1  5
7   3  -1 7 -1
8   3  -1 NA-1
9   3  -1 9 -1
10  4   3 2  7
11  4   3 2  NA
12  4   3 2  9

How to do this with dplyr?


Solution

  • You can use a self-left_join plus some little adjustments.

    library(dplyr)
    
    # add row specification for each id
    dat1 <- dat1 %>% group_by(id) %>% mutate(row = row_number()) %>% ungroup()
    
    
    dat1 %>% 
      left_join(select(dat1, row, id2 = id, y = x), by = c("row", "id2")) %>% 
      select(-row) %>% 
      mutate(y = if_else(id2 == -1, -1, y))
    
    #> # A tibble: 12 x 4
    #>       id   id2     x     y
    #>    <dbl> <dbl> <dbl> <dbl>
    #>  1     1    -1     4    -1
    #>  2     1    -1     5    -1
    #>  3     1    -1     5    -1
    #>  4     2     1     1     4
    #>  5     2     1     1     5
    #>  6     2     1     1     5
    #>  7     3    -1     7    -1
    #>  8     3    -1    NA    -1
    #>  9     3    -1     9    -1
    #> 10     4     3     2     7
    #> 11     4     3     2    NA
    #> 12     4     3     2     9