Tags: r, dplyr, tidyverse, data-management

If columns match, take value from another column and assign it to a new variable


I have a data set that contains an id variable and a pair id variable. Each observation is paired with exactly one other observation. I want to create a new 'partner' variable that holds the id of each observation's pair partner.

Below is some example data:

id <- 1:10
pair_id <- rep(1:5,2)

df <- cbind(id, pair_id)

> df
      id pair_id
 [1,]  1       1
 [2,]  2       2
 [3,]  3       3
 [4,]  4       4
 [5,]  5       5
 [6,]  6       1
 [7,]  7       2
 [8,]  8       3
 [9,]  9       4
[10,] 10       5

As I said above, I want to add a variable that gives, for each id, its partner's id, where the partnership is identified by pair_id. For example, the observation with id == 1 has pair_id == 1, which makes its partner the observation with id == 6, since they share the same pair_id.

So the end result should look like this:

      id pair_id partner_id
 [1,]  1       1          6
 [2,]  2       2          7
 [3,]  3       3          8
 [4,]  4       4          9
 [5,]  5       5         10
 [6,]  6       1          1
 [7,]  7       2          2
 [8,]  8       3          3
 [9,]  9       4          4
[10,] 10       5          5

Thanks!


Solution

  • You can reverse the id values within each pair_id. Because every pair_id occurs exactly twice, reversing the two ids in a group swaps them:

    library(dplyr)
    df %>% group_by(pair_id) %>% mutate(partner_id = rev(id))
    
    #      id pair_id partner_id
    #   <int>   <int>      <int>
    # 1     1       1          6
    # 2     2       2          7
    # 3     3       3          8
    # 4     4       4          9
    # 5     5       5         10
    # 6     6       1          1
    # 7     7       2          2
    # 8     8       3          3
    # 9     9       4          4
    #10    10       5          5
    
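Why this works: within each pair_id group, `rev()` reverses the vector of ids, and for a group of exactly two rows that simply swaps them. The minimal base R sketch below (assuming the question's example data, built as a data frame) checks that assumption first, then shows what `rev()` does for a group of two and for a hypothetical group of three, where the middle row would end up paired with itself.

```r
# Example data from the question, built as a data frame
id <- 1:10
pair_id <- rep(1:5, 2)
df <- data.frame(id, pair_id)

# The rev() trick assumes every pair_id occurs exactly twice
group_sizes <- table(df$pair_id)
stopifnot(all(group_sizes == 2))

# With exactly two rows, rev() swaps the ids
rev(c(1, 6))
# [1] 6 1

# Hypothetical group of three: the middle row keeps its own id
rev(c(1, 6, 11))
# [1] 11  6  1
```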

    The equivalent in base R:

    df$partner_id <- with(df, ave(id, pair_id, FUN = rev))
    
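`ave()` splits `id` by `pair_id`, applies `FUN` to each piece, and writes the results back to each group's original row positions, so the rows do not need to be sorted by `pair_id`. A small sketch, assuming the question's example data with its rows shuffled (the `shuffled` name and the seed are only for illustration):

```r
# Example data from the question, built as a data frame
id <- 1:10
pair_id <- rep(1:5, 2)
df <- data.frame(id, pair_id)

# Shuffle the rows so that pairs are no longer in a regular order
set.seed(1)
shuffled <- df[sample(nrow(df)), ]

# ave() returns one value per row, aligned with the original row positions
shuffled$partner_id <- with(shuffled, ave(id, pair_id, FUN = rev))

# Every row now holds the other id from its pair
stopifnot(all(shuffled$partner_id != shuffled$id))
shuffled
```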

    and in data.table (note that setDT() converts df to a data.table by reference):

    library(data.table)
    setDT(df)[, partner_id := rev(id), pair_id]
    

    Data (the solutions above need a data frame, whereas cbind() in the question returns a matrix):

    df <- data.frame(id, pair_id)
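A different technique, not part of the original answer: a base R self-join on `pair_id` with `merge()`. It does not rely on `rev()`, and if a `pair_id` ever had more than two rows, the extra matches would show up as extra rows rather than as a silently wrong partner. A minimal sketch, assuming the example data above (`pairs` and `result` are illustrative names):

```r
# Example data from the question, built as a data frame
id <- 1:10
pair_id <- rep(1:5, 2)
df <- data.frame(id, pair_id)

# Self-join on pair_id: each row is matched with every row sharing its pair_id.
# The second copy's id column gets the suffix "_partner".
pairs <- merge(df, df, by = "pair_id", suffixes = c("", "_partner"))

# Drop the rows where an observation was matched with itself
pairs <- pairs[pairs$id != pairs$id_partner, ]

# Rename the partner column and restore the original order by id
names(pairs)[names(pairs) == "id_partner"] <- "partner_id"
result <- pairs[order(pairs$id), c("id", "pair_id", "partner_id")]
rownames(result) <- NULL
result
```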