r dataframe dplyr

Create new column combining info from other two columns, regardless of order


I want to create a new column in a data frame that assigns a unique value to each combination of values from two other columns, regardless of their order.

Example

library(dplyr)
df <- tibble(x = c(1,2,3,3,4,10,9), y = c(2,1,9,9,9,1,3))
df

# A tibble: 7 × 2
      x     y
  <dbl> <dbl>
1     1     2
2     2     1
3     3     9
4     3     9
5     4     9
6    10     1
7     9     3

I want to generate this

# A tibble: 7 × 3
      x     y  type
  <dbl> <dbl> <dbl>
1     1     2     1
2     2     1     1
3     3     9     2
4     3     9     2
5     4     9     3
6    10     1     4
7     9     3     2

How can this be achieved for a general data frame?

EDIT: This is not the same question as those being linked.

The suggested answers result in

> df |>  
+     group_by(x,y) |> 
+     mutate(type = cur_group_id())

# A tibble: 7 × 3
# Groups:   x, y [6]
      x     y  type
  <dbl> <dbl> <int>
1     1     2     1
2     2     1     2
3     3     9     3
4     3     9     3
5     4     9     4
6    10     1     6
7     9     3     5

which is wrong: (1, 2) and (2, 1) should get the same type, as should (3, 9) and (9, 3).


Solution

  • For the case with two columns, we can neutralize the ordering by putting each pair of values in a fixed (arbitrary) order, smaller first via pmin() and larger second via pmax(), before determining the group.

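    df |>
      # Order-independent key: smaller value first, larger second
      mutate(grp = paste(pmin(x,y), pmax(x,y))) |>
      # Number the keys in order of first appearance (.by needs dplyr >= 1.1.0)
      mutate(type = cur_group_id(), .by = grp)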
    

    Result

          x     y grp    type
      <dbl> <dbl> <chr> <int>
    1     1     2 1 2       1
    2     2     1 1 2       1
    3     3     9 3 9       2
    4     3     9 3 9       2
    5     4     9 4 9       3
    6    10     1 1 10      4
    7     9     3 3 9       2
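
For comparison, the same idea can be sketched in base R without dplyr. This is only an illustrative alternative, not part of the answer above: it builds the same order-independent key with pmin() and pmax(), then uses match() against the unique keys so that ids are numbered by first appearance. The variable name key is made up for this sketch.

```r
# Base-R sketch (an alternative to the dplyr pipeline above, same example data).
df <- data.frame(x = c(1, 2, 3, 3, 4, 10, 9),
                 y = c(2, 1, 9, 9, 9, 1, 3))

# Order-independent key: smaller value first, larger second.
# paste() separates the two values with a space, so "1 10" and "11 0" stay distinct.
key <- paste(pmin(df$x, df$y), pmax(df$x, df$y))

# match() returns, for each key, the position of its first occurrence
# among the unique keys, i.e. an id numbered by first appearance.
df$type <- match(key, unique(key))

print(df)
```

On the example data this should reproduce the type column requested in the question. Note that type is an integer column here, as it is with cur_group_id().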