Search code examples
rdataframemutate

Mutating a column in R from a particular row subset


I am trying to create a new column in a dataframe based upon three other columns: a parent column, specific indicator column, and the value of the combined parent-specific indicator.

Given:

  parent specific val
1      a        x  10
2      a        y  11
3      a        z  12
4      b        x  20
5      b        y  21
6      b        z  22
7      c        x  30
8      c        y  31
9      c        z  32

I'm looking to create a new column, say px_val (selecting the x value of each parent), so that the resulting dataframe is:

  parent specific val px_val
1      a        x  10     10
2      a        y  11     10
3      a        z  12     10
4      b        x  20     20
5      b        y  21     20
6      b        z  22     20
7      c        x  30     30
8      c        y  31     30
9      c        z  32     30

Code for test df:

df <- data.frame(
  parent=c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'),
  specific=c('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
  val=c(10, 11, 12, 20, 21, 22, 30, 31, 32)
)

I've thought to maybe iterate over the dataframe, storing the x value of a given parent in a variable and assigning that to each parent. But it feels like there has to be a more elegant solution?


Solution

  • We could do it this way:

    px_val will contain values where specific equals x for each unique parent -> val[specific == 'x']

    .by=... groups only for this mutate, the advantage is that we do not need a ungroup() thereafter:

    library(dplyr) #>= dplyr 1.1.0
    
    df %>%
      mutate(px_val = val[specific == 'x'], .by=parent)
    
      parent specific val px_val
    1      a        x  10     10
    2      a        y  11     10
    3      a        z  12     10
    4      b        x  20     20
    5      b        y  21     20
    6      b        z  22     20
    7      c        x  30     30
    8      c        y  31     30
    9      c        z  32     30