tidyr mutate new column based on group by with calculation


Using tidyr, how can I create a new column through a group-by and calculation?

For example, if I have this dataframe:

name <- c("a", "a", "a", "a", "b", "b", "b", "b")
x1 <- c(0, 0, 0, 0, 1, 1, 1, 1)
x2 <- c(15, 15, 15, 15, 15, 15, 15, 15)
y <- c(1, 2, 1, 2, 1, 2, 1, 2)
z <- c(50, 100, 40, 90, 65, 95, 40, 95)

df <- data.frame(name, x1, x2, y, z)

Let's say I want to (1) group by x1 and x2; (2) find the maximum z value in each group; and (3) create a new column z2 that normalizes z by that maximum.

So in this case, the expected output for z2 is c(0.5, 1, 0.4, 0.9, 0.684, 1, 0.421, 1).


Solution

  • Group by 'x1' and 'x2', then create the column with mutate. Note that group_by and mutate come from dplyr rather than tidyr:

    library(dplyr)
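    df <- df %>%
        group_by(x1, x2) %>%                     # one group per x1/x2 combination
        mutate(z2 = z / max(z, na.rm = TRUE)) %>% # divide z by its group maximum
        ungroup()                                # drop grouping for later steps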
    

    Output:

    df
    # A tibble: 8 × 6
      name     x1    x2     y     z    z2
      <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 a         0    15     1    50 0.5  
    2 a         0    15     2   100 1    
    3 a         0    15     1    40 0.4  
    4 a         0    15     2    90 0.9  
    5 b         1    15     1    65 0.684
    6 b         1    15     2    95 1    
    7 b         1    15     1    40 0.421
    8 b         1    15     2    95 1
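
For comparison, the same group-wise normalization can be sketched in base R with ave(), which may be handy if dplyr is not available. This is an alternative and not part of the answer above; it rebuilds the question's data frame so the snippet runs on its own.

```r
# Rebuild the example data from the question
df <- data.frame(
  name = c("a", "a", "a", "a", "b", "b", "b", "b"),
  x1 = c(0, 0, 0, 0, 1, 1, 1, 1),
  x2 = c(15, 15, 15, 15, 15, 15, 15, 15),
  y = c(1, 2, 1, 2, 1, 2, 1, 2),
  z = c(50, 100, 40, 90, 65, 95, 40, 95)
)

# ave() applies FUN within each x1/x2 combination and returns
# a vector aligned with the original rows
df$z2 <- ave(df$z, df$x1, df$x2,
             FUN = function(v) v / max(v, na.rm = TRUE))

print(round(df$z2, 3))
```

Rounded to three decimals, z2 should match the values expected in the question. Unlike the dplyr version, this leaves df as a plain data.frame instead of a tibble.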