Using tidyr, how can I create a new column through a group-by and calculation?
For example, if I have this dataframe:
name <- c("a", "a", "a", "a", "b", "b", "b", "b")
x1 <- c(0, 0, 0, 0, 1, 1, 1, 1)
x2 <- c(15, 15, 15, 15, 15, 15, 15, 15)
y <- c(1, 2, 1, 2, 1, 2, 1, 2)
z <- c(50, 100, 40, 90, 65, 95, 40, 95)
df <- data.frame(name, x1, x2, y, z)
Let's say I want to (1) group by x1 and x2; (2) find the max z value in each group; and (3) create a new column z2 that normalizes z by that maximum.
So in this case, the expected output for z2 is c(0.5, 1, 0.4, 0.9, 0.684, 1, 0.421, 1).
We could simply group by 'x1' and 'x2' and create the column with mutate. Note that group_by and mutate come from dplyr rather than tidyr:
library(dplyr)
df <- df %>%
  group_by(x1, x2) %>%
  mutate(z2 = z / max(z, na.rm = TRUE)) %>%
  ungroup()
-output
df
# A tibble: 8 × 6
  name     x1    x2     y     z    z2
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a         0    15     1    50 0.5
2 a         0    15     2   100 1
3 a         0    15     1    40 0.4
4 a         0    15     2    90 0.9
5 b         1    15     1    65 0.684
6 b         1    15     2    95 1
7 b         1    15     1    40 0.421
8 b         1    15     2    95 1
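If you would rather not load any package, the same grouped normalization can be sketched in base R with ave(). This is only an alternative sketch, not part of the dplyr approach above; the helper variable group_max is a name introduced here for illustration.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  name = c("a", "a", "a", "a", "b", "b", "b", "b"),
  x1   = c(0, 0, 0, 0, 1, 1, 1, 1),
  x2   = c(15, 15, 15, 15, 15, 15, 15, 15),
  y    = c(1, 2, 1, 2, 1, 2, 1, 2),
  z    = c(50, 100, 40, 90, 65, 95, 40, 95)
)

# ave() applies FUN within each combination of the grouping vectors
# and returns a vector aligned with the original rows.
# group_max is a hypothetical helper name used only in this sketch.
group_max <- ave(df$z, df$x1, df$x2, FUN = function(v) max(v, na.rm = TRUE))

# Normalize z by its group maximum.
df$z2 <- df$z / group_max

round(df$z2, 3)
# [1] 0.500 1.000 0.400 0.900 0.684 1.000 0.421 1.000
```

The rounded values match the expected z2 from the question. In this example every x1/x2 combination is present in the data, so max() is never called on an empty group.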