Lets assume a data like this below
ID Quantity Group Indicator
1 0.93 Red 1
2 0.17 Red 1
3 0.01 Red 0
4 0.44 Red 1
5 0.01 Red 0
6 0.86 Red 1
7 0.07 Red 1
8 0.02 Red 0
9 1.00 Red 1
1 0.65 Blue 1
2 0.17 Blue 1
3 0.02 Blue 0
4 0.01 Blue 0
5 0.09 Blue 1
6 0.86 Blue 1
7 0.05 Blue 0
8 0.23 Blue 1
9 0.01 Blue 0
Now I want to create a new column with three values/categories.
This logic to be applied for values in Group separately.
Expecting an output like this.
median of all values in Quantity where Indicator == 1 for Group Red. median(0.93, 0.17,0.44,0.86,0.07,1.00) = 0.65
median of all values in Quantity where Indicator == 1 for Group Blue. median(0.65, 0.17,0.09,0.86,0.23) = 0.23
ID Quantity Group Indicator Results
1 0.93 Red 1 3 <- Above the median(0.65) for all values in red where Indicator ==1
2 0.17 Red 1 2 <- Below the median(0.65) for all values in red where Indicator ==1
3 0.01 Red 0 1
4 0.44 Red 1 2 <- Below the median(0.65) for all values in red where Indicator ==1
5 0.01 Red 0 1
6 0.86 Red 1 3 <- Above the median(0.65) for all values in red where Indicator ==1
7 0.07 Red 1 2 <- Below the median(0.65) for all values in red where Indicator ==1
8 0.02 Red 0 1
9 1.00 Red 1 3 <- Above the median(0.65) for all values in red where Indicator ==1
1 0.65 Blue 1 3
2 0.17 Blue 1 2
3 0.02 Blue 0 1
4 0.01 Blue 0 1
5 0.09 Blue 1 2
6 0.86 Blue 1 3
7 0.05 Blue 0 1
8 0.23 Blue 1 2
9 0.01 Blue 0 1
I have tried this with bunch of ifelse and it is very clumsy. Looking for something efficient with case_when. Thanks in advance.
A dplyr
approach to achieve your desired result may look like so:
library(dplyr, warn = FALSE)
dat |>
mutate(
Results = case_when(
Indicator == 0 ~ 1,
Quantity <= median(Quantity) ~ 2,
.default = 3
),
.by = c(Group, Indicator)
)
#> ID Quantity Group Indicator Results
#> 1 1 0.93 Red 1 3
#> 2 2 0.17 Red 1 2
#> 3 3 0.01 Red 0 1
#> 4 4 0.44 Red 1 2
#> 5 5 0.01 Red 0 1
#> 6 6 0.86 Red 1 3
#> 7 7 0.07 Red 1 2
#> 8 8 0.02 Red 0 1
#> 9 9 1.00 Red 1 3
#> 10 1 0.65 Blue 1 3
#> 11 2 0.17 Blue 1 2
#> 12 3 0.02 Blue 0 1
#> 13 4 0.01 Blue 0 1
#> 14 5 0.09 Blue 1 2
#> 15 6 0.86 Blue 1 3
#> 16 7 0.05 Blue 0 1
#> 17 8 0.23 Blue 1 2
#> 18 9 0.01 Blue 0 1
DATA
dat <- data.frame(
ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9),
Quantity = c(0.93, 0.17, 0.01, 0.44, 0.01, 0.86, 0.07, 0.02, 1.00, 0.65, 0.17, 0.02, 0.01, 0.09, 0.86, 0.05, 0.23, 0.01),
Group = c("Red", "Red", "Red", "Red", "Red", "Red", "Red", "Red", "Red", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue"),
Indicator = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0)
)