I am trying to find the minimum value across two columns for each person in my data. I only want to consider values in one column for the minimum calculation IF a condition (in another column) is met. For example, my data look like this:
df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
age = c(4, 5, 7, 11, 4, 5, 20, 25, 26, 6, 7, 8, 9),
A = c(0.2, 0.3, 0.5, 0.8, 0.2, 0.3, 1, 2, 3, 0.1, 0.3, 2, 3),
B = c(0.1, 0.1, 0.1, 0.1, 0.3, 0.3, 0.3, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4),
C = c(0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.4, 0.4, 0.4))
df
# id age A B C
# 1 4 0.2 0.1 0.1
# 1 5 0.3 0.1 0.1
# 1 7 0.5 0.1 0.1
# 1 11 0.8 0.1 0.1
# 2 4 0.2 0.3 0.2
# 2 5 0.3 0.3 0.2
# 2 20 1.0 0.3 0.2
# 2 25 2.0 0.3 0.2
# 2 26 3.0 0.3 0.2
# 3 6 0.1 0.4 0.4
# 3 7 0.3 0.4 0.4
# 3 8 2.0 0.4 0.4
# 3 9 3.0 0.4 0.4
I want to find the minimum value of columns A and B for each ID. Column B contains the same value repeated whereas column A contains different values. I only want to consider values in column A IF age <= 5. My desired output is in column C.
You'll notice that for person 1, the value in column B is lowest and is returned. For person 2, the value in column A when age = 4 is lowest and is returned. For person 3, the values for column A when age = 6 and age = 7 are lower than the value in column B, BUT because age > 5, I am not considering those values and therefore the value in column B is returned.
I am trying to find a solution with dplyr but am open to anything. I have tried doing this in several steps with group_by and mutate but cannot seem to get anything to work. Many thanks in advance.
library(dplyr)
df |>
mutate(C = min(A[age <= 5], B), .by = id)
How it works
Hopefully this is clear how it works when age <= 5
returns TRUE
. Consider id
when it is 3
:
age <- c(6, 7, 8, 9)
A <- c(0.1, 0.3, 2.0, 3.0)
B <- c(0.4, 0.4, 0.4, 0.4)
A[age <= 5]
numeric(0)
Alone, sub-setting A
by this condition returns an empty set. When you c
ombine it with B
though you get:
c(A[age <= 5], B) # only B values
[1] 0.4 0.4 0.4 0.4
Essentially, when there is an age less than 5, A
and B
values are compared, but when this condition is not met, only B
values are passed to min
.
Output
id age A B C
1 1 4 0.2 0.1 0.1
2 1 5 0.3 0.1 0.1
3 1 7 0.5 0.1 0.1
4 1 11 0.8 0.1 0.1
5 2 4 0.2 0.3 0.2
6 2 5 0.3 0.3 0.2
7 2 20 1.0 0.3 0.2
8 2 25 2.0 0.3 0.2
9 2 26 3.0 0.3 0.2
10 3 6 0.1 0.4 0.4
11 3 7 0.3 0.4 0.4
12 3 8 2.0 0.4 0.4
13 3 9 3.0 0.4 0.4