The answer for this question "R Create a column containing value of another column based on min or max of reference column" does what I want but it's not working when the reference column has more than a ones digit or is a date.
library(dplyr)
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
Value = c(34, 54, 17, 2, 25, 89, 23, 93, 90),
Int1 = rep(1:3, 3),
Int2 = rep(11:13, 3),
Dbl1 = rep(c(1.1, 2.2, 3.3), 3),
Dbl2 = rep(c(11.1, 12.2, 13.3), 3),
Date = as.Date(c("2001-01-01", "2002-02-02", "2003-03-03", "2001-01-01", "2002-02-02", "2003-03-03", "2001-01-01", "2002-02-02", "2003-03-03")))
df
ID Value Int1 Int2 Dbl1 Dbl2 Date
1 1 34 1 11 1.1 11.1 2001-01-01
2 1 54 2 12 2.2 12.2 2002-02-02
3 1 17 3 13 3.3 13.3 2003-03-03
4 2 2 1 11 1.1 11.1 2001-01-01
5 2 25 2 12 2.2 12.2 2002-02-02
6 2 89 3 13 3.3 13.3 2003-03-03
7 3 23 1 11 1.1 11.1 2001-01-01
8 3 93 2 12 2.2 12.2 2002-02-02
9 3 90 3 13 3.3 13.3 2003-03-03
df %>%
group_by(ID) %>%
mutate(.Int1 = Value[min(Int1)],
.Int2 = Value[min(Int2)],
.Dbl1 = Value[min(Dbl1)],
.Dbl2 = Value[min(Dbl2)],
.Date = Value[min(Date)]) %>%
ungroup
# A tibble: 9 × 12
ID Value Int1 Int2 Dbl1 Dbl2 Date .Int1 .Int2 .Dbl1 .Dbl2 .Date
<dbl> <dbl> <int> <int> <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 34 1 11 1.1 11.1 2001-01-01 34 NA 34 NA NA
2 1 54 2 12 2.2 12.2 2002-02-02 34 NA 34 NA NA
3 1 17 3 13 3.3 13.3 2003-03-03 34 NA 34 NA NA
4 2 2 1 11 1.1 11.1 2001-01-01 2 NA 2 NA NA
5 2 25 2 12 2.2 12.2 2002-02-02 2 NA 2 NA NA
6 2 89 3 13 3.3 13.3 2003-03-03 2 NA 2 NA NA
7 3 23 1 11 1.1 11.1 2001-01-01 23 NA 23 NA NA
8 3 93 2 12 2.2 12.2 2002-02-02 23 NA 23 NA NA
9 3 90 3 13 3.3 13.3 2003-03-03 23 NA 23 NA NA
What do I need to do so the results in .Int2, .Dbl2, and .Date are not NA and the same as .Int1 and .Dbl1?
It is not the min
imum we need for indexing, it would be the position index with which.min
, otherwise, min(Int2)
for ID 1 will be 11
, but there are only 3 rows for ID 1.
df %>%
group_by(ID) %>%
mutate(.Int1 = Value[which.min(Int1)],
.Int2 = Value[which.min(Int2)],
.Dbl1 = Value[which.min(Dbl1)],
.Dbl2 = Value[which.min(Dbl2)],
.Date = Value[which.min(Date)]) %>%
ungroup
-output
# A tibble: 9 × 12
ID Value Int1 Int2 Dbl1 Dbl2 Date .Int1 .Int2 .Dbl1 .Dbl2 .Date
<dbl> <dbl> <int> <int> <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 34 1 11 1.1 11.1 2001-01-01 34 34 34 34 34
2 1 54 2 12 2.2 12.2 2002-02-02 34 34 34 34 34
3 1 17 3 13 3.3 13.3 2003-03-03 34 34 34 34 34
4 2 2 1 11 1.1 11.1 2001-01-01 2 2 2 2 2
5 2 25 2 12 2.2 12.2 2002-02-02 2 2 2 2 2
6 2 89 3 13 3.3 13.3 2003-03-03 2 2 2 2 2
7 3 23 1 11 1.1 11.1 2001-01-01 23 23 23 23 23
8 3 93 2 12 2.2 12.2 2002-02-02 23 23 23 23 23
9 3 90 3 13 3.3 13.3 2003-03-03 23 23 23 23 23
The repetition of code can be simplified if we use across
df %>%
group_by(ID) %>%
mutate(across(-Value, ~ Value[which.min(.x)],
.names = ".{.col}")) %>%
ungroup
-output
# A tibble: 9 × 12
ID Value Int1 Int2 Dbl1 Dbl2 Date .Int1 .Int2 .Dbl1 .Dbl2 .Date
<dbl> <dbl> <int> <int> <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 34 1 11 1.1 11.1 2001-01-01 34 34 34 34 34
2 1 54 2 12 2.2 12.2 2002-02-02 34 34 34 34 34
3 1 17 3 13 3.3 13.3 2003-03-03 34 34 34 34 34
4 2 2 1 11 1.1 11.1 2001-01-01 2 2 2 2 2
5 2 25 2 12 2.2 12.2 2002-02-02 2 2 2 2 2
6 2 89 3 13 3.3 13.3 2003-03-03 2 2 2 2 2
7 3 23 1 11 1.1 11.1 2001-01-01 23 23 23 23 23
8 3 93 2 12 2.2 12.2 2002-02-02 23 23 23 23 23
9 3 90 3 13 3.3 13.3 2003-03-03 23 23 23 23 23