Search code examples
rdplyrgroup-bymutate

R New column with repeating value from another column based on min or max of reference column


The answer for this question "R Create a column containing value of another column based on min or max of reference column" does what I want but it's not working when the reference column has more than a ones digit or is a date.

library(dplyr)
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 Value = c(34, 54, 17, 2, 25, 89, 23, 93, 90),
                 Int1 = rep(1:3, 3),
                 Int2 = rep(11:13, 3),
                 Dbl1  = rep(c(1.1, 2.2, 3.3), 3),
                 Dbl2 = rep(c(11.1, 12.2, 13.3), 3),
                 Date = as.Date(c("2001-01-01", "2002-02-02", "2003-03-03", "2001-01-01", "2002-02-02", "2003-03-03", "2001-01-01", "2002-02-02", "2003-03-03")))

df
  ID Value Int1 Int2 Dbl1 Dbl2       Date
1  1    34    1   11  1.1 11.1 2001-01-01
2  1    54    2   12  2.2 12.2 2002-02-02
3  1    17    3   13  3.3 13.3 2003-03-03
4  2     2    1   11  1.1 11.1 2001-01-01
5  2    25    2   12  2.2 12.2 2002-02-02
6  2    89    3   13  3.3 13.3 2003-03-03
7  3    23    1   11  1.1 11.1 2001-01-01
8  3    93    2   12  2.2 12.2 2002-02-02
9  3    90    3   13  3.3 13.3 2003-03-03


df %>%
  group_by(ID) %>%
  mutate(.Int1 = Value[min(Int1)],
         .Int2 = Value[min(Int2)],
         .Dbl1 = Value[min(Dbl1)],
         .Dbl2 = Value[min(Dbl2)],
         .Date = Value[min(Date)]) %>% 
  ungroup

    # A tibble: 9 × 12
     ID Value  Int1  Int2  Dbl1  Dbl2 Date       .Int1 .Int2 .Dbl1 .Dbl2 .Date
  <dbl> <dbl> <int> <int> <dbl> <dbl> <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    34     1    11   1.1  11.1 2001-01-01    34    NA    34    NA    NA
2     1    54     2    12   2.2  12.2 2002-02-02    34    NA    34    NA    NA
3     1    17     3    13   3.3  13.3 2003-03-03    34    NA    34    NA    NA
4     2     2     1    11   1.1  11.1 2001-01-01     2    NA     2    NA    NA
5     2    25     2    12   2.2  12.2 2002-02-02     2    NA     2    NA    NA
6     2    89     3    13   3.3  13.3 2003-03-03     2    NA     2    NA    NA
7     3    23     1    11   1.1  11.1 2001-01-01    23    NA    23    NA    NA
8     3    93     2    12   2.2  12.2 2002-02-02    23    NA    23    NA    NA
9     3    90     3    13   3.3  13.3 2003-03-03    23    NA    23    NA    NA

What do I need to do so the results in .Int2, .Dbl2, and .Date are not NA and the same as .Int1 and .Dbl1?


Solution

  • It is not the minimum we need for indexing, it would be the position index with which.min, otherwise, min(Int2) for ID 1 will be 11, but there are only 3 rows for ID 1.

    df %>%
      group_by(ID) %>%
      mutate(.Int1 = Value[which.min(Int1)],
             .Int2 = Value[which.min(Int2)],
             .Dbl1 = Value[which.min(Dbl1)],
             .Dbl2 = Value[which.min(Dbl2)],
             .Date = Value[which.min(Date)]) %>% 
      ungroup
    

    -output

    # A tibble: 9 × 12
         ID Value  Int1  Int2  Dbl1  Dbl2 Date       .Int1 .Int2 .Dbl1 .Dbl2 .Date
      <dbl> <dbl> <int> <int> <dbl> <dbl> <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
    1     1    34     1    11   1.1  11.1 2001-01-01    34    34    34    34    34
    2     1    54     2    12   2.2  12.2 2002-02-02    34    34    34    34    34
    3     1    17     3    13   3.3  13.3 2003-03-03    34    34    34    34    34
    4     2     2     1    11   1.1  11.1 2001-01-01     2     2     2     2     2
    5     2    25     2    12   2.2  12.2 2002-02-02     2     2     2     2     2
    6     2    89     3    13   3.3  13.3 2003-03-03     2     2     2     2     2
    7     3    23     1    11   1.1  11.1 2001-01-01    23    23    23    23    23
    8     3    93     2    12   2.2  12.2 2002-02-02    23    23    23    23    23
    9     3    90     3    13   3.3  13.3 2003-03-03    23    23    23    23    23
    

    The repetition of code can be simplified if we use across

    df %>% 
      group_by(ID) %>% 
      mutate(across(-Value, ~   Value[which.min(.x)], 
          .names = ".{.col}")) %>%
      ungroup
    

    -output

    # A tibble: 9 × 12
         ID Value  Int1  Int2  Dbl1  Dbl2 Date       .Int1 .Int2 .Dbl1 .Dbl2 .Date
      <dbl> <dbl> <int> <int> <dbl> <dbl> <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
    1     1    34     1    11   1.1  11.1 2001-01-01    34    34    34    34    34
    2     1    54     2    12   2.2  12.2 2002-02-02    34    34    34    34    34
    3     1    17     3    13   3.3  13.3 2003-03-03    34    34    34    34    34
    4     2     2     1    11   1.1  11.1 2001-01-01     2     2     2     2     2
    5     2    25     2    12   2.2  12.2 2002-02-02     2     2     2     2     2
    6     2    89     3    13   3.3  13.3 2003-03-03     2     2     2     2     2
    7     3    23     1    11   1.1  11.1 2001-01-01    23    23    23    23    23
    8     3    93     2    12   2.2  12.2 2002-02-02    23    23    23    23    23
    9     3    90     3    13   3.3  13.3 2003-03-03    23    23    23    23    23