Search code examples
rdplyrmaxminmutate

mutate to create new columns with max and mins from other columns by group


Say my dataset has two columns

A = c(1,5,NA,NA,NA)
B = c(NA,NA,10,11,13)

how do I create a new column C that takes the max value from A and the min from B to create a new column:

C= c(NA,5,10,NA,NA)

Solution

  • In this case, if which.max(A) == which.min(B), then A wins ... that can be changed by swapping the order within case_when or reversing the nesting of the base R ifelses.

    dplyr

    library(dplyr)
    quux %>%
      mutate(
        C2 = case_when(
          row_number() == which.max(A) ~ A, 
          row_number() == which.min(B) ~ B, 
          TRUE ~ NA)
      )
    #    A  B  C C2
    # 1  1 NA NA NA
    # 2  5 NA  5  5
    # 3 NA 10 10 10
    # 4 NA 11 NA NA
    # 5 NA 13 NA NA
    

    base R

    (I don't really like nested ifelse, but this is uncomplicated ...)

    with(quux, ifelse(seq_along(A) == which.max(A), A,
                      ifelse(seq_along(A) == which.min(B), B, NA)))
    # [1] NA  5 10 NA NA
    

    Data

    quux <- structure(list(A = c(1, 5, NA, NA, NA), B = c(NA, NA, 10, 11, 13), C = c(NA, 5, 10, NA, NA)), class = "data.frame", row.names = c(NA, -5L))
    

    (P.S.: if you know all numbers will be effectively integer, then you can shorten this a little by replacing seq_along(A) == which.max(A) with A == max(A), etc. The reason I don't start with that is that floating-point equality is imperfect in the presence of high-precision numbers, see Why are these numbers not equal? and https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f.)