
Find the maximum absolute value by row in an R data frame


I am hoping to find a vectorized approach to get, for each row, the value with the largest absolute value across multiple columns of a data frame, keeping its original sign.

Basically, is there an equivalent of the pmax function for absolute maximums?
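For context, here is a quick sketch of what base R's pmax does with the example values (written as plain vectors for brevity). It compares element-wise across its arguments, but it gives either the signed maximum or, when applied to abs(), the right magnitude without its sign. Neither is the desired abs_max.

```r
# the three value columns from the example, as plain vectors
val_a <- c(-1, 2, 0)
val_b <- c(-3, 3, NA)
val_c <- c(2, 3, 1)

# element-wise signed maximum: row 1 gives 2 rather than the desired -3
signed_max <- pmax(val_a, val_b, val_c, na.rm = TRUE)
print(signed_max)
# [1] 2 3 1

# element-wise maximum of absolute values: row 1 gives 3, but the sign is lost
abs_only <- pmax(abs(val_a), abs(val_b), abs(val_c), na.rm = TRUE)
print(abs_only)
# [1] 3 3 1
```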

library(tibble)

test_df <- tibble(
  some_identifier = c("apple", "tunafish", "turkey_sandwich"),
  val_a = c(-1, 2, 0),
  val_b = c(-3, 3, NA),
  val_c = c(2, 3, 1)
)

# this is what the abs_max column should be
test_df$abs_max <- c(-3, 3, 1)
test_df

# A tibble: 3 x 5
  some_identifier val_a val_b val_c abs_max
  <chr>           <dbl> <dbl> <dbl>   <dbl>
1 apple              -1    -3     2      -3
2 tunafish            2     3     3       3
3 turkey_sandwich     0    NA     1       1

The abs_max column is what I want to create. A less-than-optimal solution would be to loop through each row, but I wanted to reach out to see whether there is a better method.
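For reference, the row-by-row approach mentioned above might look like the sketch below. It uses apply, which is still a loop over rows under the hood, and assumes the value columns are exactly val_a, val_b and val_c. A plain data.frame is used so the sketch has no package dependency; the same code works on the tibble.

```r
test_df <- data.frame(
  some_identifier = c("apple", "tunafish", "turkey_sandwich"),
  val_a = c(-1, 2, 0),
  val_b = c(-3, 3, NA),
  val_c = c(2, 3, 1), stringsAsFactors = FALSE)

# numeric value columns as a plain matrix (one row per observation)
vals <- as.matrix(test_df[c("val_a", "val_b", "val_c")])

# row by row: drop NAs, then keep the entry with the largest magnitude;
# which.max returns the first match, so ties keep the leftmost column,
# and a row that is entirely NA yields NA
abs_max_loop <- apply(vals, 1, function(r) {
  r <- r[!is.na(r)]
  if (length(r) == 0) NA_real_ else unname(r[which.max(abs(r))])
})

print(abs_max_loop)
# [1] -3  3  1
```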


Solution

  • Here is a way using max.col (thanks to @Gregor):

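    f <- function(data) {
      # keep only the numeric columns and convert them to a matrix,
      # so that the matrix indexing below works for data.frames and tibbles
      tmp <- as.matrix(Filter(is.numeric, data))
      # for each row, find the column holding the largest absolute value
      # (NAs become -Inf so they never win; ties go to the first column)
      # and extract the original, signed value at that (row, column)
      tmp[cbind(seq_len(nrow(tmp)),
                max.col(replace(x <- abs(tmp), is.na(x), -Inf),
                        ties.method = "first"))]
    }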
    
    f(test_df)
    # [1] -3  3  1
    

    step by step

    In the first step, we keep only the numeric columns:

    Filter(is.numeric, test_df)
    #  val_a val_b val_c
    #1    -1    -3     2
    #2     2     3     3
    #3     0    NA     1
    

    (called tmp in the function above)

    Then

    replace(x <- abs(Filter(is.numeric, test_df)), is.na(x), -Inf)
    

    returns

    #  val_a val_b val_c
    #1     1     3     2
    #2     2     3     3
    #3     0  -Inf     1
    

    that is, a data.frame in which every value has been replaced by its absolute value and every NA has been replaced with -Inf, so that a missing value can never be selected as the row maximum.

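    max.col returns, for each row, the column position of the maximum value. Its default ties.method is "random", so in a row with a tie (such as row 2, where val_b and val_c are both 3) the returned column can vary between runs, and if the tied values differ in sign (e.g. -3 and 3) so can the sign of the result. ties.method = "first" makes it reproducible:

    max.col(replace(x <- abs(Filter(is.numeric, test_df)), is.na(x), -Inf),
            ties.method = "first")
    # [1] 2 2 3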
    

    Finally, this information is used to extract the desired values from Filter(is.numeric, test_df) by indexing with a two-column (row, column) numeric matrix, i.e.

    cbind(seq_len(nrow(Filter(is.numeric, test_df))),
          max.col(replace(x <- abs(Filter(is.numeric, test_df)), is.na(x), -Inf),
                  ties.method = "first"))
    #     [,1] [,2]
    #[1,]    1    2
    #[2,]    2    2
    #[3,]    3    3
    

    data

    test_df <- data.frame(
      some_identifier = c("apple", "tunafish", "turkey_sandwich"), 
      val_a =  c(-1, 2, 0), 
      val_b = c(-3, 3, NA), 
      val_c = c(2, 3, 1), stringsAsFactors = FALSE)
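Alternatively, the pmax question can be answered more directly with a pmax-style helper. This is a different technique from the max.col answer above: a pairwise Reduce over the columns with ifelse. pmax_abs is a hypothetical name, not a base R or package function. It is a minimal sketch that assumes all arguments are numeric vectors of equal length.

```r
# Hypothetical helper (not part of base R): an element-wise, pmax-style
# "signed absolute maximum" that ignores NAs
pmax_abs <- function(...) {
  Reduce(function(a, b) {
    # take b where a is missing, or where b is present and strictly larger
    # in magnitude (so ties keep the value from the earlier argument)
    take_b <- is.na(a) | (!is.na(b) & abs(b) > abs(a))
    ifelse(take_b, b, a)
  }, list(...))
}

# same data as above
test_df <- data.frame(
  some_identifier = c("apple", "tunafish", "turkey_sandwich"),
  val_a = c(-1, 2, 0),
  val_b = c(-3, 3, NA),
  val_c = c(2, 3, 1), stringsAsFactors = FALSE)

# the columns are listed explicitly, just as they would be with pmax
test_df$abs_max <- with(test_df, pmax_abs(val_a, val_b, val_c))
print(test_df$abs_max)
# [1] -3  3  1
```

Each Reduce step is vectorized over all rows, so the loop runs over columns rather than rows. A position where every argument is NA comes out as NA.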