Search code examples
rtidyverse

How to select the max value of each row (not all columns) and mutate 2 columns which are the max value and name in R?


This is the original dataframe. I want to know the max value between 'a' and 'b', and mutate max value and variable name as two columns.

df <- data.frame(lon = c(102,103,104,105),
                 lat = c(31,32,33,34),
                 a = c(4,3,7,6),
                 b = c(5,2,4,9))

The target dataframe is like this.

dftarget <- data.frame(lon = c(102,103,104,105),
                       lat = c(31,32,33,34),
                       a = c(4,3,7,6),
                       b = c(5,2,4,9),
                       max = c(5,3,7,9),
                       type = c('b','a','a','b'))

Solution

  • Method 1

    Simply use pmax and max.col function to identify the maximum values and columns.

    library(dplyr)
    
    df %>% mutate(max = pmax(a,b), type = colnames(df)[max.col(df[,3:4]) + 2 ])
    

    Method 2

    Or first re-shape your data to a "long" format for easier manipulation. Then use mutate to extract max values and names. Finally change it back to a "wide" format and relocate columns according to your target.

    df %>% 
      pivot_longer(a:b, names_to = "colname") %>% 
      group_by(lon, lat) %>% 
      mutate(max = max(value), 
             type = colname[which.max(value)]) %>% 
      pivot_wider(everything(), names_from = "colname", values_from = "value") %>% 
      relocate(max, type, .after = b)
    

    Output

    # A tibble: 4 × 6
    # Groups:   lon, lat [4]
        lon   lat     a     b   max type 
      <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
    1   102    31     4     5     5 b    
    2   103    32     3     2     3 a    
    3   104    33     7     4     7 a    
    4   105    34     6     9     9 b