Search code examples
rdataframemaxapplymin

Column name with the min and max values in a dataset in R


I have this data set:

   Year  January February March April   May  June  July August 
   <chr>   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>         
 1 2018     45        51    63    61    79    85    88     85         
 2 2017     51        60    65    69    75    82    86     84          
 3 2016     47        55    61    68    72    84    87     85        
... with 20 more rows     

I would like to get the months corresponding to the min and max of each row, and the difference between the max and min. This is my code for the min and max,

x <- colnames(data)[apply(data[,c(2:9)],1,which.max)]
y <- colnames(data)[apply(data[,c(2:9)],1,which.min)]
data$MaxMonth <- x
data$MinMonth <- y

However, it is giving me Year as a output for some of the which.min function.

   Year    January February March April May  June  July  August   MaxMonth  MinMonth    Diff
   <chr>   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>          
 1 2018     45        51    63    61    79    85    88     85      July      January    43
 2 2017     51        60    65    69    75    82    86     84      July      Year       35
 3 2016     47        55    61    68    72    84    87     85      July      Year       40
... with 20 more rows 

Solution

  • We can reshape to long format with pivot_longer, do a group by 'Year', get the column name that correspond to max/min of 'value' (with which.max/which.min) and then join with the original data

    library(dplyr)
    library(tidyr)
    df %>% 
        pivot_longer(cols = -1) %>%
        group_by(Year) %>%
        summarise(maxMonth = name[which.max(value)],
               minMonth = name[which.min(value)]) %>%
        left_join(df, .)