Search code examples
rdplyrtidyversedata-cleaning

R Generating values based on comparison of previous columns


I'm hoping to generate a column (Min) that finds the minimum of selected name columns and extract the name of the column as its value. Following is the sample dataframe:

          Amy  Abe  Donna  Racheal  Mike     Min       u
          5    34    54     56       23      Amy       0
          43   11    3      33       21      Donna     1
          54   32    21     54       1       Mike      1 
          21   5     43     32       21      Abe       1
          32   21    23     5        32      Racheal   0
          43   2     2      13       45      Abe Donna 1
                            .
                            .
                            .

Column u is just a column that is at the end of the dataset. The dataset is rather large therefore I'm trying to find an efficient method to generate column Min.

The code I have on mind:

     MinData <- Data %>% mutate(Min = 
     min(colnames(Data)[1:5]))

This only extracts name of the columns. What should I add to enable the column to compare values in each row and select the column name with the minimum value?


Solution

  • I'd use the apply function with which :)

    Set up our vector of names

    person_names= names(df[,1:5]) #Presumably the column names are the names
    

    The 1:5 is just there in case you have other columns in your dataset you don't want considered for the minimum check.

    Now we can use apply on a custom function which return a name from whichever column has the lowest value for each row.

    df$Min <- apply(df[,1:5], 1, function(x){person_names[which.min(x)]})
    

    Our custom function is as I described already, apply simply applies the function to each column or row of a data-frame or matrix. The second argument 1 indicates rows, if we wanted columns we could change that to a 2.

    which.min just returns the element number of wherever the minimum is. person_names have our names in order, and which.min returns a number which indicates which name has the smallest value.

    You could compress this all down into a one-line solution if you wanted to do away with the person_names variable.

    df$Min <- apply(df[,1:5], 1, function(x){names(df[,1:5])[which.min(x)]})
    

    If you only have the 5 name columns, drop the 1:5, if you have columns wherever, just replace that with a vector of your column names or numbers.

    EDIT: I saw your comment on the other answer. To accommodate for ties, I'll change the custom function so that it checks for all matches with the minimum value of x, then pastes them together with some custom separator. I'll also modify your data so Donna and Racheal tie in the second row.

    df <- read.table(text = 'Amy  Abe  Donna  Racheal  Mike     Min       u
          5    34    54     56       23      Amy       0
           43   11    3      3       21      Donna     1
           54   32    21     54       1       Mike      1 
           21   5     43     32       21      Abe       1
           32   21    23     5        32      Racheal   0', header = T)
    
    person_names <- names(df[,1:5])
    
    df$Min <- apply(df[,1:5], 1, function(x){paste(person_names[x == min(x)], 
    collapse = ", ")})
    
    > df
      Amy Abe Donna Racheal Mike            Min u
    1   5  34    54      56   23            Amy 0
    2  43  11     3       3   21 Donna, Racheal 1
    3  54  32    21      54    1           Mike 1
    4  21   5    43      32   21            Abe 1
    5  32  21    23       5   32        Racheal 0
    

    I've set the collapse argument to ", ", which is the separator I've arbitrarily chosen. You could adjust this to just be a space " ", or a semi-colon, or whatever you wanted.

    Again, that can be compressed to a one line answer by getting rid of the separate line for person_names.