For a set of columns, I wish to detect which column has the max value. If the value between columns is equal, then I would like to use a weight to decide which column to select. I've tried implementing this using case_when() however it does not work. I will show this problem below using an example dataset...
Let's say I have a dataset which includes these three columns (A, B, C) that correspond to the amount of Apple, Bananas or Carrots a child eats per day.
For row I want to log the most consumed food (i.e., column with the highest value). If the value between any column is equal (e.g., 1 apple and 1 banana), then apply the following rank. Apple > Banana > Carrot, whereby if a child eats 1 Apple and 1 Banana, the log will show Apple.
I've tried implementing this in R using by using pairs of if_else statements with case_when(). However, it does not return the correct result. For example, the final row should be classified as Apple, not carrot. I'm not sure what I'm doing wrong. Please provide a Tidyverse solution to this issue and if possible, explain why my approach did not work.
library(tidyverse)
A <- c(1,1,3,3)
B <- c(2,3,1,1)
C <- c(1,1,1,2)
df <- data.frame(A,B,C)
top_food <- df %>%
mutate(highest = case_when(
C > B ~ "carrot", # if carrot > banana
C > A ~ "carrot", # if carrot > apple
B > A ~ "banana", # if banana > apple
B >= A ~ "banana", # if banana >= carrot
A >= B ~ "apple", # if apple >= banana
A >= C ~ "apple" # if apple >= carrot
))
> | A | B | C | HIGHEST | |
> | 1 | 2 | 1 | banana | |
> | 1 | 3 | 1 | banana | |
> | 3 | 1 | 1 | apple | |
> 3 | 1 | 2 | carrot | |
Notes: - this is an example dataset. - I'm open to solutions with different functions but please provide Tidyverse answers as this is how I am learning R. If possible, please explain why my approach using case_when() did not work so I can learn. - It's important to maintain the shape/layout of the dataset so it can be used in software outside of R, so please do not convert to long format.
A quick fix could contain dplyr::rowise and which.max(). The downside to this approach is that it is quite slow. Additionally I am assuming that the order of your columns reflects the rank of fruits (if there is a tie, which.max will return the first value).
A <- c(1,1,3,3,1,1)
B <- c(2,3,1,1,1,2)
C <- c(1,1,1,2,1,2)
df <- data.frame(A,B,C)
labels <- c("apple","banana","carrot")
df %>%
dplyr::rowwise() %>%
dplyr::mutate(top=labels[which.max(c(A,B,C))]) %>%
dplyr::ungroup()
Another (and probably better) approach could be using max.col()
df <- data.frame(A,B,C)
labels <- c("apple","banana","carrot")
df %>%
dplyr::mutate(top=labels[max.col(df,ties="first")])
And yet another solution independent of your column order (now I am out :D):
df <- data.frame(A,B,C)
top_food <- df %>%
dplyr::mutate(highest = dplyr::case_when(
C > A & C > B ~ "carrot",
B > A & B >= C ~ "banana",
TRUE ~ "apple"))
your approach with case_when did not work, because case_when stops checking conditions as soon as a true condition is found. As your first condition is C>B ~ "carriot" which is true for your last row (2>1), case_when returned "carrot" and did not check the other conditions.