
case_when() not returning the correct values


This is my first post here, and I am quite new to R.

I am having trouble categorizing a variable, waist circumference (Waist_C), by sex using case_when():

data_norm <- data_norm  %>%
  mutate(
    Waist_C_Classification = case_when(
      Sex = 1 & Waist_C < 94 ~ "low",
      Sex = 1 & Waist_C <= 102 ~ "medium",
      Sex = 1 & Waist_C > 102 ~ "high",
      Sex = 2 & Waist_C < 80 ~ "low",
      Sex = 2 & Waist_C <= 88 ~ "medium",      
      Sex = 2 & Waist_C > 88 ~ "high"
    )     
)

Output:

Sex  Waist_C  Waist_C_Classification
2      86.00  low
2      73.00  low
2      94.00  medium

In this case, the last row should be "high", since it is Sex 2 and above 88 cm (and the first row, 86 cm, should be "medium" rather than "low").
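A minimal base-R sketch that traces the three rows above, assuming (as the output suggests) that case_when() drops the "Sex =" argument name and uses only the expression after it as the condition. The vectors are re-typed from the output table, and reduced_low, reduced_medium and intended_low are illustrative names:

```r
# Rows from the question's output (all Sex 2)
Sex <- c(2, 2, 2)
Waist_C <- c(86, 73, 94)

# With `=`, only the part after `Sex =` acts as the condition.
# 1 is coerced to TRUE, so these reduce to plain waist tests.
reduced_low <- 1 & Waist_C < 94       # first branch ("low")
reduced_medium <- 1 & Waist_C <= 102  # second branch ("medium")

# With `==`, the sex is actually compared.
intended_low <- Sex == 1 & Waist_C < 94

print(reduced_low)     # TRUE TRUE FALSE: rows 1 and 2 become "low"
print(reduced_medium)  # TRUE TRUE TRUE: row 3 falls through to "medium"
print(intended_low)    # FALSE FALSE FALSE: no Sex 2 row matches a Sex 1 rule
```

Because case_when() keeps the first branch that is TRUE, the reduced conditions reproduce the low, low, medium labels shown in the output.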

I have tried using == instead of =, and using "Male" and "Female" (the variable is labelled), but I got the same result.

The goal is a single variable holding the sex-specific categories.

Thanks!


Solution

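  • As others have mentioned, using == instead of = solves most of the issue. With =, an expression such as Sex = 1 & Waist_C < 94 ~ "low" is parsed as an argument named Sex whose value is 1 & Waist_C < 94; judging by your output, the name is simply dropped, so the condition evaluated is just Waist_C < 94 (1 is coerced to TRUE). The sex is never checked, and every row is classified with the Sex 1 cut-offs. I also think the code benefits from giving each category an explicit lower boundary. Strictly speaking, case_when() returns the value of the first condition that is TRUE, so with your ordering a value below 94 is already caught by "low" before the "medium" condition (Waist_C <= 102) is checked. Explicit ranges, however, make each condition correct on its own, independent of the order of the branches.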

    Try this (the cut-offs are the same as in your question):

    library(dplyr)  # mutate() and case_when(); data.table is not needed
    
    data_norm <- data.frame(Sex = c(1, 1, 2, 2, 1, 2), Waist_C = c(86, 96, 104, 94, 73, 88))
    
    # Cut-offs as in the question:
    #   Sex 1: low < 94, medium 94-102, high > 102
    #   Sex 2: low < 80, medium 80-88,  high > 88
    data_norm <- data_norm %>%
      mutate(
        Waist_C_Classification = case_when(
          Sex == 1 & Waist_C < 94 ~ "low",
          Sex == 1 & Waist_C >= 94 & Waist_C <= 102 ~ "medium",
          Sex == 1 & Waist_C > 102 ~ "high",
          Sex == 2 & Waist_C < 80 ~ "low",
          Sex == 2 & Waist_C >= 80 & Waist_C <= 88 ~ "medium",
          Sex == 2 & Waist_C > 88 ~ "high"
        )
      )
    
    data_norm

    Result:

    
      Sex Waist_C Waist_C_Classification
       1      86                    low
       1      96                 medium
       2     104                   high
       2      94                   high
       1      73                    low
       2      88                 medium
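To confirm that each cut-off lands in the intended category, one can classify values at and just around every boundary. The sketch below assumes the thresholds stated in the question, uses a hypothetical test data frame named check, and keeps the question's original ordering (no lower bounds), only with == in place of =:

```r
library(dplyr)

# Hypothetical test values: each boundary plus a value just outside it
check <- data.frame(
  Sex     = c(1, 1, 1, 1, 2, 2, 2, 2),
  Waist_C = c(93.9, 94, 102, 102.1, 79.9, 80, 88, 88.1)
)

check <- check %>%
  mutate(
    Waist_C_Classification = case_when(
      # First TRUE branch wins, so "medium" only sees values >= the "low" cut-off
      Sex == 1 & Waist_C < 94 ~ "low",
      Sex == 1 & Waist_C <= 102 ~ "medium",
      Sex == 1 & Waist_C > 102 ~ "high",
      Sex == 2 & Waist_C < 80 ~ "low",
      Sex == 2 & Waist_C <= 88 ~ "medium",
      Sex == 2 & Waist_C > 88 ~ "high"
    )
  )

check
```

Keeping an explicit test on Waist_C in the "high" branches, rather than a bare Sex == 1 catch-all, means a row with a missing waist value stays NA instead of being labelled "high".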