Search code examples
rtidyverse

Way to avoid nested ifelse statement in R


I have a dataframe that includes various ages and a value for each age. I would like to classify the different ages into broader age groups. To do so, I had to create a fairly complicated nested ifelse statement:

library(tidyverse)

df <- data.frame(age = c("15 to 17", "18 and 19", "20 to 24", "25 to 29", "30 to 34", "35 to 39", "40 to 44", "45 to 49", "50 to 54", "55 to 59"),
                 value = sample(1000:2000,10, replace=TRUE))

new_df = df %>% 
  mutate(age_band = 
           ifelse(age %in% c("15 to 17","18 and 19"), '15 to 19', ifelse(age %in% c("20 to 24","25 to 29"), '20s', ifelse(age %in% c("30 to 34","35 to 39"), '30s','40+'))))

Is there any way to do this without using such complex nested statements? The data goes all the way to ages 85 and over, and it got very complicated to classify each different age group.


Solution

  • Following @Claudiu Papasteri recommendations:

    Package

    library(dplyr)
    

    Solution

    df %>% 
      mutate(age_band = case_when( age %in% c("15 to 17","18 and 19") ~  '15 to 19',
                                   age %in% c("20 to 24","25 to 29") ~ '20s',
                                   age %in% c("30 to 34","35 to 39") ~ '30s',
                                   TRUE ~'40+')
             )
    

    ps: I would include all specific statements in case_when, even the last one for "40+", this way you can keep track of any problem in your dataset. Because if something is missing, or you have a typo, everything else will be coded as "40+". So by stating every case you can catch the problem and fix everything before running an analysis or generating graphs. And the last statement could be changed to TRUE ~ age. This means, whatever remains, use the same value, or TRUE ~ NA, that you recode the remaining stuff for missing. So you know that all missing means you have a problem you have to fix.

    Output

             age value age_band
    1   15 to 17  1432 15 to 19
    2  18 and 19  1112 15 to 19
    3   20 to 24  1265      20s
    4   25 to 29  1076      20s
    5   30 to 34  1212      30s
    6   35 to 39  1238      30s
    7   40 to 44  1384      40+
    8   45 to 49  1612      40+
    9   50 to 54  1606      40+
    10  55 to 59  1897      40+