Consider the following code, used to identify an age range:
library(dplyr)  # between() is assumed to come from dplyr
ages <- c(5,10,15,20,25,30,35,40,45,50,55,60)
get_age_group <- function(age) {
  label <- ifelse(between(age, 0, 12), "Kid",
    ifelse(between(age, 13, 19), "Teenager",
      ifelse(between(age, 20, 28), "Young Adult",
        "Older Than That")
    )
  )
  label
}
get_age_group(ages)
It works fine...
[1] "Kid" "Kid" "Teenager" "Young Adult" "Young Adult" "Older Than That" "Older Than That"
[8] "Older Than That" "Older Than That" "Older Than That" "Older Than That" "Older Than That"
But that's a nested ifelse statement from hell, and I plan to extend this to accommodate ages into the 80s.
This is not my first version. The earlier function looked like this...
get_age_group <- function(age) {
  label <- ""
  if (between(age, 0, 12)) { label <- "Kid" }
  else if (between(age, 13, 19)) { label <- "Teenager" }
  ... # And so on
  label
}
That works fine for a single integer, such as get_age_group(10), but I found out the hard way that if statements cannot evaluate conditions with more than one element (i.e., vectors), which raises the dreaded "the condition has length > 1" error.
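To make the difference concrete, here is a small sketch using base R only. The ages and the single "Kid" cutoff are made up for illustration and are not the function above. The error branch assumes R 4.2.0 or later; older versions only issued a warning and used the first element, so the sketch catches both.

```r
ages <- c(5, 15, 25)

# `if` needs a single TRUE/FALSE, so a length-one condition is fine.
if (ages[1] <= 12) print("Kid")

# A vector condition is not. Capture the message instead of stopping.
res <- tryCatch(
  if (ages <= 12) "Kid" else "Not a kid",
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
print(res)

# ifelse() evaluates the test element by element instead.
labels <- ifelse(ages <= 12, "Kid", "Not a kid")
print(labels)
```

This is why the ifelse version accepts the whole ages vector while the if/else version only accepts one age at a time.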
My question is: is there a more elegant solution than nested ifelse statements?
The cut function is designed for this.
ages <- c(5,10,15,20,25,30,35,40,45,50,55,60)
get_age_group <- function(age) {
  cut(age, c(0, 13, 20, 29, Inf),
      labels = c('Kid', 'Teenager', 'Young Adult', 'Older Than That'),
      right = FALSE)
}
get_age_group(ages)
This gives:
[1] Kid Kid Teenager Young Adult Young Adult Older Than That
[7] Older Than That Older Than That Older Than That Older Than That Older Than That Older Than That
Levels: Kid Teenager Young Adult Older Than That
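To check how the breaks map onto the original ranges, a quick boundary test can help. This is only a sketch that reuses the breaks and labels from the function above; the boundary_ages vector is an illustrative name and set of values. With right = FALSE each interval includes its lower break and excludes its upper one, so 13 lands in "Teenager" rather than "Kid".

```r
breaks <- c(0, 13, 20, 29, Inf)
labels <- c("Kid", "Teenager", "Young Adult", "Older Than That")

# Ages sitting on either side of each break (illustrative values).
boundary_ages <- c(12, 13, 19, 20, 28, 29)

# right = FALSE gives the intervals [0,13), [13,20), [20,29), [29,Inf).
groups <- cut(boundary_ages, breaks = breaks, labels = labels, right = FALSE)
print(data.frame(age = boundary_ages, group = groups))
```

For whole-number ages these intervals match the 0-12, 13-19 and 20-28 ranges from the question. Values outside the breaks, such as a negative age, come back as NA.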
Note that cut returns a factor. If you don't want that, you can wrap the result in as.character to convert it.
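A minimal sketch of a version that returns character values. The breaks and labels arguments are an addition for illustration, not part of the function above; they are there so more groups can be added without touching the function body. The "Senior" group and its cutoff of 65 are hypothetical.

```r
get_age_group <- function(age,
                          breaks = c(0, 13, 20, 29, Inf),
                          labels = c("Kid", "Teenager", "Young Adult",
                                     "Older Than That")) {
  # cut() needs exactly one more break than there are labels.
  stopifnot(length(breaks) == length(labels) + 1)
  # Convert the factor that cut() returns into a plain character vector.
  as.character(cut(age, breaks = breaks, labels = labels, right = FALSE))
}

default_groups <- get_age_group(c(5, 15, 25, 70))
print(default_groups)

# Hypothetical extension: split off a "Senior" group starting at 65.
extended_groups <- get_age_group(
  c(5, 15, 25, 70),
  breaks = c(0, 13, 20, 29, 65, Inf),
  labels = c("Kid", "Teenager", "Young Adult", "Older Than That", "Senior")
)
print(extended_groups)
```

Extending the ranges then only means adding one break and one label, instead of another level of ifelse nesting.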