Is there an elegant way to classify integers (e.g. ages) into intervals (e.g. age groups) in R?


I need a function that takes two arguments:

  1. a vector of integers
  2. a vector of intervals (strings of the form "lower-upper", e.g. "1-2")

For every given age, the function should return the corresponding age group.

I quickly came up with this function with two nested loops, and it works:

classifyAge <- function(ages, intervals) {
  result <- character(length(ages))
  
  for (i in seq_along(ages)) {
    for (j in seq_along(intervals)) {
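      # Parse "lower-upper" into a numeric pair; avoid shadowing base::range()
      bounds <- as.numeric(strsplit(intervals[j], "-")[[1]])
      
      # && is the scalar (short-circuit) operator intended for if() conditions
      if (ages[i] >= bounds[1] && ages[i] <= bounds[2]) {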
        result[i] <- intervals[j]
        break
      }
    }
  }
  
  return(result)
}

result <- classifyAge(c(1, 2, 3, 5, 5, 7, 0), c("1-2", "3-4", "5-Inf"))
print(result)

[1] "1-2"   "1-2"   "3-4"   "5-Inf" "5-Inf" "5-Inf" "" 

I was just wondering whether the same functionality could be achieved using vectorized functions somehow?

I am aware of the "cut" function, but I have not had any success with it.


Solution

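  • cut is the recommended approach here. Its intervals are right-closed by default, i.e. (lower, upper], so the breaks start at 0 and an age of 0 falls outside every bin (NA):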

    vec <- c(1, 2, 3, 5, 5, 7, 0)
    bins <- c(0, 2, 4, Inf)
    cut(vec, bins, labels = paste(bins[-length(bins)] + 1, bins[-1], sep = "-"))
    # [1] 1-2   1-2   3-4   5-Inf 5-Inf 5-Inf <NA> 
    # Levels: 1-2 3-4 5-Inf
    

    If you need to derive the bins from user-provided strings of integer-contiguous ranges, you can take the upper bound of each range as a break, perhaps like this:

    txt <- c("1-2", "3-4", "5-Inf")
    bins <- c(0, as.numeric(sub(".*-", "", txt)))
    bins
    # [1]   0   2   4 Inf
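
The two steps above can be combined into a function with the signature asked for in the question. This is only a minimal sketch: the name classify_age is hypothetical, and it assumes the intervals are sorted, non-negative, integer-contiguous "lower-upper" strings. It also converts the factor back to character and replaces NA with "" so that the result matches the loop version.

```r
# Hypothetical helper; the name and argument names are illustrative only.
# Assumes sorted, non-negative, integer-contiguous "lower-upper" strings.
classify_age <- function(ages, intervals) {
  # Upper bound of each interval: everything after the "-"
  upper <- as.numeric(sub(".*-", "", intervals))
  # Lower bound of the first interval: everything before the "-"
  lower <- as.numeric(sub("-.*", "", intervals[1]))
  # cut() uses right-closed bins (lo, hi], so start one below the first lower bound
  bins <- c(lower - 1, upper)
  groups <- as.character(cut(ages, bins, labels = intervals))
  # The loop version returns "" for ages outside every interval
  groups[is.na(groups)] <- ""
  groups
}

classify_age(c(1, 2, 3, 5, 5, 7, 0), c("1-2", "3-4", "5-Inf"))
# [1] "1-2"   "1-2"   "3-4"   "5-Inf" "5-Inf" "5-Inf" ""
```

If a factor with NA for unmatched ages is acceptable, the as.character() conversion and the NA replacement can be dropped.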