Factor unknown levels at runtime while setting labels for known levels at design time


Consider a simple function that converts a vector to an unordered factor with labelled levels:

  my.factor <- function(data){
    levels = c("d1", "d2", "d3")
    labels = c("Data 1", "Data 2", "Data 3")
    factored.data = factor(data, levels, labels)
    factored.data
  }

This works well for known levels. But suppose an unknown level is added in the future and we run our function:

  data = c("d1", "d2", "d3", "d1", "d100")
  my.factor(data)

The output will be:

  # [1] Data 1 Data 2 Data 3 Data 1 <NA>
  # Levels: Data 1 Data 2 Data 3

However, I want the new, unknown value to be included as a level. That is, I want the output to resemble:

  # [1] Data 1 Data 2 Data 3 Data 1 d100
  # Levels: Data 1 Data 2 Data 3 d100

Is there a way to set labels for known levels at design time, while still including new, unknown levels that may be passed to my code at runtime?


Solution

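  • You could append any values that are not among the known levels to both levels and labels before calling factor():

      my.factor <- function(data){
        levels <- c("d1", "d2", "d3")
        labels <- c("Data 1", "Data 2", "Data 3")
        # Values present in the data that have no predefined level
        new.levels <- setdiff(unique(data), levels)
        # Unknown values are appended and used as their own labels
        levels <- c(levels, new.levels)
        labels <- c(labels, new.levels)
        factor(data, levels, labels)
      }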

    which gives

      data = c("d1", "d2", "d3", "d1", "d100")
      my.factor(data)
      # [1] Data 1 Data 2 Data 3 Data 1 d100
      # Levels: Data 1 Data 2 Data 3 d100
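
If data may contain NA, note that setdiff(unique(data), levels) would likely pick NA up as a new level. factor() drops NA from the levels by default (exclude = NA) but not from the labels, so the two lengths can end up mismatched. Below is a minimal sketch of a variant that filters NA first and keeps the design-time mapping in a named vector. The names label_known_levels and known.labels are hypothetical, chosen only for illustration:

```r
# Sketch only: `label_known_levels` and `known.labels` are hypothetical
# names, not part of the original answer.
label_known_levels <- function(data, known.labels) {
  data <- as.character(data)
  # Names of the lookup vector are the raw levels known at design time
  known.levels <- names(known.labels)
  # Runtime values without a design-time label; NA is filtered out
  # because factor() excludes NA from the levels by default
  new.levels <- setdiff(unique(data[!is.na(data)]), known.levels)
  # Unknown values are appended and used as their own labels
  factor(data,
         levels = c(known.levels, new.levels),
         labels = c(unname(known.labels), new.levels))
}

known.labels <- c(d1 = "Data 1", d2 = "Data 2", d3 = "Data 3")
result <- label_known_levels(c("d1", "d2", NA, "d100"), known.labels)
print(result)
```

In this sketch NA stays NA in the result and d100 is appended after the known levels. Keeping the mapping in one named vector also makes it harder for the levels and labels to drift out of sync.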