Consider a simple function that factors and labels a vector (with unordered levels):
my.factor <- function(data){
  levels = c("d1", "d2", "d3")
  labels = c("Data 1", "Data 2", "Data 3")
  factored.data = factor(data, levels, labels)
  factored.data
}
This works well for known levels. But suppose an unknown level is added in the future and we run our function:
data = c("d1", "d2", "d3", "d1", "d100")
my.factor(data)
The output will be:
# [1] Data 1 Data 2 Data 3 Data 1 <NA>
# Levels: Data 1 Data 2 Data 3
However, I want the new, unknown value to be included as a level. That is, I want the output to resemble:
# [1] Data 1 Data 2 Data 3 Data 1 d100
# Levels: Data 1 Data 2 Data 3 d100
Is there a way to set labels for known levels at design time, while still including new, unknown levels that may be passed to my code at runtime?
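You could append any values that are not already known levels to both the levels and the labels before calling factor(), so that each unknown value is used as its own label: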
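my.factor <- function(data){
  levels <- c("d1", "d2", "d3")
  labels <- c("Data 1", "Data 2", "Data 3")
  # Values in data that are not known levels; NA is dropped so it stays missing
  # (an NA level would be excluded by factor() and leave labels one too long)
  new.levels <- setdiff(unique(data[!is.na(data)]), levels)
  # Unknown values are appended as levels and used as their own labels
  levels <- c(levels, new.levels)
  labels <- c(labels, new.levels)
  factor(data, levels, labels)
}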
which gives
data = c("d1", "d2", "d3", "d1", "d100")
my.factor(data)
# [1] Data 1 Data 2 Data 3 Data 1 d100
# Levels: Data 1 Data 2 Data 3 d100
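If the known mapping should be adjustable without editing the function body, the same idea could be sketched with the mapping passed in as a named character vector. The helper name label.factor and its known argument are hypothetical (they are not part of base R), and the sketch assumes data is a character vector or a factor; NA values are left as missing rather than turned into a level.

```r
# Hypothetical helper: 'known' maps level codes (names) to display labels (values).
label.factor <- function(data, known) {
  # Values present in data but absent from the known codes; NA stays missing.
  extra <- setdiff(unique(as.character(data[!is.na(data)])), names(known))
  # Known codes keep their labels; unknown values are used as their own labels.
  factor(data,
         levels = c(names(known), extra),
         labels = c(unname(known), extra))
}

known <- c(d1 = "Data 1", d2 = "Data 2", d3 = "Data 3")
result <- label.factor(c("d1", "d2", "d3", "d1", "d100"), known)
levels(result)
```

Unknown values are appended after the known levels in order of first appearance, so the known levels keep their design-time order regardless of what arrives at runtime.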