r machine-learning mlr

Which classes should (binary) factor variables for R package mlr have?


I want to prepare a data set for use in a Task of the mlr package. Should binary categorical independent variables be of class factor, logical, character, or integer? Is it OK to store categorical variables with more than two levels as factor or character, or do some models integrated in mlr require, for example, a model matrix that mlr does not create automatically? Which classes does mlr expect in those cases?

For example:

library(mlr)
x1 <- factor(sample(0:1, size = 10, replace = TRUE))           # binary factor feature
x2 <- factor(sample(letters[1:5], size = 10, replace = TRUE))  # factor feature with up to 5 levels
y <- sample(c("yes", "no"), size = 10, replace = TRUE)         # character target
makeClassifTask(data = data.frame(y, x1, x2), target = "y", positive = "yes")

Solution

  • Use factor. If a variable is categorical, it should be stored as a factor, including when it is binary. A factor can of course have more than two levels, although not every learner supports this (mlr determines automatically whether a learner is compatible). mlr converts the data in a task into a form suitable for the learner, or tells you that the learner and the task aren't compatible.

    You can also list the learners suitable for a given task with the function listLearners().
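
The answer above can be tried out with a short sketch. It assumes mlr and rpart are installed, and it uses a deterministic toy data frame (the column values and the choice of classif.rpart are illustrative assumptions, not part of the original question). It builds a task from factor features, shows how mlr counts the feature types, lists the learners mlr considers suitable for the task, and checks whether one learner declares support for factor features.

```r
library(mlr)

# Toy data, made up for illustration: a binary factor, a 5-level factor,
# and a two-class factor target.
df <- data.frame(
  y  = factor(rep(c("yes", "no"), each = 5)),
  x1 = factor(rep(0:1, times = 5)),
  x2 = factor(rep(letters[1:5], times = 2))
)

task <- makeClassifTask(data = df, target = "y", positive = "yes")

# Number of features per type as recorded in the task description.
n_feat <- getTaskDesc(task)$n.feat
print(n_feat)

# Learners that mlr reports as suitable for this task. Only learners whose
# packages are installed are returned, so the result depends on your setup.
learners <- listLearners(task, warn.missing.packages = FALSE)
print(head(learners[, c("class", "package")]))

# Check whether a specific learner declares support for factor features.
rpart_learner <- makeLearner("classif.rpart")
supports_factors <- hasLearnerProperties(rpart_learner, "factors")
print(supports_factors)
```

The rows returned by listLearners() vary with the learner packages installed locally, so the list on your machine may be longer or shorter than someone else's.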