
Delete New factor levels not present in the training data


I'm debugging some code that uses the randomForest package, with barely any previous R experience.

I've reached a point where, when executing predict.randomForest, I get the error:

New factor levels not present in the training data.

Searching this site, I found the reason and understand that I need to delete the records that are causing the problem.

How can I isolate (find out) which columns/rows are causing the problems?


Solution

  • Assume you have train.data, which you used to build your model; test.data, which you now want predictions for; and a factor variable factor.var1. Then you could do:

    levels(test.data$factor.var1) %in% levels(train.data$factor.var1)
    

    This produces a logical vector with one entry per factor level of test.data$factor.var1; the FALSE entries mark the levels that were not present in train.data.
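The same check can be extended to every factor column at once, which also answers the "which columns/rows" part of the question. The following is only a minimal sketch in base R: the toy data frames, the second column factor.var2, and the helper names new.levels, bad.rows and test.clean are made up for illustration and are not part of the original code.

```r
# Toy data standing in for train.data / test.data (hypothetical values).
train.data <- data.frame(
  factor.var1 = factor(c("a", "b", "a", "c")),
  factor.var2 = factor(c("x", "y", "x", "y")),
  num.var     = c(1.0, 2.5, 3.1, 4.2)
)
test.data <- data.frame(
  factor.var1 = factor(c("a", "d", "b", "d")),
  factor.var2 = factor(c("x", "y", "z", "y")),
  num.var     = c(0.5, 1.5, 2.5, 3.5)
)

# Factor columns of the training data that also exist in the test data.
factor.cols <- names(train.data)[sapply(train.data, is.factor)]
factor.cols <- intersect(factor.cols, names(test.data))

# Per column: values that occur in test.data but are not training levels.
# NA is excluded so that missing values are not reported as a "new level".
new.levels <- lapply(factor.cols, function(col) {
  setdiff(unique(as.character(test.data[[col]])),
          c(levels(train.data[[col]]), NA))
})
names(new.levels) <- factor.cols
new.levels <- new.levels[lengths(new.levels) > 0]  # offending columns only

# Rows of test.data holding at least one unseen value in any column.
bad.rows <- Reduce(`|`, lapply(names(new.levels), function(col) {
  as.character(test.data[[col]]) %in% new.levels[[col]]
}), rep(FALSE, nrow(test.data)))

print(new.levels)       # which columns, and which values, are the problem
print(which(bad.rows))  # which rows contain them

# Drop the offending rows, then rebuild each factor with the training
# levels so that no unused, unseen level remains attached to the column.
test.clean <- test.data[!bad.rows, , drop = FALSE]
for (col in factor.cols) {
  test.clean[[col]] <- factor(test.clean[[col]],
                              levels = levels(train.data[[col]]))
}
```

The last loop is there because, as far as I can tell, the error is triggered by the levels attached to the factor rather than by the values actually present, so deleting the rows alone may not be enough. Whether dropping rows is acceptable, rather than recoding the unseen values, depends on your data.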