r machine-learning random-forest categorical-data r-factor

Delete New factor levels not present in the training data

I'm debugging a code with Random Forest package, with barely no previous R experience.

I've reached a point where, excecuting predict.randomForest, I get the error:

New factor levels not present in the training data.

Searching this site I've found the reason and understood that I need to delete the records that are causing the problem.

How can I isolate (find out) which columns/rows are causing the problems?

Solution

Assume you have train.data, which you used to build your model, test.data, which you now want to get predictions for, and your factor variable factor.var1, then you could do:

levels(test.data$factor.var1) %in% levels(train.data$factor.var1)

Which will produce a logical vector corresponding to the factor levels in test.data, with the "FALSE" entries being the factor levels that were not present in your train.data.

Rcpp Rf_warningcall compiler warnings
Modify the name of factor variables in lm function(summary function)
Extreme value analysis and quantile estimation using log Pearson type 3 (Pearson III) distribution - R vs Python
How to hide NAs when using xlsx::saveWorkbook?
How do I retrieve a simple numeric value from a named numeric vector in R?
Matching pair-wise columns from left to right across rows in one dataframe to another dataframe and adding new columns with matching values
Income to outcome flow chart in Sankey plotly R
color mapping in geom_conn_bundle not showing correctly
Print R package startup message AFTER automatic package conflict messages instead of before
Summing a set of R dataframe rows (column-wise), while retaining the first n columns
Added variable / partial regression plots for groups in an interaction?
how to make a topoplot in R with coordinates variable distribution
List of all functions in base R?
Plotting multiple plots for different initial conditions in one graph
Printing repetitively on the same line in R
Generating UI/Server based on initial selection
Subset dataframe based on pickerInput
How to let user pick the data in R-shiny?
Couldn't show my simple bar charts separately on Shiny R dashboardBody
How to programmatically filter contents of a second shiny app displayed via iframe
How to select specific interesting groups for the boxplot in R Shiny app?
Crosstable and Plot grouping with reactive values
Is there a way to make multiple Shiny picker inputs where the selections must be disjoint?
Delay/avoid duplication of shiny server side functions until after credentials
Predictions only returns value "1"
How to display a busy indicator in a shiny app?
Append doesn't work when writing to CSV in R
Changing the start date of a gantt chart in DiagrammeR
Check for installed packages before running install.packages()
Compare two columns element-wise