Tags: r, random-forest, pmml

R randomForest to PMML class index is wrong


I'm exporting an R randomForest model to PMML. The resulting PMML always lists the class as the first element of the DataDictionary element, which does not always match where the class actually is.

Is there some way to fix this, or at least to augment the PMML with custom Extension elements? That way I could put the class index there.

I've looked through the documentation for the pmml and pmmlTransformations packages, but couldn't find anything that would help me solve this issue.


Solution

  • By "PMML class" I assume you mean the model type (classification vs. regression) given in the PMML model attributes. If so, the model type is not determined by the data type of the first element of the DataDictionary; the two are completely independent. The exported model type is whatever R thinks the model is: the randomForest object records its own type (model$type), and that is the type the pmml function exports. If you want your model to be a certain type, make sure R knows it. For example, with the iris data set, if your predicted variable is Sepal.Length, R will correctly assume a regression model. If you insist on treating it as a classification model, use as.factor(Sepal.Length) as the response instead.
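To check which type R has assigned before exporting, something like the following sketch may help. It assumes the randomForest package is installed and uses the built-in iris data; the names rf_reg, rf_cls and iris_cls, the explicit predictor list, and the ntree value are illustrative choices, not part of the original answer. The factor conversion is done on a copy of the data rather than inside the formula, so that the numeric Sepal.Length column cannot end up among the predictors.

```r
library(randomForest)

set.seed(42)  # only so repeated runs build the same forests

# Numeric response: randomForest treats this as regression.
rf_reg <- randomForest(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                       data = iris, ntree = 50)
print(rf_reg$type)  # "regression"

# Same response converted to a factor: randomForest treats this as classification.
iris_cls <- iris
iris_cls$Sepal.Length <- as.factor(iris_cls$Sepal.Length)
rf_cls <- randomForest(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                       data = iris_cls, ntree = 50)
print(rf_cls$type)  # "classification"

# Optional: export with the pmml package if it happens to be installed.
# Per the answer above, the exported model type follows rf_cls$type.
if (requireNamespace("pmml", quietly = TRUE)) {
  pmml_doc <- pmml::pmml(rf_cls)
}
```

If the printed type is not the one you expect, fix the response column in R first; editing the PMML afterwards should not be necessary for this particular problem.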