
R PMML probabilities precision


I am using a PMML model file to score a random forest. When scoring, I get the output shown below. Is there a way to increase the number of decimal places shown for the probabilities (i.e. 0.8 to 0.8000, or 0.2 to 0.2000)?

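library(randomForest)
library(pmml)
library(XML)  # provides saveXML(); loaded explicitly in case pmml does not attach it

# Fit a small random forest (5 trees) on the iris data
iris.rf <- randomForest(Species ~ ., data=iris, ntree=5)
# Convert the model to PMML and write it to disk
saveXML(pmml(iris.rf), file="irisrf.xml")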

This model is saved as a PMML file and evaluated, which gives the following output:

{ "Species" : "setosa", "Predicted_Species" : "setosa", "Probability_setosa" : 0.8, "Probability_versicolor" : 0.2, "Probability_virginica" : 0.0 }


Solution

  • Your RF model contains five decision trees. Class probabilities are calculated by dividing the number of decision trees that voted for a particular class by the total number of decision trees.

    In your example, one decision tree voted for class "versicolor" (1 / 5 = 0.2), and the remaining four decision trees voted for class "setosa" (4 / 5 = 0.8).

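    You cannot change the "precision" of the division operator /. The values 0.8 and 0.8000 are the same number; the trailing zeros are purely a matter of how the value is displayed. So simply pretty-print the fractions 1 / 5 and 4 / 5 with as many decimal places as you need in your application code: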

    System.out.printf("%.4f", probability);
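
To make the arithmetic above concrete, here is a minimal Java sketch that turns per-class vote counts into probabilities and then formats them with four decimal places. It only illustrates the idea and is not how any particular PMML evaluator is implemented; the class name VoteProbabilities and the helpers fromVotes and format are made up for this example, and the vote counts are the ones from the question (4 for setosa, 1 for versicolor, 0 for virginica).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class VoteProbabilities {

    // Hypothetical helper: probability of a class = votes for that class / total number of trees.
    static Map<String, Double> fromVotes(Map<String, Integer> votes) {
        int totalTrees = 0;
        for (int v : votes.values()) {
            totalTrees += v;
        }
        Map<String, Double> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            // Cast to double so the division is not integer division.
            probabilities.put(e.getKey(), (double) e.getValue() / totalTrees);
        }
        return probabilities;
    }

    // Hypothetical helper: render a probability with exactly four decimal places.
    // Locale.ROOT keeps the decimal separator a dot regardless of system settings.
    static String format(double probability) {
        return String.format(Locale.ROOT, "%.4f", probability);
    }

    public static void main(String[] args) {
        // Vote counts taken from the five-tree example in the question.
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("setosa", 4);
        votes.put("versicolor", 1);
        votes.put("virginica", 0);

        for (Map.Entry<String, Double> e : fromVotes(votes).entrySet()) {
            System.out.println("Probability_" + e.getKey() + " : " + format(e.getValue()));
        }
    }
}
```

The underlying double values stay 0.8, 0.2 and 0.0; only their string representation changes. If you write the result back out as JSON, note that a JSON number carries no notion of trailing zeros, so the padded form would have to be emitted as a string or formatted by whatever displays it.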