
Difficulty reproducing RandomForestLearner results in a script rather than the GUI


I was hoping to reproduce some results by calling the functions from a script rather than through the Orange GUI, since a script will make batch processing much easier. I have spent quite a few hours on this without success, because I have been unable to find documentation on scripting random forests in Orange3.

When I use the workflow shown in the image, I get reasonable MSE, RMSE, MAE and R2 values for the Random Forest on the training and test data sets. The Random Forest uses the default parameters it has when you place the widget.

When I run the same data sets through a script using the default random forest, I consistently get "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')." for all four scores.

Here is the code I am using:

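```python
import os
import Orange

# Both CSV files are expected in the current working directory
cwd = os.getcwd()

train = Orange.data.Table(os.path.join(cwd, 'train_ex.csv'))
test = Orange.data.Table(os.path.join(cwd, 'test_ex.csv'))

# Default random forest (the classification variant)
learner = Orange.classification.RandomForestLearner()

# Train on `train`, evaluate on `test`
result = Orange.evaluation.testing.TestOnTestData(train, test, [learner])

# Standard regression scores
MSE = Orange.evaluation.MSE(result)
RMSE = Orange.evaluation.RMSE(result)
MAE = Orange.evaluation.MAE(result)
R2 = Orange.evaluation.R2(result)
print(MSE, RMSE, MAE, R2)
```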
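The four scores requested above (MSE, RMSE, MAE and R2) are regression metrics: each one compares a numeric prediction with a numeric target value. As a plain-Python sketch of their textbook definitions (not Orange's own implementation; the function name `regression_scores` is hypothetical):

```python
import math


def regression_scores(y_true, y_pred):
    """Return (MSE, RMSE, MAE, R2) for two equal-length numeric sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n                             # mean squared error
    rmse = math.sqrt(mse)                        # root mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R2 is undefined for a constant target; report NaN in that case.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return mse, rmse, mae, r2


if __name__ == "__main__":
    print(regression_scores([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))
```

Every formula subtracts a prediction from a true value, so these scores are only meaningful when both the target and the model output are numeric, i.e. for a regression learner on a continuous target.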

The datasets can be found at the following link: https://drive.google.com/open?id=1lmNar3jItWmWql7ywZtbTi4PtMC32UwE

Any help would be much appreciated!


Solution

  • I have solved the issue. In the GUI, the Tree and Random Forest widgets automatically switch between their classification and regression variants depending on the type of the selected target variable. In a script you have to make that choice yourself: if the target is numeric (Continuous), use the learner from Orange.regression (here, Orange.regression.RandomForestRegressionLearner); if it is categorical (Discrete), use the one from Orange.classification (Orange.classification.RandomForestLearner).
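The GUI's automatic choice can be imitated in a script by checking the target column before picking the learner. Below is a minimal pure-Python sketch, assuming the raw CSV text and the name of the target column are known. The helpers `looks_continuous` and `random_forest_for` are hypothetical, and the "every value parses as a number" rule is a simplified assumption: Orange's own type guessing is more elaborate and may, for example, treat a column holding only a few distinct integers as discrete.

```python
import csv
import io

# Fully qualified names of the two Orange3 random forest learners.
REGRESSION_LEARNER = "Orange.regression.RandomForestRegressionLearner"
CLASSIFICATION_LEARNER = "Orange.classification.RandomForestLearner"


def looks_continuous(values):
    """Return True if every non-empty value parses as a float.

    Simplified heuristic (an assumption), not Orange's own type-guessing rules.
    """
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return False
    for v in present:
        try:
            float(v)
        except ValueError:
            return False
    return True


def random_forest_for(csv_text, target):
    """Name the Orange learner matching the type of the `target` column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    column = [row[target] or "" for row in rows]  # missing fields become ""
    if looks_continuous(column):
        return REGRESSION_LEARNER      # numeric target -> regression
    return CLASSIFICATION_LEARNER      # anything else -> classification


if __name__ == "__main__":
    sample = "x1,x2,y\n1.0,2.0,3.5\n2.0,1.0,4.25\n"
    print(random_forest_for(sample, "y"))
```

In the script from the question, the fix amounts to replacing the learner line with `learner = Orange.regression.RandomForestRegressionLearner()`. Once the data is loaded as an Orange `Table` with a target set, the same decision can be read directly from `train.domain.class_var.is_continuous` instead of re-parsing the CSV.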