I am using the Orange data mining tool to write a Python script that computes classification accuracy on test data using a previously saved model (a pickle file).
import pickle
import Orange

dataFile = "training.csv"
data = Orange.data.Table(dataFile)
learner = Orange.classification.RandomForestLearner()
cf = learner(data)
# save the pickle file
with open("1.pkcls", "wb") as f:
    pickle.dump(cf, f)
# load the pickle file
with open("1.pkcls", "rb") as f:
    loadCF = pickle.load(f)
testFile = "testing.csv"
test = Orange.data.Table(testFile)
learners = [cf]
result = Orange.evaluation.testing.TestOnTestData(data, test, learners)
# get classification accuracy
CAs = Orange.evaluation.CA(result)
I can successfully save and load the model, but the last line raises an error:
    CAs = Orange.evaluation.CA(result)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 39, in __new__
    return self(results, **kwargs)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 48, in __call__
    return self.compute_score(results, **kwargs)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 84, in compute_score
    return self.from_predicted(results, skl_metrics.accuracy_score)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 75, in from_predicted
    dtype=np.float64, count=len(results.predicted))
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 74, in <genexpr>
    for predicted in results.predicted),
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 82, in _check_targets
    "".format(type_true, type_pred))
ValueError: Can't handle mix of multiclass and continuous
I found a way to avoid this error and successfully compute the classification accuracy: deleting the line
cf = learner(data)
However, if I delete this line, no model is trained at all, so there is nothing to save: the learner is never fitted to the input file before the code that saves and loads the model runs:
with open("1.pkcls", "wb") as f:
    pickle.dump(cf, f)
# load the pickle file
with open("1.pkcls", "rb") as f:
    loadCF = pickle.load(f)
Does anyone know whether it is possible to train a model first, save it as a pickle file, and later use it on another file to get the classification accuracy?
You must not pre-train the classifier before passing it to TestOnTestData (its name should really be TrainOnTrainAndTestOnTestData, i.e. it invokes the fitting/training step on its own).
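To make the learner/model distinction concrete, here is a toy sketch in plain Python. It is not Orange code: `MajorityLearner`, `MajorityModel` and `train_then_test` are hypothetical stand-ins that only mimic the contract described above, where the harness calls each learner on the training data and then calls the resulting model on the test data. Orange itself fails differently (with the ValueError from the question), but the underlying mistake is the same: a fitted model is handed over where a learner is expected.

```python
class MajorityModel:
    """A fitted model: maps rows to predicted labels."""

    def __init__(self, label):
        self.label = label

    def __call__(self, rows):
        return [self.label for _ in rows]


class MajorityLearner:
    """A learner: maps training data to a fitted model."""

    def __call__(self, rows):
        labels = [label for _, label in rows]
        # Pick the most frequent label in the training data.
        return MajorityModel(max(set(labels), key=labels.count))


def train_then_test(train, test, learners):
    """Toy stand-in for a harness that trains on `train` and predicts on `test`."""
    all_predictions = []
    for learner in learners:
        model = learner(train)  # the harness performs the fitting itself
        all_predictions.append(model(test))
    return all_predictions


train = [((0,), "a"), ((1,), "a"), ((2,), "b")]
test = [((3,), "a"), ((4,), "b")]

# Passing an unfitted learner works as intended.
print(train_then_test(train, test, [MajorityLearner()]))

# Passing an already fitted model breaks the contract: the harness calls it
# on the training data, gets predictions back, and then tries to call those.
fitted = MajorityLearner()(train)
try:
    train_then_test(train, test, [fitted])
except TypeError as exc:
    print("pre-trained model rejected:", exc)
```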
Unfortunately there is no readily available, explicit way to create a Results instance from applying pre-trained classifier(s) to a test dataset.
One quick and dirty way is to thunk the 'learners' passed to TestOnTestData so that they return the pre-trained models:
results = Orange.evaluation.testing.TestOnTestData(data, test, [lambda testdata: loadCF])
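If only the accuracy is needed, another option is to skip the Results object and compare the loaded model's predictions with the true labels directly. The following is a minimal sketch, not an Orange API: `classification_accuracy` is a hypothetical helper, and it assumes that calling the loaded model on a table (`loadCF(test)`) returns one predicted class index per row and that `test.Y` holds the true class indices in the same order.

```python
def classification_accuracy(predicted, actual):
    """Return the fraction of positions where predicted equals actual."""
    predicted = list(predicted)
    actual = list(actual)
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Stand-in values; three of the four predictions match.
print(classification_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.75
```

With the variables from the question, the call would presumably be `classification_accuracy(loadCF(test), test.Y)`, under the assumptions stated above.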