I'm trying to use a custom classifier with scikit-learn's `BaggingClassifier`, and I'm getting an error whose source I cannot determine. My classifier object passes `check_estimator()`, and I have no issue with the `fit()` function:
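```python
from sklearn import ensemble

# customEstimator, n_estimators, trainfeat, trainlabels and testfeat are defined elsewhere
model = ensemble.BaggingClassifier(customEstimator, max_samples=1/n_estimators, n_estimators=n_estimators)
model.fit(trainfeat, trainlabels)
model.predict(testfeat)
```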
This yields the error trace below. The base estimator itself makes binary predictions via a sigmoid threshold. I know these values must correspond to the test data, but I don't understand what the three operands are supposed to be. Furthermore, the error seems to come from `BaggingClassifier`, but the issue must be on my side, no?
I'm trying to avoid pasting the code for my entire estimator, but it inherits `BaseEstimator`, and I only write/overload the functions `fit`, `predict`, and `predict_proba`. Am I missing something in this regard?
I've tried reshaping the features/labels to no avail; it didn't even alter the error. I also tried having my estimator inherit `ClassifierMixin`, but that ended up giving me a slew of new issues.
```
  File "Main_File.py", line 76, in <module>
    model.predict(testfeat)
  File "G:\Software\Anaconda\lib\site-packages\sklearn\multiclass.py", line 310, in predict
    indices.extend(np.where(_predict_binary(e, X) > thresh)[0])
  File "G:\Software\Anaconda\lib\site-packages\sklearn\multiclass.py", line 98, in _predict_binary
    score = estimator.predict_proba(X)[:, 1]
  File "G:\Software\Anaconda\lib\site-packages\sklearn\ensemble\bagging.py", line 698, in predict_proba
    for i in range(n_jobs))
  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 1003, in __call__
    if self.dispatch_one_batch(iterator):
  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 834, in dispatch_one_batch
    self._dispatch(tasks)
  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 753, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "G:\Software\Anaconda\lib\site-packages\joblib\_parallel_backends.py", line 201, in apply_async
    result = ImmediateResult(func)
  File "G:\Software\Anaconda\lib\site-packages\joblib\_parallel_backends.py", line 582, in __init__
    self.results = batch()
  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "G:\Software\Anaconda\lib\site-packages\sklearn\ensemble\bagging.py", line 129, in _parallel_predict_proba
    proba += proba_estimator
ValueError: operands could not be broadcast together with shapes (100000,2) (100000,) (100000,2)
```
I suspect the problem is the output of `predict_proba` in your `customEstimator`.
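The failing line in the traceback is `proba += proba_estimator`, so the three shapes in the message are the accumulator `proba`, your estimator's output, and the output array of the in-place add (which is `proba` again). A small NumPy-only reproduction, using a made-up sample count of 5 instead of 100000, shows why a 1-D output fails and a two-column output works:

```python
import numpy as np

n_samples, n_classes = 5, 2

# Accumulator like the one BaggingClassifier sums per-estimator probabilities into.
proba = np.zeros((n_samples, n_classes))

# A predict_proba that returns only P(class 1) as a 1-D array of shape (n_samples,).
bad_output = np.full(n_samples, 0.7)

msg = ""
try:
    # Broadcasting aligns trailing dimensions: 2 vs 5 do not match, so this raises.
    proba += bad_output
except ValueError as exc:
    msg = str(exc)
    print(msg)  # lists the shapes of proba, bad_output and the output array

# Two columns, [P(class 0), P(class 1)], match the accumulator's shape exactly.
good_output = np.column_stack([1.0 - bad_output, bad_output])
proba += good_output
print(proba.shape)
```

Note that a column vector of shape `(n_samples, 1)` would broadcast without raising, but it would add the same value to both class columns, so it is not a correct fix either.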
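It looks like your current implementation returns a 1-D array of shape `(n_samples,)` (the `(100000,)` in the error), which cannot be broadcast against the `(n_samples, 2)` array that `BaggingClassifier` accumulates probabilities into. Make sure your `predict_proba` returns an array of shape `(n_samples, 2)` for a binary classification problem: one column per class, in the same order as `classes_`.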
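As a sketch of what that looks like in a complete estimator: the class below is a hypothetical stand-in for `customEstimator` (the real code isn't shown), so `SigmoidClassifier`, its `lr`/`n_iter` parameters, and the plain gradient-descent fit are all assumptions for illustration. It assumes exactly two classes. The part that matters is that `fit` sets `classes_` and `predict_proba` returns two stacked columns rather than a single probability vector. Inheriting `ClassifierMixin` is not required for the fix; it is included here, placed before `BaseEstimator` as scikit-learn's convention suggests.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import BaggingClassifier


class SigmoidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical binary classifier: logistic model fit by gradient descent."""

    def __init__(self, lr=0.1, n_iter=200):
        # Store hyperparameters unchanged so get_params()/clone() (used by bagging) work.
        self.lr = lr
        self.n_iter = n_iter

    @staticmethod
    def _sigmoid(z):
        # Clip to avoid overflow warnings in np.exp for very large |z|.
        return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # classes_ maps column j of predict_proba to a label; bagging reads it.
        self.classes_, y_idx = np.unique(np.asarray(y), return_inverse=True)
        if len(self.classes_) != 2:
            raise ValueError("this sketch only handles exactly two classes")
        n_samples, n_features = X.shape
        self.coef_ = np.zeros(n_features)
        self.intercept_ = 0.0
        for _ in range(self.n_iter):
            p = self._sigmoid(X @ self.coef_ + self.intercept_)
            err = p - y_idx  # gradient of the mean log-loss w.r.t. the logits
            self.coef_ -= self.lr * (X.T @ err) / n_samples
            self.intercept_ -= self.lr * err.mean()
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        p1 = self._sigmoid(X @ self.coef_ + self.intercept_)
        # Shape (n_samples, 2): column j holds P(classes_[j]), not a 1-D (n_samples,) array.
        return np.column_stack([1.0 - p1, p1])

    def predict(self, X):
        # Highest-probability class, i.e. thresholding P(class 1) at 0.5.
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Usage on synthetic, roughly linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = BaggingClassifier(SigmoidClassifier(), n_estimators=5, random_state=0)
model.fit(X, y)
print(model.predict_proba(X).shape)  # (200, 2)
```

Because each row of the per-estimator output sums to 1, the averaged output of `BaggingClassifier.predict_proba` does too, and `predict` can simply take the argmax.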