Tags: python, scikit-learn, sklearn-pandas

Memory error with AdaBoostClassifier


I define AdaBoostClassifier as follows:

from sklearn import ensemble
from sklearn.ensemble import AdaBoostClassifier

# Boosts a random forest (580 trees, max depth 20) for up to 20 rounds
adaboost = AdaBoostClassifier(
    base_estimator=ensemble.RandomForestClassifier(
        bootstrap=True, class_weight=None, criterion='gini',
        max_depth=20, max_features=300, max_leaf_nodes=None,
        min_samples_leaf=1, min_samples_split=6,
        min_weight_fraction_leaf=0.0, n_estimators=580, n_jobs=1),
    algorithm='SAMME.R',
    n_estimators=20,
    learning_rate=1.0)

ada = adaboost.fit(X, y)

The last line of code (where I fit the model) triggers a MemoryError. Why does this happen, and how can I solve it?


Solution

  • Your system is trying to allocate more memory than you have available.

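    This is somewhat expected, because you are using AdaBoost with a very complex base learner: a random forest of 580 trees, each up to 20 levels deep. With n_estimators=20, AdaBoost fits and keeps up to 20 such forests, that is, up to 20 × 580 = 11,600 trees in memory at once. Use a less complex base model, such as a low-depth decision tree.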

    From the sklearn AdaBoost docs (bold is mine):

    The core principle of AdaBoost is to fit a sequence of weak learners (i.e., models that are only slightly better than random guessing, such as small decision trees)
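
As a rough sketch of that advice, boosting depth-1 trees could look like the following. It assumes synthetic data from make_classification in place of the X and y from the question; the sample size, the 200 boosting rounds, and random_state are illustrative choices, not values from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the X and y from the question (hypothetical data).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Weak learner: a depth-1 tree (a "decision stump") instead of a 580-tree forest.
weak_learner = DecisionTreeClassifier(max_depth=1)

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn releases (base_estimator vs. estimator).
adaboost = AdaBoostClassifier(weak_learner, n_estimators=200,
                              learning_rate=1.0, random_state=0)
adaboost.fit(X, y)

# Count how many tree nodes the whole boosted ensemble holds.
n_nodes = sum(est.tree_.node_count for est in adaboost.estimators_)
print(len(adaboost.estimators_), "stumps,", n_nodes, "tree nodes in total")
print("training accuracy:", adaboost.score(X, y))
```

Each stump has at most three nodes, so even a few hundred boosting rounds stay small in memory. If stumps underfit your data, raising max_depth a little or adding boosting rounds is usually a cheaper next step than going back to a forest as the base learner.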