Tags: python, scikit-learn, ensemble-learning

StackingCVClassifier pre-trained base models


I haven't been able to find any information on whether or not StackingCVClassifier accepts pre-trained models.


Solution

  • Probably not. StackingCVClassifier and StackingClassifier currently take a list of unfitted base estimators, then call fit and predict on them internally.

    It's pretty straightforward to implement this though. The main idea behind stacking is to fit a "final model" using the predictions of earlier models.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    
    X, y = make_regression(n_samples=1000)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    

    Here X_train is (750, 100) and X_test is (250, 100).

    We'll emulate three "pre-trained" models by fitting them on X_train, y_train, then produce predictions on both the training set and the test set:

    from sklearn.linear_model import RidgeCV, LassoCV
    from sklearn.neighbors import KNeighborsRegressor
    
    # Emulate "pre-trained" models
    models = [RidgeCV(), LassoCV(), KNeighborsRegressor(n_neighbors=5)]
    
    X_train_new = np.zeros((X_train.shape[0], len(models)))    # (750, 3)
    X_test_new = np.zeros((X_test.shape[0], len(models)))      # (250, 3)
    
    for i, model in enumerate(models):
        model.fit(X_train, y_train)
        X_train_new[:, i] = model.predict(X_train)
        X_test_new[:, i] = model.predict(X_test)
    

    The final model is fit on X_train_new and can then make predictions on any (N, 3) matrix of predictions produced by our base models:

    from sklearn.ensemble import GradientBoostingRegressor
    
    clf = GradientBoostingRegressor()
    clf.fit(X_train_new, y_train)
    clf.score(X_test_new, y_test)
    # 0.9998247
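
    The same idea can be packaged into a small reusable class, so the base-model predictions and the final model travel together and prediction on new data is a single call. This is a minimal sketch, not a scikit-learn API: the `PretrainedStacker` name and its methods are made up for illustration, and it assumes the base models are already fitted before being passed in.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LassoCV, RidgeCV
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor


    class PretrainedStacker:
        """Hypothetical helper: stack already-fitted base models under a final model."""

        def __init__(self, base_models, final_model):
            self.base_models = base_models    # assumed pre-trained
            self.final_model = final_model

        def _meta_features(self, X):
            # One column of predictions per base model -> shape (N, len(base_models))
            return np.column_stack([m.predict(X) for m in self.base_models])

        def fit(self, X, y):
            # Fit only the final model; base models are left untouched
            self.final_model.fit(self._meta_features(X), y)
            return self

        def predict(self, X):
            return self.final_model.predict(self._meta_features(X))

        def score(self, X, y):
            return self.final_model.score(self._meta_features(X), y)


    X, y = make_regression(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # "Pre-training" step, done outside the stacker
    base = [RidgeCV(), LassoCV(), KNeighborsRegressor(n_neighbors=5)]
    for m in base:
        m.fit(X_train, y_train)

    stack = PretrainedStacker(base, GradientBoostingRegressor())
    stack.fit(X_train, y_train)
    print(stack.score(X_test, y_test))

    Note that fitting the final model on predictions made over the same data the base models were trained on can leak information; StackingCVClassifier avoids this by using out-of-fold predictions, which is harder to replicate when the base models arrive already trained.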