python, machine-learning, scikit-learn, random-forest, scikit-learn-pipeline

Get OOB score within a pipeline for Random Forest


For a machine learning project, I was wondering: is it possible to use a RandomForestRegressor inside a pipeline?

Specifically, I need the OOB score from a RandomForestRegressor, but my data requires a lot of preprocessing first.

I tried several things, and this is the closest so far:

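# Imports needed by this snippet

from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline

# Creation of the pipeline (preprocessor is defined elsewhere)

rand_piped = Pipeline([
    ('preprocessor', preprocessor),
    ('model', RandomForestRegressor(max_depth=3, random_state=0, oob_score=True))
])

# Fitting our model

rand_piped.fit(df_X_train, df_Y_train.values.ravel())

# Getting our metrics and predictions

oob_score = rand_piped.oob_score_  # this line raises the AttributeError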

I suspect my problem is that I don't yet fully understand how this works, so feel free to correct me. The code raises this error:

Traceback (most recent call last):
  File "/home/user/my_rf.py", line 15, in <module>
    oob_score = rand_piped.oob_score_
AttributeError: 'Pipeline' object has no attribute 'oob_score_'

Solution

  • oob_score_ is an attribute of the fitted RandomForestRegressor, not of the Pipeline that wraps it. Pipelines are subscriptable, so you can look up oob_score_ on the model step:

    >>> rand_piped["model"].oob_score_
    0.9297212997034854