Using %load_ext cudf.pandas throws AttributeError

I am trying to use cudf.pandas in a Kaggle notebook and I get a long error message when I enable the GPU for a grid search with GridSearchCV. The main issue is an AttributeError on a DataFrame.

The code works fine if I remove the %load_ext cudf.pandas directive.


import cudf
print("cuDF version: ", cudf.__version__)

cuDF version:  24.04.01

%load_ext cudf.pandas

import os
import gc
import numpy as np 
import pandas as pd
import joblib

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.metrics import average_precision_score
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import (accuracy_score, precision_score, 
                             recall_score, f1_score, classification_report)
from xgboost import XGBClassifier
import shap

# Parameters grid for the cross-validation exercise
hyperparam_grid = {
    'n_estimators': [400, 600, 800, 1000, 1200, 1500],
    'learning_rate': [0.05, 0.07, 0.1, 0.13, 0.15],
    'max_depth': [3, 5, 7, 9, 10, 11],
}

# Create models and run grid search cross-validation
for p in proteins:
    # Instantiate XGBoost model
    model[p] = XGBClassifier(scale_pos_weight=spw[p],
    print('Model', p)
    print('Running grid search cross validation....')
    # Set up the gscv object with 4-fold
    gs_cv = GridSearchCV(estimator=model[p],

The Error Message:

Model sEH
Running grid search cross validation....
Fitting 4 folds for each of 180 candidates, totalling 720 fits
AttributeError: 'DataFrame' object has no attribute '_mgr'


  • As of now (June 2024), joblib does not support cudf.pandas because the loky backend of joblib does not respect the current process's sys.meta_path when spawning new processes (similar issues exist for the multiprocessing backend, if I recall correctly). Here is a link with further discussion:

    This PR should help fix the problem in joblib, but it has stalled:

    Try changing n_jobs=-1 to n_jobs=None to run a single job. I believe this will work around the joblib failures.
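
To illustrate the sys.meta_path point above, here is a minimal standard-library sketch. It is not joblib's or cuDF's actual code: `DemoFinder` is a hypothetical import hook standing in for the one cudf.pandas installs, and a freshly started interpreter stands in for a loky worker (both are assumptions made for illustration). It only shows that a hook added to sys.meta_path in the parent process is absent from a newly launched Python process.

```python
import importlib.abc
import subprocess
import sys


class DemoFinder(importlib.abc.MetaPathFinder):
    """Hypothetical import hook, a stand-in for the one cudf.pandas installs."""

    def find_spec(self, fullname, path, target=None):
        # Decline every import so the regular finders handle it.
        return None


def has_demo_finder(finders):
    """Return True if a DemoFinder instance is present in the given finder list."""
    return any(type(f).__name__ == "DemoFinder" for f in finders)


# Install the hook in the current (parent) process only.
sys.meta_path.insert(0, DemoFinder())
parent_has_hook = has_demo_finder(sys.meta_path)

# Start a brand-new interpreter and ask it about its own sys.meta_path.
child_code = (
    "import sys; "
    "print(any(type(f).__name__ == 'DemoFinder' for f in sys.meta_path))"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
child_has_hook = result.stdout.strip() == "True"

print("parent has hook:", parent_has_hook)
print("child has hook: ", child_has_hook)
```

If worker processes start without the hook in the same way, code running in them sees plain pandas rather than the accelerated proxy, which is consistent with the failure only appearing when parallel jobs are requested. Running a single job keeps all the work in the process where the hook is installed.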