Tags: pandas, scikit-learn, databricks, xgboost, mlflow

Databricks MLflow AutoML XGBoost can't predict_proba()


I used AutoML in Databricks Notebooks for a binary classification problem and the winning model flavor was XGBoost (big surprise).

The outputted model is of this variety:

mlflow.pyfunc.loaded_model:
      artifact_path: model
      flavor: mlflow.sklearn
      run_id: 123456789

Any idea why calling model.predict_proba(X) gives this error?

AttributeError: 'PyFuncModel' object has no attribute 'predict_proba'

I know it is possible to get the probabilities because ROC/AUC is a metric used for tuning the model. Any help would be amazing!


Solution

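  • I had the same issue with a CatBoost model. I solved it by downloading the run's artifacts to a local directory and then loading the model file with the native library instead of the pyfunc wrapper:

    from catboost import CatBoostClassifier
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    local_dir = "/dbfs/FileStore/user/models"
    # Replace 'run_id' with the ID of the run that logged the model.
    local_path = client.download_artifacts('run_id', "model", local_dir)

    # Load the native CatBoost model, which exposes predict_proba().
    model_path = '/dbfs/FileStore/user/models/model/model.cb'
    model = CatBoostClassifier()
    model = model.load_model(model_path)
    model.predict_proba(test_set)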
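
For the model in the question, which was logged with the mlflow.sklearn flavor, a similar idea may work without downloading files by hand: load it with `mlflow.sklearn.load_model` rather than `mlflow.pyfunc.load_model`, so you get the underlying scikit-learn object back instead of the `PyFuncModel` wrapper. The sketch below assumes the model sits under the artifact path `model` (as in the question's output) and that the loaded estimator or pipeline implements `predict_proba`. The helper names `build_model_uri` and `load_native_sklearn_model` are made up for illustration.

```python
def build_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a 'runs:/' URI pointing at an artifact logged under an MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"


def load_native_sklearn_model(run_id: str, artifact_path: str = "model"):
    """Load the scikit-learn object itself rather than the generic pyfunc wrapper."""
    # Imported lazily so the URI helper above can be used without MLflow installed.
    import mlflow.sklearn

    return mlflow.sklearn.load_model(build_model_uri(run_id, artifact_path))


# Usage (assumes access to the tracking server that holds the run;
# "123456789" is the placeholder run ID from the question):
# model = load_native_sklearn_model("123456789")
# probabilities = model.predict_proba(X)  # assumes the estimator supports it
```

The pyfunc wrapper only exposes the generic `predict()` interface, which is why `predict_proba` is missing on it; loading through the flavor-specific module returns the original object with its full API.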