Tags: python-3.x, machine-learning, scikit-learn, model, pycaret

What is the source of ValueError in pycaret and how to solve it?


I have a dataframe with 11 columns, one of which, Weekday, is categorical.

columns = ['Series', 'Year', 'Month', 'Day', 'Weekday', 'Number1', 'Number2', 'Number3',
           'Number4', 'Number5', 'Number6']

Since the data was originally given as a time series dataframe, I am following the approach from the Time Series Forecasting Tutorial to forecast the values of six of the numerical columns with the PyCaret Python package.

However, model training and comparison, defined in the following function, fails:

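import numpy as np
from pycaret.regression import setup, models, compare_models


def ml_modelling(train, test) -> None:
    """Compare regression models for each 'Number*' target column.

    Args:
        train (pd.DataFrame): Training split of the time series data.
        test (pd.DataFrame): Test split, passed to setup() as test_data.
    """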
    # Now that we have done the train-test-split, we are ready to train a
    # machine learning model on the train data, score it on the test data and
    # evaluate the performance of our model. In this example, I will use
    # PyCaret; an open-source, low-code machine learning library in Python that
    # automates machine learning workflows.
    numerical_columns = list(train.select_dtypes(include=[np.number]).columns.values)
    targets = [col for col in numerical_columns if col.startswith('Number')]
    for target_var in targets:

        s = setup(data=train,
                  test_data=test,
                  target=target_var,
                  fold_strategy='timeseries',
                  numeric_features=numerical_columns,
                  fold=5,
                  transform_target=True,
                  session_id=123)

        models()

        # Now to train machine learning models, you just need to run one line
        best = compare_models(sort='MAE')
        print(f'Output from compare_models for column {target_var}: \n', best)
        print('##############################################################')

I am receiving the following error message:

Traceback (most recent call last):
  File "c:/Users/username/OneDrive/Desktop/project/main_script.py", line 64, in <module>
    main()
  File "c:/Users/username/OneDrive/Desktop/project/main_script.py", line 56, in main
    ml_modelling(train, test)
  File "c:\Users\username\OneDrive\Desktop\project\utilities.py", line 1076, in ml_modelling
    s = setup(data=train,
  File "C:\Users\username\Anaconda3\lib\site-packages\pycaret\regression.py", line 571, in setup
    return pycaret.internal.tabular.setup(
  File "C:\Users\username\Anaconda3\lib\site-packages\pycaret\internal\tabular.py", line 607, in setup        
    raise ValueError(
ValueError: Column type forced is either target column or doesn't exist in the dataset. 

I would appreciate it if you could point out what I am doing wrong.


Solution

  • The following change is needed inside the setup() call:

    numeric_features=numerical_columns
    

    to

    numeric_features=[col for col in numerical_columns if col != target_var]
    

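    since, in any given iteration over the target variables, the current target can no longer be listed in numeric_features. As the error message says, setup() raises this ValueError when a column whose type is forced is either the target column or missing from the dataset.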
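The effect of that list comprehension can be checked without running PyCaret at all. The sketch below is only an illustration: the helper name numeric_features_per_target is hypothetical, and the column list assumes that every column except Weekday is numeric, as the question implies.

```python
from typing import Dict, List


def numeric_features_per_target(numerical_columns: List[str],
                                prefix: str = "Number") -> Dict[str, List[str]]:
    """Map each target to the numeric columns that can be forced as features.

    Hypothetical helper: it mirrors the list comprehension from the fix,
    applied once per target column.
    """
    targets = [col for col in numerical_columns if col.startswith(prefix)]
    return {
        target: [col for col in numerical_columns if col != target]
        for target in targets
    }


# Assumed numeric columns (Weekday is categorical, so it is left out).
numerical_columns = ["Series", "Year", "Month", "Day",
                     "Number1", "Number2", "Number3",
                     "Number4", "Number5", "Number6"]

features_by_target = numeric_features_per_target(numerical_columns)
for target_var, numeric_features in features_by_target.items():
    # The current target must never appear in its own feature list.
    assert target_var not in numeric_features
    print(target_var, "->", numeric_features)
```

Inside the loop from the question, the list for the current target would then be passed as numeric_features=features_by_target[target_var], so the other Number columns remain numeric features while the target itself is excluded.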