Pydantic BaseSettings validation issues resulting from an upgrade from v1.x to v2.x


I am updating some of my projects to Pydantic V2, but I am not yet familiar with how V2 is supposed to work. One of these projects trains a machine learning model using Airflow and MLflow. This is my train_config.py:

from pathlib import Path
from typing import Any, Dict, List

import yaml
from pydantic_settings import BaseSettings


def yaml_config_settings_source(settings: BaseSettings) -> Dict[str, Any]:
    with open(Path(__file__).parent / "train_config.yaml") as f:
        return yaml.load(f, Loader=yaml.FullLoader)


class BaseModelConfig(BaseSettings):
    split_test_size: float


class XGBoostConfig(BaseSettings):
    max_depth: int
    learning_rate: float
    n_estimators: int

class MLFlowConfig(BaseSettings):
    mlflow_experiment: str
    mlflow_model_name: str


class Features(BaseSettings):
    target: str
    categorical: List[str]
    numerical: List[str]
    binary: List[str]

    @property
    def all_features(self):
        return self.numerical + self.binary + self.categorical


class Classes(BaseSettings):
    bins: List[float]
    labels: List[int]


class Settings(BaseSettings):
    base_model_config: BaseModelConfig
    xgboost_config: XGBoostConfig
    mlflow_config: MLFlowConfig
    features: Features
    classes: Classes

    class Config:
        @classmethod
        def customise_sources(
            cls,
            init_settings,
            env_settings,
            file_secret_settings,
        ):
            return (
                init_settings,
                env_settings,
                yaml_config_settings_source,
                file_secret_settings,
            )


settings = Settings()

I also have a train_config.yaml file that holds the parameter values for each class. Example:

base_model_config:
  split_test_size: 0.2

xgboost_config:
  max_depth: 5
  reg_alpha: 1.0
...

When I run my main in train.py I get the following Pydantic validation errors:

Traceback (most recent call last):
  File "GitHub/ml-fct-api/src/fct/train/train_config/train_config.py", line 73, in <module>
    settings = Settings()
               ^^^^^^^^^^
  File ".pyenv/versions/ml_fct_api_py311/lib/python3.11/site-packages/pydantic_settings/main.py", line 71, in __init__
    super().__init__(
  File "/.pyenv/versions/ml_fct_api_py311/lib/python3.11/site-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 5 validation errors for Settings
base_model_config
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
xgboost_config
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
mlflow_config
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
features
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
classes
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing

When I run the same code with Pydantic v1.10 I don't get any errors. Could anyone help me understand what I need to change in my code to make sure it works with Pydantic V2? Thanks

I also tried annotating the fields in Settings as dict, with the config classes as default values, but that did not work either:

class Settings(BaseSettings):
    base_model_config: dict = BaseModelConfig
    xgboost_config: dict = XGBoostConfig
    mlflow_config: dict = MLFlowConfig
    features: dict = Features
    classes: dict = Classes

Solution

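  • I had a similar problem when updating to V2. The way to go now seems to be the settings_customise_sources classmethod on BaseSettings itself. In V2 the customise_sources hook inside class Config is no longer consulted, so the YAML file is never read and every field is reported as missing. That means changing from:

        class Config:
            @classmethod
            def customise_sources(
                cls,
                init_settings,
                ...

    to

        @classmethod
        def settings_customise_sources(
            cls,
            settings_cls,
            init_settings,
            ...

    So get rid of class Config, de-indent the method, and rename it. Note that the V2 method also receives settings_cls (right after cls) and dotenv_settings, and that each returned source is called with no arguments, so yaml_config_settings_source can no longer take a settings parameter.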