I am updating some of my projects to Pydantic V2, but I am not yet very familiar with how V2 works. One of these projects trains a machine learning model using Airflow and MLflow. This is my train_config.py code:
from pathlib import Path
from typing import Any, Dict, List

import yaml
from pydantic_settings import BaseSettings


def yaml_config_settings_source(settings: BaseSettings) -> Dict[str, Any]:
    with open(Path(__file__).parent / "train_config.yaml") as f:
        return yaml.load(f, Loader=yaml.FullLoader)


class BaseModelConfig(BaseSettings):
    split_test_size: float


class XGBoostConfig(BaseSettings):
    max_depth: int
    learning_rate: float
    n_estimators: int


class MLFlowConfig(BaseSettings):
    mlflow_experiment: str
    mlflow_model_name: str


class Features(BaseSettings):
    target: str
    categorical: List[str]
    numerical: List[str]
    binary: List[str]

    @property
    def all_features(self):
        return self.numerical + self.binary + self.categorical


class Classes(BaseSettings):
    bins: List[float]
    labels: List[int]


class Settings(BaseSettings):
    base_model_config: BaseModelConfig
    xgboost_config: XGBoostConfig
    mlflow_config: MLFlowConfig
    features: Features
    classes: Classes

    class Config:
        @classmethod
        def customise_sources(
            cls,
            init_settings,
            env_settings,
            file_secret_settings,
        ):
            return (
                init_settings,
                env_settings,
                yaml_config_settings_source,
                file_secret_settings,
            )


settings = Settings()
I also have a train_config.yaml file that holds the parameter values for each class, for example:
base_model_config:
  split_test_size: 0.2
xgboost_config:
  max_depth: 5
  reg_alpha: 1.0
  ...
When I run my main in train.py, I get the following Pydantic validation errors:
Traceback (most recent call last):
  File "GitHub/ml-fct-api/src/fct/train/train_config/train_config.py", line 73, in <module>
    settings = Settings()
               ^^^^^^^^^^
  File ".pyenv/versions/ml_fct_api_py311/lib/python3.11/site-packages/pydantic_settings/main.py", line 71, in __init__
    super().__init__(
  File "/.pyenv/versions/ml_fct_api_py311/lib/python3.11/site-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 5 validation errors for Settings
base_model_config
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
xgboost_config
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
mlflow_config
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
features
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
classes
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
When I run the same code with Pydantic v1.10, I don't get any errors. Could anyone help me understand what I need to change so that it works with Pydantic V2? Thanks!

I tried adding data types to the fields of the Settings class, but it did not work:
class Settings(BaseSettings):
    base_model_config: dict = BaseModelConfig
    xgboost_config: dict = XGBoostConfig
    mlflow_config: dict = MLFlowConfig
    features: dict = Features
    classes: dict = Classes
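I had a similar problem when updating to V2. In V2 the customise_sources hook inside the inner class Config is no longer called, so the YAML source never runs and Settings receives an empty dict (hence input_value={} in every error). The way to go now is to override the settings_customise_sources classmethod of BaseSettings directly. That means changing from: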
class Config:
    @classmethod
    def customise_sources(
        cls,
        init_settings,
to
@classmethod
def settings_customise_sources(
    cls,
    settings_cls,
    init_settings,
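In other words: get rid of class Config, de-indent the method so it sits directly on Settings, and rename it to settings_customise_sources. Note that the V2 method also receives settings_cls and dotenv_settings, and that each source is called with no arguments, so yaml_config_settings_source should no longer take a settings parameter.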