Tags: python, dynamic, enums, openapi, pydantic

Validate Pydantic dynamic float enum by name with OpenAPI description


Following on from this question and this discussion I am now trying to create a Pydantic BaseModel that has a field with a float Enum that is created dynamically and is validated by name. (Down the track I will probably want to use Decimal but for now I'm dealing with float.)

The discussion provides a solution to convert all Enums to validate by name, but I'm looking for how to do this for one or more individual fields, not a universal change to all Enums.

I consider this to be a common use case. The model uses an Enum which hides implementation details from the caller. The valid field values that a caller can supply are a limited list of names. These names are associated with internal values (in this case float) that the back-end wants to operate on, without requiring the caller to know them.

The valid Enum names and values change dynamically and are loaded at run time, but for the sake of clarity the result would be an Enum something like the following. Note that the Sex enum needs to be treated normally, that is, validated and encoded by value, while the Factor enum needs to be validated by name:

from enum import Enum
from pydantic import BaseModel

class Sex(str, Enum):
    MALE = "M"
    FEMALE = "F"

class Factor(Enum):
    single = 1.0
    half = 0.4
    quarter = 0.1

class Model(BaseModel):
    sex: Sex
    factor: Factor
    class Config:
        json_encoders = {Factor: lambda field: field.name}

model = Model(sex="M", factor="half")
# Error: only accepts e.g. Model(sex="M", factor=0.4)

This is what I want, but it doesn't work because the normal Pydantic Enum behaviour requires Model(factor=0.4). My caller doesn't know the particular float that's in use right now for this factor; it can and should only provide "half". The code that manipulates the model internally always wants to refer to the float, so I expect it to have to use model.factor.value.

It's fairly simple to create the Enum dynamically, but that doesn't provide any Pydantic support for validating on name. It's all automatically validated by value. So I think this is where most of the work is:

Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})
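
To make the by-value versus by-name distinction concrete, here is a minimal standard-library sketch (no Pydantic involved). The `loaded` dict is a hypothetical stand-in for whatever is read at run time; the point is only that the call syntax resolves values while the subscript syntax resolves names.

```python
from enum import Enum

# Hypothetical stand-in for names and values loaded at run time.
loaded = {"single": 1.0, "half": 0.4, "quarter": 0.1}
Factor = Enum("Factor", loaded)

# Call syntax looks a member up by value.
by_value = Factor(0.4)

# Subscript syntax looks a member up by name.
by_name = Factor["half"]

# A name is not a value, so the call syntax rejects it with ValueError.
try:
    Factor("half")
    name_accepted_as_value = True
except ValueError:
    name_accepted_as_value = False

print(by_value is by_name, by_name.value, name_accepted_as_value)
```

Validation by name therefore amounts to getting `Factor[...]` called for this field instead of `Factor(...)`.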

The standard way for Pydantic to customise serialization is with the json_encoders Config attribute. I've included that in the sample static Enum. That doesn't seem to be problematic.

Finally, there needs to be support to provide the right description to the OpenAPI schema.

Actually, in my use-case I only need the Enum name/values to be dynamically established. So an implementation that modifies a declared Enum would work, as well as an implementation that creates the Enum type.


Solution

  • Update (2023-03-03)

    Class decorator solution

    A convenient way to solve this is by creating a reusable decorator that adds both a __get_validators__ method and a __modify_schema__ method to any given Enum class. Both of these methods are documented here.

    We can define a custom validator function that will be called for our decorated Enum classes, which will enforce that only names will be turned into members and actual members will pass validation.

    The schema modifier will ensure that the JSON schema only shows the names as enum options.

    from collections.abc import Callable, Iterator
    from enum import EnumMeta
    from typing import Any, Optional, TypeVar, cast
    
    from pydantic.fields import ModelField
    
    E = TypeVar("E", bound=EnumMeta)
    
    def __modify_enum_schema__(
        field_schema: dict[str, Any],
        field: Optional[ModelField],
    ) -> None:
        if field is None:
            return
        field_schema["enum"] = list(cast(EnumMeta, field.type_).__members__.keys())
    
    def __enum_name_validator__(v: Any, field: ModelField) -> Any:
        assert isinstance(field.type_, EnumMeta)
        if isinstance(v, field.type_):
            return v  # value is already an enum member
        try:
            return field.type_[v]  # get enum member by name
        except KeyError:
            raise ValueError(f"Invalid {field.type_.__name__} `{v}`")
    
    def __get_enum_validators__() -> Iterator[Callable[..., Any]]:
        yield __enum_name_validator__
    
    def validate_by_name(cls: E) -> E:
        setattr(cls, "__modify_schema__", __modify_enum_schema__)
        setattr(cls, "__get_validators__", __get_enum_validators__)
        return cls
    

    Usage

    from enum import Enum
    from random import choices, random
    from string import ascii_lowercase
    
    from pydantic import BaseModel
    
    # ... import validate_by_name
    
    
    # Randomly generate an enum of floats:
    _members = {
        name: round(random(), 1)
        for name in choices(ascii_lowercase, k=3)
    }
    Factor = Enum("Factor", _members)  # type: ignore[misc]
    validate_by_name(Factor)
    first_member = next(iter(Factor))
    print("`Factor` members:", dict(Factor.__members__))  # dict() avoids the mappingproxy wrapper
    print("First `Factor` member:", first_member)
    
    
    class Foo(Enum):
        member_a = "a"
        member_b = "b"
    
    
    @validate_by_name
    class Bar(int, Enum):
        x = 1
        y = 2
    
    
    class Model(BaseModel):
        factor: Factor
        foo: Foo
        bar: Bar
    
        class Config:
            json_encoders = {Factor: lambda field: field.name}
    
    
    obj = Model.parse_obj({
        "factor": first_member.name,
        "foo": "a",
        "bar": "x",
    })
    print(obj.json(indent=4))
    print(Model.schema_json(indent=4))
    

    Example output:

    `Factor` members: {'r': <Factor.r: 0.1>, 'j': <Factor.j: 0.9>, 'z': <Factor.z: 0.6>}
    First `Factor` member: Factor.r
    
    {
        "factor": "r",
        "foo": "a",
        "bar": 1
    }
    
    {
        "title": "Model",
        "type": "object",
        "properties": {
            "factor": {
                "$ref": "#/definitions/Factor"
            },
            "foo": {
                "$ref": "#/definitions/Foo"
            },
            "bar": {
                "$ref": "#/definitions/Bar"
            }
        },
        "required": [
            "factor",
            "foo",
            "bar"
        ],
        "definitions": {
            "Factor": {
                "title": "Factor",
                "description": "An enumeration.",
                "enum": [
                    "r",
                    "j",
                    "z"
                ]
            },
            "Foo": {
                "title": "Foo",
                "description": "An enumeration.",
                "enum": [
                    "a",
                    "b"
                ]
            },
            "Bar": {
                "title": "Bar",
                "description": "An enumeration.",
                "enum": [
                    "x",
                    "y"
                ],
                "type": "integer"
            }
        }
    }
    

    This just demonstrates a few variations for this approach. As you can see, the Factor and Bar enums are both validated by name, whereas Foo is validated by value (as a regular Enum).

    Since we defined a custom JSON Encoder for Factor, the factor value is exported/encoded as the name string, while both Foo and Bar are exported by value (as a regular Enum).

    Both Factor and Bar display the enum names in their JSON schema, while Foo shows the enum values.

    Note that "type": "integer" appears in the JSON Schema of Bar only because I specified int as an explicit base class of Bar; it disappears if we remove that base class. To further ensure consistency, we could of course also simply set "type": "string" inside our __modify_enum_schema__ function.
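
    As a rough sketch of that last suggestion, the function below does what the extra line in __modify_enum_schema__ would do. To keep it runnable without Pydantic, it is written as a hypothetical helper (set_name_enum_schema is not part of the answer's code) that takes the enum class directly instead of a ModelField.

```python
from enum import Enum
from typing import Any


def set_name_enum_schema(field_schema: dict[str, Any], enum_cls: type[Enum]) -> None:
    """List the member names as allowed values and mark them as strings."""
    field_schema["enum"] = list(enum_cls.__members__)
    # Names are always strings, whatever the type of the underlying values.
    field_schema["type"] = "string"


class Bar(int, Enum):
    x = 1
    y = 2


# Roughly the shape of the schema before it is modified.
schema: dict[str, Any] = {"title": "Bar", "enum": [1, 2], "type": "integer"}
set_name_enum_schema(schema, Bar)
print(schema)
```

    Inside the decorator version, the equivalent change would be a single `field_schema["type"] = "string"` line after the "enum" assignment.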

    The only thing that is seemingly impossible right now is to also somehow register our custom way of encoding those enums inside our decorator, so that we do not need to set it in the Config or pass the encoder argument to json explicitly. That may be possible with a few changes to the BaseModel logic, but I think this would be overkill.
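
    The Config entry is still needed, but the repetition can be reduced. The sketch below uses a hypothetical helper, name_encoders (my own name, not a Pydantic function), that builds the encoder mapping for any number of enum classes, so that the Config line could become `json_encoders = name_encoders(Factor, Bar)`.

```python
from enum import Enum
from typing import Any, Callable


def name_encoders(*enum_classes: type[Enum]) -> dict[type[Enum], Callable[[Any], str]]:
    """Map each given enum class to an encoder that returns the member name."""
    return {cls: (lambda member: member.name) for cls in enum_classes}


Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})

encoders = name_encoders(Factor)
encoded = encoders[Factor](Factor.half)
print(encoded)
```

    This only shortens the declaration; it does not register anything automatically, so enums left out of the call are still encoded by value.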


    Original answer

    Validating Enum by name

    The parsing part of your problem can be solved fairly easily with a custom validator.

    Since a validator method can take the ModelField as an argument and that has the type_ attribute pointing to the type of the field, we can use that to try to coerce any value to a member of the corresponding Enum.

    We can actually write a more or less generalized implementation that applies to any arbitrary Enum subtype fields. If we use the "*" argument for the validator, it will apply to all fields, but we also need to set pre=True to perform our checks before the default validators kick in:

    from enum import Enum
    from typing import Any
    
    from pydantic import BaseModel, validator
    from pydantic.fields import ModelField
    
    
    class CustomBaseModel(BaseModel):
        @validator("*", pre=True)
        def coerce_to_enum_member(cls, v: Any, field: ModelField) -> Any:
            """For any `Enum` typed field, attempt to coerce the value to a member."""
            type_ = field.type_
            if not (isinstance(type_, type) and issubclass(type_, Enum)):
                return v  # field is not an enum type
            if isinstance(v, type_):
                return v  # value is already an enum member
            try:
                return type_(v)  # get enum member by value
            except ValueError:
                try:
                    return type_[v]  # get enum member by name
                except KeyError:
                    raise ValueError(f"Invalid {type_.__name__} `{v}`")
    

    That validator is agnostic of the specific Enum subtype and should work for all of them because it uses the common EnumType API (EnumType is the Python 3.11+ name for EnumMeta), such as EnumType.__getitem__ to get the member by name.

    The nice thing about this approach is that while valid Enum names will be turned into the correct Enum members, passing a valid Enum value still works as it did before. As does passing the member directly.
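
    One consequence of trying the value first is worth spelling out: if a string happens to be both a member name and another member's value, the value wins. The sketch below copies the validator's lookup order into a plain function (coerce is a hypothetical name) so it can be tried without Pydantic; the Swapped enum is a contrived example.

```python
from enum import Enum
from typing import Any


def coerce(enum_cls: type[Enum], v: Any) -> Enum:
    """Same lookup order as the validator: member, then value, then name."""
    if isinstance(v, enum_cls):
        return v  # already a member
    try:
        return enum_cls(v)  # by value
    except ValueError:
        try:
            return enum_cls[v]  # by name
        except KeyError:
            raise ValueError(f"Invalid {enum_cls.__name__} `{v}`") from None


Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})


class Swapped(str, Enum):
    a = "b"
    b = "a"


print(coerce(Factor, "half"), coerce(Factor, 0.4))
# "a" is the value of member `b` and the name of member `a`; the value lookup runs first.
print(coerce(Swapped, "a"))
```

    If names must always take precedence for a given field, the two lookups can be swapped or the by-value branch dropped, which is what the decorator solution above effectively does.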

    Enum names in the JSON Schema

    This is a bit more hacky, but not too bad.

    Pydantic actually allows us to easily customize schema generation for specific fields. This is done by adding the __modify_schema__ classmethod to the type in question.

    For Enum this turns out to be tricky, especially since you want it to be created dynamically (via the Functional API). We cannot simply subclass Enum and add our modifier method there due to some magic around the EnumType. What we can do is simply monkey-patch it into Enum (or alternatively do that to our specific Enum subclasses).

    Either way, this method again gives us all we need to replace the default "enum" schema section with an array of names instead of values:

    from enum import Enum
    from typing import Any, Optional
    
    from pydantic.fields import ModelField
    
    
    def __modify_enum_schema__(
        field_schema: dict[str, Any],
        field: Optional[ModelField],
    ) -> None:
        if field is None:
            return
        enum_cls = field.type_
        assert isinstance(enum_cls, type) and issubclass(enum_cls, Enum)
        field_schema["enum"] = list(enum_cls.__members__.keys())
    
    
    # Monkey-patch `Enum` to customize schema modification:
    Enum.__modify_schema__ = __modify_enum_schema__  # type: ignore[attr-defined]
    

    And that is all we need. (Mypy will complain about the monkey-patching of course.)

    Full demo

    from enum import Enum
    from random import choices, random
    from string import ascii_lowercase
    from typing import Any, Optional
    
    from pydantic import BaseModel, validator
    from pydantic.fields import ModelField
    
    
    def __modify_enum_schema__(
        field_schema: dict[str, Any],
        field: Optional[ModelField],
    ) -> None:
        if field is None:
            return
        enum_cls = field.type_
        assert isinstance(enum_cls, type) and issubclass(enum_cls, Enum)
        field_schema["enum"] = list(enum_cls.__members__.keys())
    
    
    # Monkey-patch `Enum` to customize schema modification:
    Enum.__modify_schema__ = __modify_enum_schema__  # type: ignore[attr-defined]
    
    
    class CustomBaseModel(BaseModel):
        @validator("*", pre=True)
        def coerce_to_enum_member(cls, v: Any, field: ModelField) -> Any:
            """For any `Enum` typed field, attempt to coerce the value to a member."""
            type_ = field.type_
            if not (isinstance(type_, type) and issubclass(type_, Enum)):
                return v  # field is not an enum type
            if isinstance(v, type_):
                return v  # value is already an enum member
            try:
                return type_(v)  # get enum member by value
            except ValueError:
                try:
                    return type_[v]  # get enum member by name
                except KeyError:
                    raise ValueError(f"Invalid {type_.__name__} `{v}`")
    
    
    # Randomly generate an enum of floats:
    _members = {
        name: round(random(), 1)
        for name in choices(ascii_lowercase, k=3)
    }
    Factor = Enum("Factor", _members)  # type: ignore[misc]
    first_member_name = next(iter(Factor)).name
    print("Random `Factor` members:", dict(Factor.__members__))  # dict() avoids the mappingproxy wrapper
    print("First member:", first_member_name)
    
    
    class Model(CustomBaseModel):
        factor: Factor
        foo: str
        bar: int
    
        class Config:
            json_encoders = {Factor: lambda field: field.name}
    
    
    obj = Model.parse_obj({
        "factor": first_member_name,
        "foo": "spam",
        "bar": -1,
    })
    print(obj.json(indent=4))
    print(Model.schema_json(indent=4))
    

    Output:

    Random `Factor` members: {'a': <Factor.a: 0.9>, 'q': <Factor.q: 0.6>, 'e': <Factor.e: 0.8>}
    First member: a
    
    {
        "factor": "a",
        "foo": "spam",
        "bar": -1
    }
    
    {
        "title": "Model",
        "type": "object",
        "properties": {
            "factor": {
                "$ref": "#/definitions/Factor"
            },
            "foo": {
                "title": "Foo",
                "type": "string"
            },
            "bar": {
                "title": "Bar",
                "type": "integer"
            }
        },
        "required": [
            "factor",
            "foo",
            "bar"
        ],
        "definitions": {
            "Factor": {
                "title": "Factor",
                "description": "An enumeration.",
                "enum": [
                    "a",
                    "q",
                    "e"
                ]
            }
        }
    }
    

    Notes

    I chose this super weird way of randomly generating an Enum just for illustrative purposes. I wanted to show that both validation and schema generation still work fine in that case. But in practice I would assume that the names actually don't change that drastically every time the program is run. (At least I hope they don't for the sake of your users.)

    The value of factor is still a regular Enum member, so obj.factor.value will still give us 0.9 (for this random example).

    The validator will obviously prevent invalid names/values from being passed. You can make it more specific if you like, or restrict it to deal only with str arguments, assuming them to be Enum member names, and delegate the rest to Pydantic's default validator. As it is written right now, it essentially replaces that default Enum validator.
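
    A minimal sketch of that stricter variant, written as a plain function so it runs without Pydantic (name_to_member is a hypothetical name). Inside a pre=True validator, returning the input unchanged is what hands it on to the default Enum validation.

```python
from enum import Enum
from typing import Any


def name_to_member(enum_cls: type[Enum], v: Any) -> Any:
    """Turn a `str` into the member of that name; pass everything else through."""
    if isinstance(v, enum_cls) or not isinstance(v, str):
        return v  # not a plain name: leave it for the default validator
    try:
        return enum_cls[v]
    except KeyError:
        raise ValueError(f"Invalid {enum_cls.__name__} name `{v}`") from None


Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})

print(name_to_member(Factor, "half"))  # resolved by name
print(name_to_member(Factor, 0.4))     # passed through unchanged
```

    Because every string is treated as a name here, this variant would reject "M" for an enum like Sex from the question, so it should only be applied to fields that are meant to be validated by name.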

    Any other schema modifications (such as the description) can be done according to the docs I linked as well.
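
    For the description specifically, one option is to generate it from the member names inside the schema modifier. The sketch below shows only the dictionary manipulation, using a hypothetical helper (describe_names) that takes the enum class directly; the wording of the description is an arbitrary choice.

```python
from enum import Enum
from typing import Any


def describe_names(field_schema: dict[str, Any], enum_cls: type[Enum]) -> None:
    """Replace the schema description with a list of the valid member names."""
    names = ", ".join(enum_cls.__members__)
    field_schema["description"] = f"One of: {names}."


Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})

# Roughly the shape of the generated definition shown in the output above.
schema: dict[str, Any] = {"title": "Factor", "description": "An enumeration."}
describe_names(schema, Factor)
print(schema["description"])
```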