Tags: python, mypy, python-typing, python-dataclasses

Create dataclass instance from union type based on string literal


I'm trying to strongly type our code base. A big part of the code handles events that come from external devices and forwards them to different handlers. These events all have a value attribute, but this value can have different types. The value type is determined by the event name: a temperature event always has an int value, and a register event always has a RegisterInfo as its value.

So I would like to map the event name to the value type, but we are struggling with the implementation.

This setup comes the closest to what we want:

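from dataclasses import dataclass
from typing import Any, Literal, TypeAlias

# RegisterInfo is assumed to be defined elsewhere in the code base.

@dataclass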
class EventBase:
    name: str
    value: Any
    value_type: str

@dataclass
class RegisterEvent(EventBase):
    value: RegisterInfo
    name: Literal["register"]
    value_type: Literal["RegisterInfo"] = "RegisterInfo"


@dataclass
class NumberEvent(EventBase):
    value: float | int
    name: Literal["temperature", "line_number"]
    value_type: Literal["number"] = "number"

@dataclass
class StringEvent(EventBase):
    value: str
    name: Literal["warning", "status"]
    value_type: Literal["string"] = "string"


Events: TypeAlias = RegisterEvent | NumberEvent | StringEvent

With this setup mypy will flag incorrect code like:

def handle_event(event: Events):
    if event.name == "temperature":
        event.value.upper()

(It sees that a temperature event has a value of type float | int, and that neither of those has an upper() method.)

But creating the events becomes ugly this way. I don't want a big if statement that maps each event name to a specific event class. We have lots of different event types, and this mapping info is already inside these classes.

Ideally I would like it to look like this:

def handle_device_message(message_info):
    event_name = message_info["event_name"]
    event_value = message_info["event_value"]

    event = Events(event_name, event_value)

Is a "one-liner" like this possible?

I feel like we are running into a wall here. Could it be that the code is architecturally wrong?


Solution

  • UPDATE: Using Pydantic v2

    If you are willing to switch to Pydantic instead of dataclasses, you can define a discriminated union via typing.Annotated and use the TypeAdapter as a "universal" constructor that is able to discriminate between distinct Event subtypes based on the provided name string.

    Here is what I would suggest:

    from typing import Annotated, Any, Literal
    
    from pydantic import BaseModel, Field, TypeAdapter
    
    
    class EventBase(BaseModel):
        name: str
        value: Any
    
    
    class NumberEvent(EventBase):
        name: Literal["temperature", "line_number"]
        value: float
    
    
    class StringEvent(EventBase):
        name: Literal["warning", "status"]
        value: str
    
    
    Event = TypeAdapter(Annotated[
        NumberEvent | StringEvent,
        Field(discriminator="name"),
    ])
    
    
    event_temp = Event.validate_python({"name": "temperature", "value": 3.14})
    event_status = Event.validate_python({"name": "status", "value": "spam"})
    
    print(repr(event_temp))    # NumberEvent(name='temperature', value=3.14)
    print(repr(event_status))  # StringEvent(name='status', value='spam')
    

    An invalid name would of course cause a validation error, as would a value of a completely wrong type (one that cannot be coerced). Example:

    from pydantic import ValidationError
    
    try:
        Event.validate_python({"name": "temperature", "value": "foo"})
    except ValidationError as err:
        print(err.json(indent=4))
    
    try:
        Event.validate_python({"name": "foo", "value": "bar"})
    except ValidationError as err:
        print(err.json(indent=4))
    

    Output:

    [
        {
            "type": "float_parsing",
            "loc": [
                "temperature",
                "value"
            ],
            "msg": "Input should be a valid number, unable to parse string as a number",
            "input": "foo",
            "url": "https://errors.pydantic.dev/2.1/v/float_parsing"
        }
    ]
    
    [
        {
            "type": "union_tag_invalid",
            "loc": [],
            "msg": "Input tag 'foo' found using 'name' does not match any of the expected tags: 'temperature', 'line_number', 'warning', 'status'",
            "input": {
                "name": "foo",
                "value": "bar"
            },
            "ctx": {
                "discriminator": "'name'",
                "tag": "foo",
                "expected_tags": "'temperature', 'line_number', 'warning', 'status'"
            },
            "url": "https://errors.pydantic.dev/2.1/v/union_tag_invalid"
        }
    ]
    

    Original Answer: Using Pydantic v1

    If you are willing to switch to Pydantic instead of dataclasses, you can define a discriminated union via typing.Annotated and use the parse_obj_as function as a "universal" constructor that is able to discriminate between distinct Event subtypes based on the provided name string.

    Here is what I would suggest:

    from typing import Annotated, Any, Literal
    
    from pydantic import BaseModel, Field, parse_obj_as
    
    
    class EventBase(BaseModel):
        name: str
        value: Any
    
    
    class NumberEvent(EventBase):
        name: Literal["temperature", "line_number"]
        value: float
    
    
    class StringEvent(EventBase):
        name: Literal["warning", "status"]
        value: str
    
    
    Event = Annotated[
        NumberEvent | StringEvent,
        Field(discriminator="name"),
    ]
    
    
    event_temp = parse_obj_as(Event, {"name": "temperature", "value": "3.14"})
    event_status = parse_obj_as(Event, {"name": "status", "value": -10})
    
    print(repr(event_temp))    # NumberEvent(name='temperature', value=3.14)
    print(repr(event_status))  # StringEvent(name='status', value='-10')
    

    In this usage demo I purposefully used the "wrong" types for the respective value fields to show that Pydantic will automatically try to coerce them to the right types, once it determines the correct model based on the provided name.

    An invalid name would of course cause a validation error, as would a value of a completely wrong type (one that cannot be coerced). Example:

    from pydantic import ValidationError
    
    try:
        parse_obj_as(Event, {"name": "temperature", "value": "foo"})
    except ValidationError as err:
        print(err.json(indent=4))
    
    try:
        parse_obj_as(Event, {"name": "foo", "value": "bar"})
    except ValidationError as err:
        print(err.json(indent=4))
    

    Output:

    [
        {
            "loc": [
                "__root__",
                "NumberEvent",
                "value"
            ],
            "msg": "value is not a valid float",
            "type": "type_error.float"
        }
    ]
    
    [
        {
            "loc": [
                "__root__"
            ],
            "msg": "No match for discriminator 'name' and value 'foo' (allowed values: 'temperature', 'line_number', 'warning', 'status')",
            "type": "value_error.discriminated_union.invalid_discriminator",
            "ctx": {
                "discriminator_key": "name",
                "discriminator_value": "foo",
                "allowed_values": "'temperature', 'line_number', 'warning', 'status'"
            }
        }
    ]
    

    Side notes

    An alias for a union of types like NumberEvent | StringEvent should still have a singular name, i.e. Event rather than Events. Semantically, the annotation e: Event indicates that e should be an instance of one of those types, whereas e: Events would suggest that e is a collection of instances of those types.

    Also, for static typing purposes the union float | int is almost always equivalent to float, because type checkers follow the PEP 484 convention of accepting an int wherever a float is expected.
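
A short sketch of what that convention does and does not cover (to_fahrenheit is a hypothetical function used only for illustration). The equivalence is purely static; at runtime an int is still not an instance of float:

```python
def to_fahrenheit(celsius: float) -> float:
    """Annotated with plain float; no need for float | int."""
    return celsius * 9 / 5 + 32


# Passing an int is accepted by type checkers where float is declared.
reading = to_fahrenheit(20)

# The runtime class hierarchy is unaffected: int does not subclass float.
int_is_float_at_runtime = isinstance(20, float)
```

So code that relies on isinstance(value, float) at runtime still needs to handle int separately.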