Tags: python, python-typing, mypy, python-dataclasses

Factory function for dataclass fields


I'm making a library where I want to take advantage of metadata on the fields of a dataclass.

To get my desired results, I can write the dataclass like the following:

@dataclass
class Foo:
    a: int = field(
        metadata={'my_metadata': {'my_required_key': "c"}}
    )
    b: dict[str, str] = field(
        metadata={'my_metadata': {'my_required_key': "d"}}, default_factory=dict
    )

This seems like a lot of boilerplate, especially if I want to make many classes with many fields like this. I was thinking I could write a factory function that wraps dataclasses.field to reduce the repetition.

However, I can't seem to get the parameter types right for the call into dataclasses.field, and the proper return type is a mystery to me. What I have so far:

from dataclasses import dataclass, field, MISSING, _MISSING_TYPE
from typing import TypeVar, Union, Callable

_T = TypeVar("_T")


def myfield(
    my_required_key: str,
    *,
    default: Union[_MISSING_TYPE, _T] = MISSING,
    default_factory: Union[_MISSING_TYPE, Callable[[], _T]] = MISSING
) -> _T:
    return field(  # type: ignore
        metadata={'my_metadata': {'my_required_key': my_required_key}},
        default=default,
        default_factory=default_factory,
    )


@dataclass
class Foo:
    a: int = myfield("c")
    b: dict[str, str] = myfield("d", default_factory=dict)

This code will pass mypy validation, but PyCharm doesn't seem to like it, reporting that:

Mutable default 'myfield("d", default_factory=dict)' is not allowed. Use 'default_factory'

I'm okay with ignoring the PyCharm error, as the class does appear to function correctly, and I use mypy in my CI/CD pipeline, which accepts it.

As for the return type, I currently have myfield(...) -> _T. I feel like the signature should look more like myfield(...) -> Field[_T], but mypy rejects that idea, and reports:

error: Incompatible types in assignment (expression has type "Field[<nothing>]", variable has type "int")
error: Incompatible types in assignment (expression has type "Field[Dict[_KT, _VT]]", variable has type "Dict[str, str]")

I'm also not sure about how to type the default and default_factory parameters. Without the # type: ignore I will get:

error: No overload variant of "field" matches argument types "Dict[str, Dict[str, str]]", "Union[_MISSING_TYPE, _T]", "Union[_MISSING_TYPE, Callable[[], _T]]"
note: Possible overload variants:
note:     def [_T] field(*, default: _T, init: bool = ..., repr: bool = ..., hash: Optional[bool] = ..., compare: bool = ..., metadata: Optional[Mapping[str, Any]] = ...) -> _T
note:     def [_T] field(*, default_factory: Callable[[], _T], init: bool = ..., repr: bool = ..., hash: Optional[bool] = ..., compare: bool = ..., metadata: Optional[Mapping[str, Any]] = ...) -> _T
note:     def field(*, init: bool = ..., repr: bool = ..., hash: Optional[bool] = ..., compare: bool = ..., metadata: Optional[Mapping[str, Any]] = ...) -> Any

I see other libraries reduce the boilerplate by having the factory function return only the metadata dictionary, e.g.

@dataclass
class Fizz:
    a: int = field(metadata=myfield("c"))
    b: dict[str, str] = field(metadata=myfield("d"), default_factory=dict)

This still feels a bit ugly to me, but maybe this is the way to go.

Any help or ideas for cleaning this up would be appreciated!


Solution

  • You are already "truncating" the full signature of field in your wrapper; I would just take that further:

    def myfield(required_key: str, **kwargs):
        kwargs['metadata'] = dict(my_metadata=dict(my_required_key=required_key))
        return field(**kwargs)
    

    This level of indirection, though, appears to prevent mypy from checking that the arguments passed to myfield have the correct types expected by field.
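As a quick runtime check of the wrapper above, the sketch below builds the Foo class from the question and reads the metadata back with dataclasses.fields. The required_keys name is only illustrative; everything else comes from the question and the snippet above.

```python
from dataclasses import dataclass, field, fields


def myfield(required_key: str, **kwargs):
    # Forward everything else (default, default_factory, repr, ...) to field().
    kwargs["metadata"] = {"my_metadata": {"my_required_key": required_key}}
    return field(**kwargs)


@dataclass
class Foo:
    a: int = myfield("c")
    b: dict[str, str] = myfield("d", default_factory=dict)


# Map each field name to the key stored in its metadata (illustrative name).
required_keys = {
    f.name: f.metadata["my_metadata"]["my_required_key"] for f in fields(Foo)
}
print(required_keys)  # {'a': 'c', 'b': 'd'}
```

The class behaves as expected at runtime; what is lost is only the static check on the keyword arguments.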


    Or, in the spirit of "Prefer composition over inheritance", just write a function that creates the correct metadata to use as an argument to field. This lets your users call field directly, saving you from having to duplicate its intricate hinting.

    def make_metadata(required_key: str):
        return dict(my_metadata=dict(my_required_key=required_key))
    
    
    @dataclass
    class Foo:
        a: int = field(metadata=make_metadata("c"))
        b: dict[str, str] = field(metadata=make_metadata("d"), default_factory=dict)
    

    Still some boilerplate, but less of it.

    At some level, there is an inescapable trade-off in how much static typing you can bolt onto a dynamically typed language, or rather in how that typing is bolted on. You'll see overload used in the type hints for field, but overload does nothing at runtime. It is just a place to park annotations in the source code for mypy to analyze; it doesn't change its target in any way (in fact, it discards it, since the intention is to redefine the name later). That's why the hinted variants are "implemented" with just ...: the body doesn't matter, because you never use the function objects defined there, only the final, undecorated function.
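If you do want to keep the typed wrapper from the question, one possible approach is to give myfield its own overloads, mirroring the three field variants listed in the mypy note above. This is a sketch under assumptions rather than a verified recipe: it forwards only default and default_factory, and I have not checked it against every mypy version.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Callable, TypeVar, overload

_T = TypeVar("_T")


# Variant 1: an explicit default value is supplied.
@overload
def myfield(my_required_key: str, *, default: _T) -> _T: ...


# Variant 2: a zero-argument factory is supplied.
@overload
def myfield(my_required_key: str, *, default_factory: Callable[[], _T]) -> _T: ...


# Variant 3: neither is supplied, so the field is required.
@overload
def myfield(my_required_key: str) -> Any: ...


def myfield(my_required_key: str, **kwargs: Any) -> Any:
    # The only definition that exists at runtime; the overloads above are
    # annotations for the type checker and are replaced by this function.
    return field(
        metadata={"my_metadata": {"my_required_key": my_required_key}},
        **kwargs,
    )


@dataclass
class Foo:
    a: int = myfield("c")
    b: dict[str, str] = myfield("d", default_factory=dict)
```

One caveat: a type checker may only special-case calls to dataclasses.field itself, so it might treat myfield("c") as an ordinary default value and fail to flag Foo() with a missing a, even though that call raises TypeError at runtime.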


    I would have suggested something like just setting the metadata directly on a Field object, except the metadata is the only attribute that involves more than just a simple assignment in Field.__init__:

    def __init__(self, default, default_factory, init, repr, hash, compare,
                 metadata):
        [...]
        self.metadata = (_EMPTY_METADATA
                         if metadata is None else
                         types.MappingProxyType(metadata))
        [...]
    

    Backpedalling from "Prefer composition...", it would be nice if Field were meant to be instantiated directly (the documentation says users should never do so), so that you could subclass it like

    class MyField(Field):
        def set_metadata(self, key: str):
            self.metadata = types.MappingProxyType(dict(...))
            return self
    

    and use

    @dataclass
    class Foo:
        a: int = Field().set_metadata("c")
        b: dict[str, str] = Field(default_factory=dict).set_metadata("d")
    

    Not that I'm advocating direct use of Field like this, but....

    Also, MappingProxyType is only used, as far as I know, to make the metadata read-only. If you don't mind relaxing that part of the Field object...

    @dataclass
    class Foo:
        a: int = field()
        make_metadata(a, "c")
        b: dict[str, str] = field(default_factory=dict)
        make_metadata(b, "d")
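A runnable sketch of that last idea follows. The helper is renamed to attach_metadata (a hypothetical name) to avoid clashing with the one-argument make_metadata above. It depends on Field instances accepting attribute assignment, which holds in CPython's implementation but is not a documented guarantee. This version keeps the MappingProxyType wrapper so the metadata stays read-only; assigning a plain dict would work the same way.

```python
import types
from dataclasses import Field, dataclass, field, fields


def attach_metadata(f: Field, required_key: str) -> None:
    # Hypothetical helper: replaces the Field's metadata in place.
    f.metadata = types.MappingProxyType(
        {"my_metadata": {"my_required_key": required_key}}
    )


@dataclass
class Foo:
    a: int = field()
    attach_metadata(a, "c")  # inside the class body, `a` is still the Field object
    b: dict[str, str] = field(default_factory=dict)
    attach_metadata(b, "d")
```

The calls run while the class body executes, before @dataclass processes the fields, so the decorator picks up the already-modified Field objects.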