I'm making a library where I want to take advantage of metadata on the fields of a dataclass.
To get my desired results, I can write the dataclass like the following:
@dataclass
class Foo:
    a: int = field(
        metadata={'my_metadata': {'my_required_key': "c"}}
    )
    b: dict[str, str] = field(
        metadata={'my_metadata': {'my_required_key': "d"}}, default_factory=dict
    )
This seems like a lot of boilerplate, especially if I want to make many classes with many fields like this. I was thinking I could write a factory function that wraps dataclasses.field and reduces the repetition.
However, I can't seem to get the type parameters right for calling into dataclasses.field, and the proper return type is a mystery to me. What I have so far:
from dataclasses import dataclass, field, MISSING, _MISSING_TYPE
from typing import TypeVar, Union, Callable

_T = TypeVar("_T")

def myfield(
    my_required_key: str,
    *,
    default: Union[_MISSING_TYPE, _T] = MISSING,
    default_factory: Union[_MISSING_TYPE, Callable[[], _T]] = MISSING
) -> _T:
    return field(  # type: ignore
        metadata={'my_metadata': {'my_required_key': my_required_key}},
        default=default,
        default_factory=default_factory,
    )

@dataclass
class Foo:
    a: int = myfield("c")
    b: dict[str, str] = myfield("d", default_factory=dict)
This code passes mypy validation, but PyCharm doesn't seem to like it, reporting:
Mutable default 'myfield("d", default_factory=dict)' is not allowed. Use 'default_factory'
I'm okay with ignoring the PyCharm error, as the class does appear to function correctly, and mypy, which I run in my CI/CD, accepts it.
As for the return type, I currently have myfield(...) -> _T. I feel like the signature should look more like myfield(...) -> Field[_T], but mypy rejects that idea and reports:
error: Incompatible types in assignment (expression has type "Field[<nothing>]", variable has type "int")
error: Incompatible types in assignment (expression has type "Field[Dict[_KT, _VT]]", variable has type "Dict[str, str]")
I'm also not sure how to type the default and default_factory parameters. Without the # type: ignore, I get:
error: No overload variant of "field" matches argument types "Dict[str, Dict[str, str]]", "Union[_MISSING_TYPE, _T]", "Union[_MISSING_TYPE, Callable[[], _T]]"
note: Possible overload variants:
note: def [_T] field(*, default: _T, init: bool = ..., repr: bool = ..., hash: Optional[bool] = ..., compare: bool = ..., metadata: Optional[Mapping[str, Any]] = ...) -> _T
note: def [_T] field(*, default_factory: Callable[[], _T], init: bool = ..., repr: bool = ..., hash: Optional[bool] = ..., compare: bool = ..., metadata: Optional[Mapping[str, Any]] = ...) -> _T
note: def field(*, init: bool = ..., repr: bool = ..., hash: Optional[bool] = ..., compare: bool = ..., metadata: Optional[Mapping[str, Any]] = ...) -> Any
I see other libraries have reduced the boilerplate by having the factory function return only the metadata dictionary, e.g.:
@dataclass
class Fizz:
    a: int = field(metadata=myfield("c"))
    b: dict[str, str] = field(metadata=myfield("d"), default_factory=dict)
This still feels a bit ugly to me, but maybe this is the way to go.
Any help or ideas for cleaning this up would be appreciated!
You are already "truncating" the full signature of field in your wrapper; I would just take that further:
def myfield(required_key: str, **kwargs):
    kwargs['metadata'] = dict(my_metadata=dict(my_required_key=required_key))
    return field(**kwargs)
This level of indirection, though, appears to prevent mypy from checking that the arguments passed to myfield have the types expected by field.
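If you do want mypy to check the calls, one option is to give myfield the same kind of overload stubs that field has in the error output quoted in the question. This is a minimal sketch, not a tested recipe: the overload layout is my own assumption, modelled on those three variants, and it forwards only default and default_factory.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Callable, TypeVar, overload

_T = TypeVar("_T")


# Stubs for the type checker only; they mirror the three variants of field().
@overload
def myfield(my_required_key: str, *, default: _T) -> _T: ...
@overload
def myfield(my_required_key: str, *, default_factory: Callable[[], _T]) -> _T: ...
@overload
def myfield(my_required_key: str) -> Any: ...


def myfield(my_required_key: str, **kwargs: Any) -> Any:
    # The real implementation: add the metadata, then defer to field().
    kwargs["metadata"] = {"my_metadata": {"my_required_key": my_required_key}}
    return field(**kwargs)


@dataclass
class Foo:
    a: int = myfield("c")
    b: dict[str, str] = myfield("d", default_factory=dict)


foo = Foo(a=1)
key_of_b = fields(Foo)[1].metadata["my_metadata"]["my_required_key"]
```

As far as I know, mypy's dataclass support special-cases dataclasses.field itself, so it may still treat every attribute assigned from a wrapper like this as having a default, and PyCharm may keep its warning.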
Or, in the spirit of "Prefer composition over inheritance", just write a function that creates the correct metadata to pass as an argument to field. This lets your users call field directly, saving you from having to duplicate its intricate hinting.
def make_metadata(required_key: str):
    return dict(my_metadata=dict(my_required_key=required_key))

@dataclass
class Foo:
    a: int = field(metadata=make_metadata("c"))
    b: dict[str, str] = field(metadata=make_metadata("d"), default_factory=dict)
Still some boilerplate, but less of it.
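For completeness, here is roughly how the library side could read that metadata back with dataclasses.fields. The required_keys helper is hypothetical, named only for this illustration.

```python
from dataclasses import dataclass, field, fields


def make_metadata(required_key: str) -> dict:
    return {"my_metadata": {"my_required_key": required_key}}


@dataclass
class Foo:
    a: int = field(metadata=make_metadata("c"))
    b: dict = field(metadata=make_metadata("d"), default_factory=dict)


def required_keys(cls) -> dict:
    """Map each field name to its my_required_key (hypothetical helper)."""
    return {
        f.name: f.metadata["my_metadata"]["my_required_key"]
        for f in fields(cls)
        if "my_metadata" in f.metadata  # skip fields without our metadata
    }


keys = required_keys(Foo)
```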
At some level, there is an inescapable trade-off involving the amount of static typing you can bolt onto a dynamically typed language, or rather how that typing is bolted on. You'll see the use of overload in the type hinting for field, but overload doesn't do anything at runtime. It's just a place to park annotations in the source code for mypy to analyze; it doesn't change its target in any way (in fact, it just discards it, as the intention is to redefine it later). That's why the hinted variants are just "implemented" with ..., because the body doesn't matter: you'll never use the function object being defined, only the final, undecorated function.
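A small runtime demonstration of that point, using a made-up double function. The stub assignment is there only to keep hold of what overload returned; a type checker would probably object to it sitting between the overloads.

```python
from typing import Union, overload


@overload
def double(x: int) -> int: ...


# Keep a reference to what @overload returned before the name is rebound.
stub = double


@overload
def double(x: str) -> str: ...


def double(x: Union[int, str]) -> Union[int, str]:
    # The only definition that is actually callable at runtime.
    return x * 2


try:
    stub(1)  # the object left behind by @overload is a placeholder
    stub_raised = False
except NotImplementedError:
    stub_raised = True
```

Calling double(2) or double("ab") runs the final definition, while the placeholder kept in stub refuses to be called at all.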
I would have suggested just setting the metadata directly on a Field object, except that metadata is the only attribute that involves more than a simple assignment in Field.__init__:
def __init__(self, default, default_factory, init, repr, hash, compare,
             metadata):
    [...]
    self.metadata = (_EMPTY_METADATA
                     if metadata is None else
                     types.MappingProxyType(metadata))
    [...]
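You can see the effect of that wrapping from the outside. A quick sketch, with a throwaway class made up for the purpose:

```python
import types
from dataclasses import dataclass, field, fields


@dataclass
class Foo:
    a: int = field(metadata={"my_metadata": {"my_required_key": "c"}})
    b: int = 0  # no metadata given


meta_a = fields(Foo)[0].metadata
meta_b = fields(Foo)[1].metadata

# The dict passed to field() comes back wrapped in a read-only proxy.
is_proxy = isinstance(meta_a, types.MappingProxyType)

try:
    meta_a["other"] = 1  # the proxy rejects item assignment
    write_rejected = False
except TypeError:
    write_rejected = True
```

Only the outer mapping is protected; the nested my_metadata dict is still an ordinary mutable dict.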
Backpedalling from "Prefer composition...", it would be nice if Field were meant to be used directly (the documentation says users should never instantiate a Field themselves), so that you could subclass it like
class MyField(Field):
    def set_metadata(self, key: str):
        self.metadata = types.MappingProxyType(dict(...))
        return self
and use
@dataclass
class Foo:
    a: int = MyField().set_metadata("c")
    b: dict[str, str] = MyField(default_factory=dict).set_metadata("d")
Not that I'm advocating direct use of Field like this, but....
Also, MappingProxyType is only used, as far as I know, to make the metadata read-only. If you don't mind relaxing that part of the Field object...
@dataclass
class Foo:
    a: int = field()
    make_metadata(a, "c")
    b: dict[str, str] = field(default_factory=dict)
    make_metadata(b, "d")
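That snippet assumes a two-argument helper that mutates the Field it is given. A minimal sketch of what such a helper might look like, under a hypothetical name (attach_metadata) to keep it apart from the one-argument make_metadata earlier; it relies on the name in the class body still being bound to the Field object at that point, and it swaps the read-only proxy for a plain dict.

```python
from dataclasses import Field, dataclass, field, fields


def attach_metadata(f: Field, required_key: str) -> None:
    # Hypothetical helper: overwrite the read-only proxy with a plain dict.
    f.metadata = {"my_metadata": {"my_required_key": required_key}}


@dataclass
class Foo:
    a: int = field()
    attach_metadata(a, "c")  # inside the class body, `a` is still the Field
    b: dict = field(default_factory=dict)
    attach_metadata(b, "d")


meta_a = fields(Foo)[0].metadata
meta_b = fields(Foo)[1].metadata
```

I'd treat this as a curiosity rather than a recommendation: it depends on how the dataclass decorator picks up Field objects from the class body, and the metadata is no longer read-only.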