
In Python dataclasses, why can an InitVar have a default but not a default_factory?


In Python 3.7, I can create a dataclass with a defaulted InitVar just fine:

from dataclasses import dataclass, InitVar, field

@dataclass
class Foo:
    seed: InitVar[str] = field(default='tomato')
    stored: str = field(init=False)

    def __post_init__(self, seed: str):
        self.stored = f'planted {seed}'

print(Foo())

Now I try to create a similar dataclass with a mutable default, for which I need to use default_factory instead:

from dataclasses import dataclass, InitVar, field
from typing import List

@dataclass
class Bar:
    seeds: InitVar[List[str]] = field(default_factory=list)
    stored: List[str] = field(init=False)

    def __post_init__(self, seeds: List[str]):
        self.stored = [f'planted {seed}' for seed in seeds]

print(Bar())

However, this is not valid. Python raises TypeError: field seeds cannot have a default factory.
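
For reference, here is a minimal sketch of where the error appears to come from, assuming the same Bar definition as above. It wraps the class definition itself in a try block, which suggests the TypeError is raised by the @dataclass decorator when the class is created, so the print(Bar()) line is never reached:

```python
from dataclasses import dataclass, InitVar, field
from typing import List

message = None
try:
    @dataclass
    class Bar:
        seeds: InitVar[List[str]] = field(default_factory=list)
        stored: List[str] = field(init=False)
except TypeError as exc:
    # Raised while the decorator processes the class body,
    # before any Bar() instance is created.
    message = str(exc)

print(message)
```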

The dataclasses.py file from the standard library does not explain why:

    # Special restrictions for ClassVar and InitVar.
    if f._field_type in (_FIELD_CLASSVAR, _FIELD_INITVAR):
        if f.default_factory is not MISSING:
            raise TypeError(f'field {f.name} cannot have a '
                            'default factory')
        # Should I check for other field settings? default_factory
        # seems the most serious to check for.  Maybe add others.  For
        # example, how about init=False (or really,
        # init=<not-the-default-init-value>)?  It makes no sense for
        # ClassVar and InitVar to specify init=<anything>.

Why? What is the rationale behind this special restriction? How does this make sense?


Solution

  • The rationale is that supplying a default_factory would almost always be an error.

    The intent of InitVar is to create a pseudo-field, called an "init-only field". It is a parameter of the generated __init__() that is passed on to __post_init__(), which usually uses it to populate other fields. It is never returned by the module-level fields() function. The primary use case is initializing field values that depend on one or more other fields.

    Given this intent, supplying a default_factory for an init-only field would almost always be a user error, because a default_factory is:

    1. Something we would want to see reported by the fields() function, which never returns init-only fields.

    2. Entirely unnecessary when using __post_init__(), where you can call the factory directly.

    3. Not suited to the case where object creation depends on other field values, since a factory is called with no arguments.
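
Point 2 can be sketched as follows. This is one possible workaround, not the only one: it assumes None is acceptable as a sentinel for "no seeds supplied" and builds the fresh list inside __post_init__(). The Optional annotation and the None default are choices made for this illustration:

```python
from dataclasses import dataclass, InitVar, field, fields
from typing import List, Optional

@dataclass
class Bar:
    # None stands in for "argument omitted" (an assumption of this sketch).
    seeds: InitVar[Optional[List[str]]] = None
    stored: List[str] = field(init=False)

    def __post_init__(self, seeds: Optional[List[str]]):
        if seeds is None:
            seeds = []  # call the "factory" directly; new list per instance
        self.stored = [f'planted {seed}' for seed in seeds]

print(Bar())                          # Bar(stored=[])
print(Bar(['tomato', 'basil']))       # Bar(stored=['planted tomato', 'planted basil'])
print([f.name for f in fields(Bar)])  # ['stored']
```

The last line shows that the init-only seeds pseudo-field is not reported by fields(); only stored is.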