Tags: python, python-3.x, self, python-dataclasses

Initialize dataclass instance with functions


I'm trying to create a dataclass to store all relevant data in a single object. How can I initialize a dataclass instance where the values are evaluated from functions within the dataclass, which take parameters?

This is where I am so far:

@dataclass
class Person: 
    def Name(self):
        return f'My name is {self.name[0]} {self.name[1]}.'

    def Age(self):
        return f'I am {self.age} years old.'

    name: field(default_factory=Name(self), init=True)
    age: field(default_factory=Age(self), init=True)

person = Person(('John', 'Smith'), '100')
print(person)

Current output:

Person(name=('John', 'Smith'), age='100')

This is the output I'm trying to achieve:

Person(name='My name is John Smith', age='I am 100 years old')

I was using the question "How to reference `self` in dataclass' fields?" as a reference on this topic.


Solution

  • First, and this is rather subtle: dataclasses.field() does not work as a type annotation. With name: field(...), the Field object becomes the annotation and the field gets no default at all. I assume you meant name: str = field(...), where str is the type annotation for name.

    But even with that approach, you would run into an error from how the default_factory argument is passed: Name(self) is evaluated while the class body executes, where self is not defined, so it raises a NameError. default_factory needs a callable that takes no arguments, and even that does not help in this use case.

    It is not possible to achieve what you are trying to do with dataclasses.field(...) alone, because the docs require default_factory to be a zero-argument callable.

    For instance, default_factory=list works because list() can be called with no arguments.

    However, note that the following is not possible:

    field(default_factory=lambda world: f'hello {world}!')
    

    dataclasses will not pass a value for world to the default_factory function, so creating an instance without that field raises a TypeError.
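
    As a minimal sketch of the point above (the Inventory and Greeting classes are hypothetical names used only for illustration), a zero-argument factory works, while a factory that expects an argument fails as soon as the default is needed:

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    # list() can be called with no arguments, so it is a valid factory.
    items: list = field(default_factory=list)


@dataclass
class Greeting:
    # This factory expects an argument that dataclasses never supplies.
    text: str = field(default_factory=lambda world: f'hello {world}!')


print(Inventory())  # Inventory(items=[])

try:
    Greeting()
except TypeError as exc:
    # The generated __init__ calls the factory with no arguments.
    print(f'TypeError: {exc}')

# Passing the value explicitly bypasses the factory entirely.
print(Greeting('hi'))  # Greeting(text='hi')
```

    Note that the class definition itself succeeds; the error only appears when an instance is created without a value for that field.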

    The good news is that there are a few alternatives to consider in your case, which I outline below.

    Init-only Variables

    To work around this, one option is to combine InitVar with field(init=False):

    from dataclasses import field, dataclass, InitVar
    
    
    @dataclass
    class Person:
    
        in_name: InitVar[tuple[str, str]]
        in_age: InitVar[str]
    
        name: str = field(init=False)
        age: str = field(init=False)
    
        def __post_init__(self, in_name: tuple[str, str], in_age: str):
            self.name = f'My name is {in_name[0]} {in_name[1]}.'
            self.age = f'I am {in_age} years old.'
    
    
    person = Person(('John', 'Smith'), '100')
    print(person)
    

    Prints:

    Person(name='My name is John Smith.', age='I am 100 years old.')
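
    To make the role of InitVar a bit more concrete, here is a small sketch that repeats the class above (assuming Python 3.9+ for tuple[str, str]) and then inspects the result. The init-only values are handed to __post_init__ but are not stored as fields:

```python
from dataclasses import InitVar, dataclass, field, fields


@dataclass
class Person:
    in_name: InitVar[tuple[str, str]]
    in_age: InitVar[str]

    name: str = field(init=False)
    age: str = field(init=False)

    def __post_init__(self, in_name, in_age):
        self.name = f'My name is {in_name[0]} {in_name[1]}.'
        self.age = f'I am {in_age} years old.'


person = Person(('John', 'Smith'), '100')

# Only the regular fields are reported; the InitVar pseudo-fields are not.
print([f.name for f in fields(person)])  # ['name', 'age']

# The raw inputs were not stored on the instance.
print(hasattr(person, 'in_name'))  # False
```

    One consequence of this design is that the constructor's keyword names are in_name and in_age, not name and age.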
    

    Properties

    Another option is to use field properties in dataclasses. In this case, the values are passed to the constructor as indicated (i.e. a tuple and a str), and the setter for each field property generates a formatted string, which it stores in a private attribute such as self._name.

    Note that the behavior is surprising when no value for a field property is passed to the constructor: dataclasses currently does not treat properties specially (it silently ignores them), so the property object itself ends up being used as the field's default.

    To work around that, you can use a metaclass such as one I have outlined in this gist.

    from dataclasses import field, dataclass
    
    
    @dataclass
    class Person:
    
        name: tuple[str, str]
        age: str
    
        # added to silence any IDE warnings
        _age: str = field(init=False, repr=False)
        _name: str = field(init=False, repr=False)
    
        @property
        def name(self):
            return self._name
    
        @name.setter
        def name(self, name: tuple[str, str]):
            self._name = f'My name is {name[0]} {name[1]}.'
    
        @property
        def age(self):
            return self._age
    
        @age.setter
        def age(self, age: str):
            self._age = f'I am {age} years old.'
    
    
    person = Person(('John', 'Smith'), '100')
    print(person)
    
    person.name = ('Betty', 'Johnson')
    person.age = 150
    print(person)
    
    # note that omitting the arguments raises a confusing TypeError, because the
    # property object itself is used as the default; you can use my gist to work around that.
    # person = Person()
    

    Prints:

    Person(name='My name is John Smith.', age='I am 100 years old.')
    Person(name='My name is Betty Johnson.', age='I am 150 years old.')
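
    The default-value quirk mentioned above can be seen in a reduced sketch with a single field. The isinstance guard and the 'unknown' fallback are assumptions added for illustration; this is not the metaclass approach from the gist:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    age: str
    _age: str = field(init=False, repr=False)

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        # Guard (an assumption, not part of the original example): when no
        # argument is given, the property object itself arrives as the default.
        if isinstance(age, property):
            age = 'unknown'
        self._age = f'I am {age} years old.'


print(Person('100'))  # Person(age='I am 100 years old.')
print(Person())       # Person(age='I am unknown years old.')
```

    A guard like this has to be repeated in every setter, which is one reason a metaclass or a descriptor may be preferable.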
    

    Descriptors

    One last option worth mentioning, and one I would likely recommend as a little easier to set up than properties, is to use descriptors.

    From what I understand, descriptors are an easier approach than declaring many properties, especially when those properties all behave similarly.

    Here is an example of a custom descriptor class, named FormatValue:

    from typing import Callable, Any
    
    
    class FormatValue:
        __slots__ = ('fmt', 'private_name', )
    
        def __init__(self, fmt: Callable[[Any], str]):
            self.fmt = fmt
    
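        def __set_name__(self, owner, name):
            # called at class creation; values are stored under '_<field name>'
            self.private_name = '_' + name
    
        def __get__(self, obj, objtype=None):
            # class-level access (obj is None) raises AttributeError here,
            # which dataclasses treats as "this field has no default"
            return getattr(obj, self.private_name)
    
        def __set__(self, obj, value):
            # format the incoming value before storing it
            setattr(obj, self.private_name, self.fmt(value))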
    

    It can be used as follows, and works the same as the above example with properties:

    from dataclasses import dataclass
    
    
    @dataclass
    class Person:
        name: 'tuple[str, str] | str' = FormatValue(lambda name: f'My name is {name[0]} {name[1]}.')
        age: 'str | int' = FormatValue(lambda age: f'I am {age} years old.')
    
    
    person = Person(('John', 'Smith'), '100')
    print(person)
    
    person.name = ('Betty', 'Johnson')
    person.age = 150
    print(person)
    

    Prints:

    Person(name='My name is John Smith.', age='I am 100 years old.')
    Person(name='My name is Betty Johnson.', age='I am 150 years old.')
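
    If a default value is needed, one possible extension is sketched below. The default parameter and the _MISSING sentinel are hypothetical additions, not part of the FormatValue class above. It relies on the documented behavior that dataclasses looks up a descriptor field's default by calling __get__ with obj set to None:

```python
from dataclasses import dataclass

_MISSING = object()  # hypothetical sentinel meaning "no default given"


class FormatValue:
    def __init__(self, fmt, default=_MISSING):
        self.fmt = fmt
        self.default = default

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Class-level access: dataclasses uses the returned value as the
            # field default, or treats an exception as "no default".
            if self.default is _MISSING:
                raise AttributeError(self.private_name)
            return self.default
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        setattr(obj, self.private_name, self.fmt(value))


@dataclass
class Person:
    name: 'tuple[str, str]' = FormatValue(lambda n: f'My name is {n[0]} {n[1]}.')
    age: 'str | int' = FormatValue(lambda a: f'I am {a} years old.', default=0)


print(Person(('John', 'Smith')))
# Person(name='My name is John Smith.', age='I am 0 years old.')
print(Person(('John', 'Smith'), 42))
# Person(name='My name is John Smith.', age='I am 42 years old.')
```

    The default still passes through the formatter, because the generated __init__ assigns it like any other value.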