python-3.x pydantic pydantic-v2

How to prevent repeated computation of computed fields that depend on each other?


I am used to normal python classes, and I am trying to learn pydantic now. It's been a lot harder than I expected. What I often do is instantiate a class with some initial input and, based on that input, "calculate" a lot of attributes for that class. I can't figure out how to create such "calculated" attributes in pydantic.

I created the following example to demonstrate the issue:

from pydantic import BaseModel, computed_field
from typing import List
class Person(BaseModel):
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        print("initializing composite_name")
        return f"{self.first_name} {self.last_name}"

    @computed_field
    @property
    def composite_name_list(self) -> List[str]:
        print("initializing name_list")
        return [f"{self.composite_name} {i}" for i in range(5)]

p = Person(first_name="John", last_name="Doe")
print(p.composite_name_list)

In the code above, I would expect composite_name to run first and create the composite_name attribute, and then composite_name_list to run and create the composite_name_list attribute. Each of these functions would thus run exactly once, printing "initializing composite_name" once and then "initializing name_list".

Instead, the print-out I get is:

initializing name_list
initializing composite_name
initializing composite_name
initializing composite_name
initializing composite_name
initializing composite_name
['John Doe 0', 'John Doe 1', 'John Doe 2', 'John Doe 3', 'John Doe 4']

A couple of odd things in this printout:

  1. The first thing printed is "initializing name_list", even though the method that prints "initializing composite_name" is defined first.
  2. It seems to recalculate the composite name attribute every time it is called, even though I used the computed field decorator.
  3. I added the last line, print(p.composite_name_list), because otherwise nothing would be printed at all! In other words, instantiating the class Person does not seem to automatically create my two computed properties.

In standard python, I would have just created this class like this:

class PersonStandardPython:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        self.composite_name = f"{first_name} {last_name}"
        self.composite_name_list = [f"{self.composite_name} {i}" for i in range(5)]

How can I get a result similar to my standard python implementation while still having the benefit of pydantic's strong typing?


Solution

    Intro

    I think there are some misunderstandings about how computed_field works and how it is meant to be used. computed_field acts very much like a property in Python, which is why it is used together with the property decorator. It mimics the appearance of an attribute while computing its value only "on request" (see Python docs). The computed_field decorator then only adds this property to the Pydantic model's list of valid fields, so that it can be used, for example, in serialization.
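
As a minimal sketch of that last point, assuming Pydantic v2: the computed field appears in the output of model_dump() even though it is never stored as a regular field. The class name NamedPerson is made up for this illustration.

```python
from pydantic import BaseModel, computed_field


class NamedPerson(BaseModel):  # hypothetical name, used only for this sketch
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        # Evaluated whenever the attribute is read or the model is serialized.
        return f"{self.first_name} {self.last_name}"


p = NamedPerson(first_name="John", last_name="Doe")
dumped = p.model_dump()
print(dumped)
```

Without the computed_field decorator, composite_name would still work as a plain property, but it would not be part of the serialized output.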

    Mutability

    In general, computed fields / properties can be used to re-compute a value based on mutable attributes. In your first example, one could modify first_name or last_name and composite_name would still return the correct name, for example:

    p = Person(first_name="John", last_name="Doe")
    print(p.composite_name)
    
    p.first_name = "Jane"
    print(p.composite_name)
    

    Which should print:

    John Doe
    Jane Doe
    

    In contrast, in your second example, if you modified first_name, composite_name would still hold the value it was assigned on init, like so:

    p = PersonStandardPython(first_name="John", last_name="Doe")
    print(p.composite_name)
    
    p.first_name = "Jane"
    print(p.composite_name)
    

    Which should print:

    John Doe
    John Doe
    

    So the two cases exhibit entirely different behavior with regard to mutability. If you want your Person object to be mutable, your first example is entirely correct! You just have to look at it again and understand its behavior. So let me address the three points you mentioned:

    The first thing printed is "initializing name_list" while the print statement of "initializing composite_name" comes first.

    This is entirely expected. As computed_field works like a property, nothing runs until the property is accessed. print(p.composite_name_list) first enters the body of composite_name_list, which prints its message, and only afterwards does the list comprehension access composite_name.

    It seems to recalculate the composite name attribute every time it is called, even though I used the computed field decorator.

    Again, this is entirely expected. As it works just like a property, it re-executes the code defined in the method on every access. However, you can cache the result of the computed property (more on this later).

    I added the last line, print(p.composite_name_list), because otherwise nothing would be printed at all! In other words, instantiating the class Person does not seem to automatically create my two computed properties.

    This is also expected, because the code is only executed when the computed field is accessed. It is "delayed" and not computed on initialization of the object.
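
All three observations come from ordinary Python property semantics rather than from anything specific to Pydantic. Here is a small sketch in plain Python that makes this visible; the class LazyName and its calls counter are invented for the illustration.

```python
class LazyName:  # hypothetical plain-Python class, no Pydantic involved
    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name = first_name
        self.last_name = last_name
        self.calls = 0  # counts how often the property body runs

    @property
    def composite_name(self) -> str:
        self.calls += 1
        return f"{self.first_name} {self.last_name}"


person = LazyName("John", "Doe")
calls_after_init = person.calls  # nothing has been computed on construction

names = [f"{person.composite_name} {i}" for i in range(5)]
calls_after_loop = person.calls  # one evaluation per access in the loop

print(calls_after_init, calls_after_loop)
```

The counter stays at zero after construction and rises by one for each access in the list comprehension, which matches the five "initializing composite_name" lines in the question.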

    Faux Immutability

    Alternatively, Pydantic lets you achieve "faux immutability" (see faux immutability docs). This way you can compute the derived attributes on or before init and prevent the attributes they are based on from being modified later. For this you can use frozen=True in the class definition and, for example, a model_validator:

    from pydantic import BaseModel, model_validator
    from typing import List, Optional
    
    
    class Person(BaseModel, frozen=True):
        first_name: str
        last_name: str
        composite_name: Optional[str] = None
        composite_name_list: Optional[List[str]] = None
    
        @model_validator(mode="before")
        @classmethod
        def init_derived_attribute(cls, data, info):
            first_name = data.get("first_name")
            last_name = data.get("last_name")
            composite_name = f"{first_name} {last_name}"
    
            data["composite_name"] = composite_name
            data["composite_name_list"] = [f"{composite_name} {i}" for i in range(5)]
            return data
    
    p = Person(first_name="John", last_name="Doe")
    print(p.composite_name)
    p.first_name = "Jane" # this now raises an error!
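
To see what "raises an error" looks like in practice, here is a small sketch. It assumes Pydantic v2, where, as far as I know, assigning to a field of a frozen model raises a ValidationError. The class name FrozenPerson is made up for the example.

```python
from pydantic import BaseModel, ValidationError


class FrozenPerson(BaseModel, frozen=True):  # hypothetical name for this sketch
    first_name: str
    last_name: str


p = FrozenPerson(first_name="John", last_name="Doe")

try:
    p.first_name = "Jane"
    assignment_failed = False
except ValidationError as exc:
    # Pydantic rejects the assignment; the instance keeps its original value.
    assignment_failed = True
    print(exc)

print(p.first_name)
```

The immutability is called "faux" because it only guards normal attribute assignment on the model; mutable values held in a field, such as a list, can still be changed in place.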
    

    Proposed Solution

    While the example above works fine, I think it is not the cleanest solution. You mention that you would mostly like to avoid the re-computation of the field. The solution for this is simple: you can use cached_property from the standard library's functools module. However, in this case you should still combine it with faux immutability, so that the object cannot be modified in memory and the derived property cannot go out of sync. Here is the final code I would propose:

    from pydantic import BaseModel, computed_field
    from typing import List
    from functools import cached_property
    
    class Person(BaseModel, frozen=True):
        first_name: str
        last_name: str
    
        @computed_field
        @cached_property
        def composite_name(self) -> str:
            print("initializing composite_name")
            return f"{self.first_name} {self.last_name}"
    
        @computed_field
        @cached_property
        def composite_name_list(self) -> List[str]:
            print("initializing name_list")
            return [f"{self.composite_name} {i}" for i in range(5)]
    
    
    p = Person(first_name="John", last_name="Doe")
    print(p.composite_name_list)
    

    Which prints:

    initializing name_list
    initializing composite_name
    ['John Doe 0', 'John Doe 1', 'John Doe 2', 'John Doe 3', 'John Doe 4']
    

    While it keeps the execution order the same (see above, this is expected), it avoids the re-computation of composite_name and prints its message only once. For all subsequent accesses the value is cached. One important note here is that it is typically only reasonable to use cached_property if the computation is rather "expensive". If you really just concatenate two strings, doing it repeatedly may be just fine.
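
To illustrate why the frozen=True part matters, here is a sketch of what can happen if the same cached_property is used on a model that is not frozen. MutablePerson is a hypothetical variant introduced only for this comparison.

```python
from functools import cached_property

from pydantic import BaseModel, computed_field


class MutablePerson(BaseModel):  # hypothetical: same idea, but NOT frozen
    first_name: str
    last_name: str

    @computed_field
    @cached_property
    def composite_name(self) -> str:
        return f"{self.first_name} {self.last_name}"


p = MutablePerson(first_name="John", last_name="Doe")
first_read = p.composite_name   # computed and cached here

p.first_name = "Jane"           # allowed, because the model is not frozen
second_read = p.composite_name  # served from the cache, so it is now stale

print(first_read, "|", second_read)
```

After the assignment, first_name is "Jane" while the cached composite_name still says "John Doe". Freezing the model rules out this kind of drift.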

    Summary

    If you intend your Person class to be mutable, your first proposed solution is just fine! The re-computation ensures that the derived fields / properties are always "up to date" with the attributes they are derived from. Alternatively, you can switch to one of the "faux immutability" solutions I proposed above, both of which avoid the re-computation.