Tags: python, python-3.x, serialization, immutability, python-dataclasses

What is the recommended way to include properties in dataclasses in asdict or serialization?


Note: this is similar to the question "How to get @property methods in asdict?".

I have a (frozen) nested data structure like the following. It defines a few properties that depend purely on the fields.

import copy
import dataclasses
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Bar:
    x: int
    y: int

    @property
    def z(self):
        return self.x + self.y

@dataclass(frozen=True)
class Foo:
    a: int
    b: Bar

    @property
    def c(self):
        return self.a + self.b.x - self.b.y

I can serialize the data structure as follows:

class CustomEncoder(json.JSONEncoder):
    def default(self, o):
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return json.JSONEncoder.default(self, o)

foo = Foo(1, Bar(2,3))
print(json.dumps(foo, cls=CustomEncoder))

# Outputs {"a": 1, "b": {"x": 2, "y": 3}}

However, I would also like to serialize the properties (@property). Note that I do not want to turn the properties into fields using __post_init__, as I would like to keep the dataclass frozen. I do not want to use object.__setattr__ to work around the frozen fields. I also do not want to pre-compute the values of the properties outside the class and pass them in as fields.
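
For reference, the workaround ruled out above usually looks something like the following sketch. The class name BarWithField is hypothetical and not part of the original code; the point is only that a frozen dataclass has to bypass its own __setattr__ in order to store a derived value as a field.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class BarWithField:  # hypothetical variant of Bar, for illustration only
    x: int
    y: int
    # Derived value stored as a real field; excluded from __init__.
    z: int = field(init=False)

    def __post_init__(self):
        # A plain `self.z = ...` raises FrozenInstanceError on a frozen
        # dataclass, so the assignment has to go through object.__setattr__.
        object.__setattr__(self, "z", self.x + self.y)


print(asdict(BarWithField(2, 3)))  # {'x': 2, 'y': 3, 'z': 5}
```

Because z is a real field here, asdict() picks it up without any customization, at the cost of the object.__setattr__ call.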

The current solution I am using is to explicitly write out how each object is serialized as follows:

class CustomEncoder2(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Foo):
            return {
                "a": o.a,
                "b": o.b,
                "c": o.c
            }
        elif isinstance(o, Bar):
            return {
                "x": o.x,
                "y": o.y,
                "z": o.z
            }
        return json.JSONEncoder.default(self, o)

foo = Foo(1, Bar(2,3))
print(json.dumps(foo, cls=CustomEncoder2))

# Outputs {"a": 1, "b": {"x": 2, "y": 3, "z": 5}, "c": 0} as desired

For a few levels of nesting this is manageable, but I am hoping for a more general solution. For example, here is a (hacky) solution that monkey-patches the private _asdict_inner function from the dataclasses module.

def custom_asdict_inner(obj, dict_factory):
    if dataclasses._is_dataclass_instance(obj):
        result = []
        for f in dataclasses.fields(obj):
            value = custom_asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        # Inject this one-line change
        result += [(prop, custom_asdict_inner(getattr(obj, prop), dict_factory)) for prop in dir(obj) if not prop.startswith('__')]
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        return type(obj)(*[custom_asdict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(custom_asdict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((custom_asdict_inner(k, dict_factory),
                          custom_asdict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

dataclasses._asdict_inner = custom_asdict_inner

class CustomEncoder3(json.JSONEncoder):
    def default(self, o):
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return json.JSONEncoder.default(self, o)

foo = Foo(1, Bar(2,3))
print(json.dumps(foo, cls=CustomEncoder3))

# Outputs {"a": 1, "b": {"x": 2, "y": 3, "z": 5}, "c": 0} as desired
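
A caveat worth noting about the dir(obj) filter used above: it matches every attribute name that is not a dunder, so the fields are visited a second time and ordinary methods are picked up as well. The sketch below uses a hypothetical Point class (not part of the original code) to show which names such a filter returns.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Point:  # hypothetical class, for illustration only
    x: int
    y: int

    @property
    def total(self):
        return self.x + self.y

    def scaled(self, k):  # an ordinary method, not a property
        return Point(self.x * k, self.y * k)


p = Point(2, 3)
field_names = [f.name for f in fields(p)]
extra_names = [n for n in dir(p) if not n.startswith("__")]

print(field_names)  # ['x', 'y']
print(extra_names)  # ['scaled', 'total', 'x', 'y']
```

With a class like this, the patched function would also try to serialize the bound method scaled, which json cannot encode. Checking isinstance(attr, property) on the class attributes avoids that.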

Is there a recommended way to achieve what I am trying to do?


Solution

  • There's no "recommended" way to include them that I know of.

    Here's something that seems to work and that I think meets your numerous requirements. It defines a custom encoder that calls its own _asdict() method when the object is a dataclass, instead of monkey-patching the private dataclasses._asdict_inner() function, and it keeps that code bundled inside the custom encoder that uses it.

    Like you, I used the current implementation of dataclasses.asdict() as a guide, since what you're asking for is basically a customized version of it. The current value of each class attribute that is a property is obtained by calling its __get__ method.

    import copy
    import dataclasses
    from dataclasses import dataclass, field
    import json
    import re
    from typing import List
    
    class MyCustomEncoder(json.JSONEncoder):
        is_special = re.compile(r'^__[^\d\W]\w*__\Z', re.UNICODE)  # Dunder name.
    
        def default(self, obj):
            return self._asdict(obj)
    
        def _asdict(self, obj, *, dict_factory=dict):
            if not dataclasses.is_dataclass(obj):
                raise TypeError("_asdict() should only be called on dataclass instances")
            return self._asdict_inner(obj, dict_factory)
    
        def _asdict_inner(self, obj, dict_factory):
            if dataclasses.is_dataclass(obj):
                result = []
                # Get values of its fields (recursively).
                for f in dataclasses.fields(obj):
                    value = self._asdict_inner(getattr(obj, f.name), dict_factory)
                    result.append((f.name, value))
                # Add values of non-special attributes which are properties.
                is_special = self.is_special.match  # Local var to speed access.
                for name, attr in vars(type(obj)).items():
                    if not is_special(name) and isinstance(attr, property):
                        result.append((name, attr.__get__(obj)))  # Get property's value.
                return dict_factory(result)
            elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
                return type(obj)(*[self._asdict_inner(v, dict_factory) for v in obj])
            elif isinstance(obj, (list, tuple)):
                return type(obj)(self._asdict_inner(v, dict_factory) for v in obj)
            elif isinstance(obj, dict):
                return type(obj)((self._asdict_inner(k, dict_factory),
                                  self._asdict_inner(v, dict_factory)) for k, v in obj.items())
            else:
                return copy.deepcopy(obj)
    
    
    if __name__ == '__main__':
    
        @dataclass(frozen=True)
        class Bar():
            x: int
            y: int
    
            @property
            def z(self):
                return self.x + self.y
    
    
        @dataclass(frozen=True)
        class Foo():
            a: int
            b: Bar
    
            @property
            def c(self):
                return self.a + self.b.x - self.b.y
    
            # Added for testing.
            d: List = field(default_factory=lambda: [42])  # Field with default value.
    
    
        foo = Foo(1, Bar(2,3))
        print(json.dumps(foo, cls=MyCustomEncoder))
    

    Output:

    {"a": 1, "b": {"x": 2, "y": 3, "z": 5}, "d": [42], "c": 0}
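
One limitation to be aware of: vars(type(obj)) only contains attributes defined directly on that class, so a property inherited from a base class would not appear in the output. A minimal sketch of one way to handle that, assuming it is acceptable to walk the class's MRO, is shown below. The helper iter_properties and the classes Base and Child are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


def iter_properties(cls):
    """Yield (name, property) pairs defined on cls or any of its bases.

    Hypothetical helper; the most derived definition of each name wins.
    """
    seen = set()
    for klass in cls.__mro__:
        for name, attr in vars(klass).items():
            if name in seen:
                continue  # already handled by a more derived class
            seen.add(name)
            if isinstance(attr, property):
                yield name, attr


@dataclass(frozen=True)
class Base:
    x: int

    @property
    def doubled(self):
        return self.x * 2


@dataclass(frozen=True)
class Child(Base):
    y: int

    @property
    def total(self):
        return self.x + self.y


child = Child(1, 2)
print("doubled" in vars(type(child)))  # False: defined on Base, not Child
values = {name: prop.__get__(child) for name, prop in iter_properties(Child)}
print(values)  # {'total': 3, 'doubled': 2}
```

In the encoder above, the loop over vars(type(obj)).items() could be replaced by a loop over such a helper if inherited properties need to be serialized too.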