Tags: python, json, nested, wrapper, python-dataclasses

Nested python dataclasses with list annotations


Python ^3.7. I'm trying to create nested dataclasses to work with a complex JSON response. I managed to do that by creating a dataclass for every level of the JSON and using __post_init__ to set fields as objects of the other dataclasses. However, that creates a lot of boilerplate code, and the fields end up annotated as dict or list rather than as the nested dataclass types.

This answer helped me get closer to a solution using a wrapper:

https://stackoverflow.com/a/51565863/8325015

However, it does not handle the case where an attribute is a list of objects, e.g. some_attribute: List[SomeClass].
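
To illustrate, here is a minimal sketch of why I think the wrapper's check misses this case. SomeClass and Holder are hypothetical names used only for this illustration, not part of my real code. The annotation on the field is a typing generic rather than the dataclass itself, so is_dataclass returns False for it:

```python
from dataclasses import dataclass, is_dataclass
from typing import List


@dataclass
class SomeClass:  # hypothetical stand-in for a nested dataclass
    value: int


@dataclass
class Holder:  # hypothetical parent with a list-of-dataclass field
    some_attribute: List[SomeClass]


field_type = Holder.__annotations__["some_attribute"]

# The annotation is the generic alias List[SomeClass], not SomeClass itself,
# so a check like is_dataclass(ft) fails and the dicts in the list stay dicts.
print(is_dataclass(SomeClass))   # True
print(is_dataclass(field_type))  # False
```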

Here is an example that resembles my data:

from dataclasses import dataclass, is_dataclass
from typing import List
from copy import deepcopy

# decorator from the linked thread:
def nested_deco(*args, **kwargs):
    def wrapper(check_class):

        # passing class to investigate
        check_class = dataclass(check_class, **kwargs)
        o_init = check_class.__init__

        def __init__(self, *args, **kwargs):

            for name, value in kwargs.items():

                # getting field type
                ft = check_class.__annotations__.get(name, None)

                if is_dataclass(ft) and isinstance(value, dict):
                    obj = ft(**value)
                    kwargs[name] = obj
            # call the original __init__ once, after all fields are converted
            o_init(self, *args, **kwargs)

        check_class.__init__ = __init__

        return check_class

    return wrapper(args[0]) if args else wrapper


#some dummy dataclasses to resemble my data structure

@dataclass
class IterationData:
    question1: str
    question2: str


@nested_deco
@dataclass
class IterationResult:
    name: str
    data: IterationData


@nested_deco
@dataclass
class IterationResults:
    iterations: List[IterationResult]


@dataclass
class InstanceData:
    date: str
    owner: str


@nested_deco
@dataclass
class Instance:
    data: InstanceData
    name: str


@nested_deco
@dataclass
class Result:
    status: str
    iteration_results: IterationResults


@nested_deco
@dataclass
class MergedInstance:
    instance: Instance
    result: Result


#example data

single_instance = {
    "instance": {
        "name": "example1",
        "data": {
            "date": "2021-01-01",
            "owner": "Maciek"
        }
    },
    "result": {
        "status": "complete",
        "iteration_results": [
            {
                "name": "first",
                "data": {
                    "question1": "yes",
                    "question2": "no"
                }
            }
        ]
    }
}

instances = [deepcopy(single_instance) for _ in range(3)]  # created a list just to resemble my data
objres = [MergedInstance(**inst) for inst in instances]

As you will notice, nested_deco works perfectly for the attributes of MergedInstance and for the data attribute of Instance, but it does not load the IterationResults class for iteration_results of Result.

Is there a way to achieve it?

I'm also attaching an example of my __post_init__ solution, which creates objects of the right classes, but the attributes are not annotated with those classes:

@dataclass
class IterationData:
    question1: str
    question2: str


@dataclass
class IterationResult:
    name: str
    data: dict

    def __post_init__(self):
        self.data = IterationData(**self.data)


@dataclass
class InstanceData:
    date: str
    owner: str


@dataclass
class Instance:
    data: dict
    name: str

    def __post_init__(self):
        self.data = InstanceData(**self.data)


@dataclass
class Result:
    status: str
    iteration_results: list

    def __post_init__(self):
        self.iteration_results = [IterationResult(**res) for res in self.iteration_results]


@dataclass
class MergedInstance:
    instance: dict
    result: dict

    def __post_init__(self):
        self.instance = Instance(**self.instance)
        self.result = Result(**self.result)

Solution

  • This doesn't really answer your question about the nested decorators, but my initial suggestion would be to avoid a lot of hard work for yourself by making use of libraries that have tackled this same problem before.

    There are a lot of well-known ones, like pydantic, which also provides data validation and is something I might recommend. If you want to keep your existing dataclass structure and don't want to inherit from anything, you can use libraries such as dataclass-wizard and dataclasses-json. The latter offers a decorator approach, which might interest you. Ideally, the goal is to find an (efficient) JSON serialization library that already offers exactly what you need.

    Here is an example using the dataclass-wizard library with minimal changes needed (no need to inherit from a mixin class). Note that I had to modify your input JSON object slightly, as it didn't match the dataclass schema otherwise; apart from that, it should work as expected. I've also removed copy.deepcopy, as it's a bit slower and we don't need it (the helper functions won't modify the dict objects directly anyway, which is simple enough to test).

    from dataclasses import dataclass
    from typing import List
    
    from dataclass_wizard import fromlist
    
    
    @dataclass
    class IterationData:
        question1: str
        question2: str
    
    
    @dataclass
    class IterationResult:
        name: str
        data: IterationData
    
    
    @dataclass
    class IterationResults:
        iterations: List[IterationResult]
    
    
    @dataclass
    class InstanceData:
        date: str
        owner: str
    
    
    @dataclass
    class Instance:
        data: InstanceData
        name: str
    
    
    @dataclass
    class Result:
        status: str
        iteration_results: IterationResults
    
    
    @dataclass
    class MergedInstance:
        instance: Instance
        result: Result
    
    
    single_instance = {
        "instance": {
            "name": "example1",
            "data": {
                "date": "2021-01-01",
                "owner": "Maciek"
            }
        },
        "result": {
            "status": "complete",
            "iteration_results": {
                # Notice I've changed this here - previously this was a bare
                # list, which didn't match the IterationResults schema
                "iterations": [
                    {
                        "name": "first",
                        "data": {
                            "question1": "yes",
                            "question2": "no"
                        }
                    }
                ]
            }
        }
    }
    
    instances = [single_instance for _ in range(3)]  # created a list just to resemble my data
    
    objres = fromlist(MergedInstance, instances)
    
    for obj in objres:
        print(obj)
    
    

    Using the dataclasses-json library:

    from dataclasses import dataclass
    from typing import List
    
    from dataclasses_json import dataclass_json
    
    
    # Same as above
    ...
    
    @dataclass_json
    @dataclass
    class MergedInstance:
        instance: Instance
        result: Result
    
    
    single_instance = {...}
    
    instances = [single_instance for _ in range(3)]  # created a list just to resemble my data
    
    objres = [MergedInstance.from_dict(inst) for inst in instances]
    
    for obj in objres:
        print(obj)
    

    Bonus: Let's say you are calling an API that returns you a complex JSON response, such as the one above. If you want to convert this JSON response to a dataclass schema, normally you'll have to write it out by hand, which can be a bit tiresome if the structure of the JSON is especially complex.

    Wouldn't it be cool if there was a way to simplify the generation of a nested dataclass structure? The dataclass-wizard library comes with a CLI tool that accepts an arbitrary JSON input, so it should certainly be doable to auto-generate a dataclass schema given such an input.

    Assume you have these contents in a testing.json file:

    {
        "instance": {
            "name": "example1",
            "data": {
                "date": "2021-01-01",
                "owner": "Maciek"
            }
        },
        "result": {
            "status": "complete",
            "iteration_results": {
                "iterations": [
                    {
                        "name": "first",
                        "data": {
                            "question1": "yes",
                            "question2": "no"
                        }
                    }
                ]
            }
        }
    }
    

    Then we run the following command:

    wiz gs testing testing
    

    And the contents of our new testing.py file:

    from dataclasses import dataclass
    from datetime import date
    from typing import List, Union
    
    from dataclass_wizard import JSONWizard
    
    
    @dataclass
    class Data(JSONWizard):
        """
        Data dataclass
    
        """
        instance: 'Instance'
        result: 'Result'
    
    
    @dataclass
    class Instance:
        """
        Instance dataclass
    
        """
        name: str
        data: 'Data'
    
    
    @dataclass
    class Data:
        """
        Data dataclass
    
        """
        date: date
        owner: str
    
    
    @dataclass
    class Result:
        """
        Result dataclass
    
        """
        status: str
        iteration_results: 'IterationResults'
    
    
    @dataclass
    class IterationResults:
        """
        IterationResults dataclass
    
        """
        iterations: List['Iteration']
    
    
    @dataclass
    class Iteration:
        """
        Iteration dataclass
    
        """
        name: str
        data: 'Data'
    
    
    @dataclass
    class Data:
        """
        Data dataclass
    
        """
        question1: Union[bool, str]
        question2: Union[bool, str]
    

    That appears to more or less match the same nested dataclass structure from the original question, and best of all we didn't need to write any of the code ourselves!

    However, there's a minor problem: because of some duplicate JSON keys, we end up with three dataclasses named Data. So I went ahead and renamed them to Data1, Data2, and Data3 for uniqueness. We can then do a quick test to confirm that we're able to load the same JSON data into our new dataclass schema:

    import json
    from dataclasses import dataclass
    from datetime import date
    from typing import List, Union
    
    from dataclass_wizard import JSONWizard
    
    
    @dataclass
    class Data1(JSONWizard):
        """
        Data dataclass
    
        """
        instance: 'Instance'
        result: 'Result'
    
    
    @dataclass
    class Instance:
        """
        Instance dataclass
    
        """
        name: str
        data: 'Data2'
    
    
    @dataclass
    class Data2:
        """
        Data dataclass
    
        """
        date: date
        owner: str
    
    
    @dataclass
    class Result:
        """
        Result dataclass
    
        """
        status: str
        iteration_results: 'IterationResults'
    
    
    @dataclass
    class IterationResults:
        """
        IterationResults dataclass
    
        """
        iterations: List['Iteration']
    
    
    @dataclass
    class Iteration:
        """
        Iteration dataclass
    
        """
        name: str
        data: 'Data3'
    
    
    @dataclass
    class Data3:
        """
        Data dataclass
    
        """
        question1: Union[bool, str]
        question2: Union[bool, str]
    
    
    # ---- Start of our test
    
    with open('testing.json') as in_file:
        d = json.load(in_file)
    
    c = Data1.from_dict(d)
    
    print(repr(c))
    # Data1(instance=Instance(name='example1', data=Data2(date=datetime.date(2021, 1, 1), owner='Maciek')), result=Result(status='complete', iteration_results=IterationResults(iterations=[Iteration(name='first', data=Data3(question1='yes', question2='no'))])))
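
To see why the rename matters, here is a small sketch using only the standard library. The class names mirror the generated file but are trimmed down, and it only demonstrates plain Python name resolution, not dataclass-wizard internals. A later class Data rebinds the module-level name, so the forward reference 'Data' resolves to whichever class was defined last:

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Instance:
    name: str
    data: 'Data'  # intended to mean the date/owner class below


@dataclass
class Data:
    date: str
    owner: str


@dataclass
class Data:  # same name: rebinds the module-level name Data
    question1: str
    question2: str


# Forward references are looked up by name when the hints are resolved,
# so 'Data' now points at the question1/question2 class.
resolved = get_type_hints(Instance, globalns=globals())['data']
print([f.name for f in fields(resolved)])  # ['question1', 'question2']
```

With unique names such as Data2 and Data3, each forward reference can only resolve to the class it was meant for.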