
How to unpack nested JSON into Python Dataclass


Dataclass example:

@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int 
    statuses: List[StatusElement]

JSON example:

json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderindex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

I can unpack the JSON doing something like this:

object = List(**json)

But I'm not sure how I can also unpack each entry in statuses into a StatusElement object and append it to the statuses list of the List object. I'm sure I need to loop over it somehow, but I'm not sure how to combine that with unpacking.


Solution

  • Python's dataclasses module is great, but one thing it unfortunately doesn't handle is parsing a JSON object into a nested dataclass structure.

    A few workarounds exist for this:

    • You can roll your own JSON parsing helper method, for example a from_json which converts a JSON string to a List instance with a nested dataclass.
    • You can make use of existing JSON serialization libraries. For example, pydantic is a popular one that supports this use case.
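
    As a rough illustration of the first workaround, here is a minimal hand-written sketch that uses only the standard library. It assumes the JSON has already been parsed into a dict, as in the question. The class name StatusList and the from_dict classmethod are hypothetical names chosen for this sketch (the rename avoids shadowing typing.List); they are not part of any library.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class StatusList:  # hypothetical name, chosen to avoid shadowing typing.List
    id: int
    statuses: List[StatusElement]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "StatusList":
        # Build each nested dataclass explicitly from its own dict.
        statuses = [StatusElement(**item) for item in data["statuses"]]
        # Dataclasses do not coerce types, so convert the string id by hand.
        return cls(id=int(data["id"]), statuses=statuses)


payload = {
    "id": "124",
    "statuses": [
        {"status": "to do", "orderindex": 0, "color": "#d3d3d3", "type": "open"}
    ],
}

obj = StatusList.from_dict(payload)
print(obj)
```

    This works for a fixed, shallow structure, but every nested field needs its own handling, which is the main reason to reach for a library once the model grows.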

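    Here is an example using the dataclass-wizard library, which works well enough for your use case. It's more lightweight than pydantic and also a little faster. It also supports automatic key case transforms and type conversions (for example, a str value to a field annotated as int). To show both, the example below uses a camelCase "orderIndex" key with a snake_case order_index field, and passes the id as a string.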

    Example below:

    from dataclasses import dataclass
    from typing import List as PyList
    
    from dataclass_wizard import JSONWizard
    
    
    @dataclass
    class List(JSONWizard):
        id: int
        statuses: PyList['StatusElement']
        # on Python 3.9+ you can use the following syntax:
        #   statuses: list['StatusElement']
    
    
    @dataclass
    class StatusElement:
        status: str
        order_index: int
        color: str
        type: str
    
    
    json = {
      "id": "124",
      "statuses": [
        {
          "status": "to do",
          "orderIndex": 0,
          "color": "#d3d3d3",
          "type": "open"
        }]
    }
    
    
    object = List.from_dict(json)
    
    print(repr(object))
    # List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])
    

    Disclaimer: I am the creator (and maintainer) of this library.


    As of the latest release of dataclass-wizard, you can skip the class inheritance entirely. The example below is the same as the one above, with the JSONWizard usage removed completely; it uses the fromdict and asdict helper functions instead. Just remember to import asdict from dataclass_wizard rather than from the dataclasses module, even though I guess the latter should coincidentally work.

    Here's the modified version of the above without class inheritance:

    from dataclasses import dataclass
    from typing import List as PyList
    
    from dataclass_wizard import fromdict, asdict
    
    
    @dataclass
    class List:
        id: int
        statuses: PyList['StatusElement']
    
    
    @dataclass
    class StatusElement:
        status: str
        order_index: int
        color: str
        type: str
    
    
    json = {
      "id": "124",
      "statuses": [
        {
          "status": "to do",
          "orderIndex": 0,
          "color": "#d3d3d3",
          "type": "open"
        }]
    }
    
    # De-serialize the JSON dictionary into a `List` instance.
    c = fromdict(List, json)
    
    print(c)
    # List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])
    
    # Convert the instance back to a dictionary object that is JSON-serializable.
    d = asdict(c)
    
    print(d)
    # {'id': 124, 'statuses': [{'status': 'to do', 'orderIndex': 0, 'color': '#d3d3d3', 'type': 'open'}]}
    

    Also, here's a quick performance comparison with dacite. I wasn't aware of this library before, but it's also very easy to use, and it doesn't require inheriting from any class either. However, in my personal tests (Windows 10 Alienware PC, Python 3.9.1), dataclass-wizard seemed to perform much better overall on de-serialization. The numbers in the comments are the total time in seconds for 100,000 calls.

    from dataclasses import dataclass
    from timeit import timeit
    from typing import List
    
    from dacite import from_dict
    
    from dataclass_wizard import JSONWizard, fromdict
    
    
    data = {
        "id": 124,
        "statuses": [
            {
                "status": "to do",
                "orderindex": 0,
                "color": "#d3d3d3",
                "type": "open"
            }]
    }
    
    
    @dataclass
    class StatusElement:
        status: str
        orderindex: int
        color: str
        type: str
    
    
    @dataclass
    class List:
        id: int
        statuses: List[StatusElement]
    
    
    class ListWiz(List, JSONWizard):
        ...
    
    
    n = 100_000
    
    # 0.37
    print('dataclass-wizard:            ', timeit('ListWiz.from_dict(data)', number=n, globals=globals()))
    
    # 0.36
    print('dataclass-wizard (fromdict): ', timeit('fromdict(List, data)', number=n, globals=globals()))
    
    # 11.2
    print('dacite:                      ', timeit('from_dict(List, data)', number=n, globals=globals()))
    
    
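    lst_wiz1 = ListWiz.from_dict(data)   # dataclass-wizard, via the JSONWizard subclass
    lst_wiz2 = fromdict(List, data)      # dataclass-wizard, via the helper function
    lst = from_dict(List, data)          # dacite
    
    # All three approaches should produce the same field values.
    assert lst.__dict__ == lst_wiz1.__dict__ == lst_wiz2.__dict__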