Tags: python, marshmallow, python-dataclasses

Python dataclasses circular parsing with marshmallow


I'm working with a JSON data structure and am trying to represent it as a dataclass. The data structure is (partly) circular and I want the nested data structures to be neatly represented as dataclasses as well.

I am having some trouble getting the dataclasses to parse correctly. See the simplified example below:

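from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union

from dataclasses_json import dataclass_json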


class SchemaTypeName(Enum):
    LONG = "long"
    NULL = "null"
    RECORD = "record"
    STRING = "string"


@dataclass_json
@dataclass
class SchemaType():

    type: Union[
        SchemaTypeName,
        'SchemaType',
        List[
            Union[
                SchemaTypeName,
                'SchemaType'
            ]
        ]
    ]

    fields: Optional[List['SchemaType']] = None
    name: Optional[str] = None

Below is a printout of the object returned after calling from_dict with some sample data. Notice that the nested object (indicated with the arrow) is not parsed as a dataclass correctly.

SchemaType(
    type=[
        'null', 
------> {
            'fields': [
                {'name': 'id', 'type': 'string'}, 
                {'name': 'date', 'type': ['null', 'long']}, 
                {'name': 'name', 'type': ['null', 'string']}
            ],
            'type': 'record'
        }
    ]
)

Am I declaring the type hint for the type field incorrectly?

I'm using Python 3.9 with dataclasses_json==0.5.2 and marshmallow==3.11.1.


Solution

  • I found that the problem is in how dataclasses_json decodes lists that hold mixed types. For such a list, the decoder returns plain strings and dicts instead of converting them to instances of SchemaTypeName and SchemaType.

    However, dataclasses_json lets you configure a custom decoder function for an individual field. Import config from dataclasses_json, call it with your function as the decoder keyword argument, and pass the result as the metadata keyword argument to field.

    Please see the updated example below. Using the schemaTypeDecoder function, I am able to transform my data to the correct types.

    from dataclasses import dataclass, field
    from enum import Enum
    from typing import List, Optional, Union
    
    from dataclasses_json import config, dataclass_json
    
    class SchemaTypeName(Enum):
        ARRAY = "array"
        LONG = "long"
        NULL = "null"
        OBJECT = "object"
        RECORD = "record"
        STRING = "string"
    
    
    def schemaTypeDecoder(data: Union[str, dict, List[Union[str, dict]]]):
    
        def transform(schemaType: Union[str, dict]):
            if isinstance(schemaType, str):
                return SchemaTypeName(schemaType)
            else:
                return SchemaType.from_dict(schemaType)
    
        if isinstance(data, list):
            return [transform(schemaType) for schemaType in data]
        else:
            return transform(data)
    
    
    @dataclass_json()
    @dataclass
    class SchemaType():
        type: Union[
            SchemaTypeName,
            'SchemaType',
            List[
                Union[
                    SchemaTypeName,
                    'SchemaType'
                ]
            ]
        ] = field(
            metadata=config(
                decoder=schemaTypeDecoder
            )
        )
    
        fields: Optional[List['SchemaType']] = None
        name: Optional[str] = None
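
To check the recursive decoding logic on its own, here is a minimal stdlib-only sketch. It does not use dataclasses_json: the hand-written from_dict classmethod is a stand-in for the one the library generates, decode_schema_type is a renamed copy of schemaTypeDecoder, and the sample dict is reconstructed from the printout in the question, so treat all three as assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union


class SchemaTypeName(Enum):
    LONG = "long"
    NULL = "null"
    RECORD = "record"
    STRING = "string"


def decode_schema_type(data):
    """Convert a str, a dict, or a list of those into SchemaTypeName / SchemaType."""

    def transform(item):
        # Strings are enum values; anything else is treated as a nested schema dict.
        if isinstance(item, str):
            return SchemaTypeName(item)
        return SchemaType.from_dict(item)

    if isinstance(data, list):
        return [transform(item) for item in data]
    return transform(data)


@dataclass
class SchemaType:
    type: Union[
        SchemaTypeName,
        "SchemaType",
        List[Union[SchemaTypeName, "SchemaType"]],
    ]
    fields: Optional[List["SchemaType"]] = None
    name: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "SchemaType":
        # Hypothetical stand-in for the from_dict generated by dataclasses_json.
        fields = data.get("fields")
        return cls(
            type=decode_schema_type(data["type"]),
            fields=None if fields is None else [cls.from_dict(f) for f in fields],
            name=data.get("name"),
        )


# Sample input reconstructed from the printout in the question.
sample = {
    "type": [
        "null",
        {
            "type": "record",
            "fields": [
                {"name": "id", "type": "string"},
                {"name": "date", "type": ["null", "long"]},
                {"name": "name", "type": ["null", "string"]},
            ],
        },
    ]
}

parsed = SchemaType.from_dict(sample)
nested = parsed.type[1]
print(type(nested).__name__)  # SchemaType
```

With this sketch, the nested record that showed up as a plain dict in the question becomes a SchemaType instance, and each of its fields carries SchemaTypeName values. In the dataclasses_json version above, the same transform runs through config(decoder=...) on the type field.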