python, pydantic

How can I more efficiently deserialize set data strings?


I've written a Rust program that exposes its JSON API as JSON Schema, and I'm using portions of that schema to create Python pydantic classes.

Where I'm stuck is that the Python code can receive several JSON Schema types, and I'm deserializing them rather inefficiently. A facsimile of my code is below. It works, but I feel there has to be a better, more 'pythonic' way.

import json

from pydantic import BaseModel

class DataType(BaseModel):
    pass

class ATypeData(DataType):
    ...

class BTypeData(DataType):
    ...

class CTypeData(DataType):
    ...

def deserialize_wired_json_str(json_str) -> DataType:
    json_data = json.loads(json_str)

    # Test the parsed keys, not the raw string, so a value that merely
    # contains the text 'a_type' cannot select the wrong branch.
    if 'a_type' in json_data:
        return ATypeData.parse_raw(json.dumps(json_data['a_type']))
    elif 'b_type' in json_data:
        return BTypeData.parse_raw(json.dumps(json_data['b_type']))
    elif 'c_type' in json_data:
        return CTypeData.parse_raw(json.dumps(json_data['c_type']))
    ...

... and by "set" I mean that the string-encapsulated JSON data is known to map to one of the defined DataType class variants.


Solution

  • From a performance standpoint, you serialize and deserialize the JSON more often than necessary: json.dumps re-encodes data that parse_raw then immediately decodes again. Instead of

    ATypeData.parse_raw(json.dumps(json_data['a_type']))
    

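    you can validate the already-parsed data directly (model_validate is the Pydantic v2 method; in Pydantic v1 the equivalent is parse_obj):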

    ATypeData.model_validate(json_data['a_type'])
    

    Additionally, you can replace the cascading if-statements with a dictionary lookup, assuming each message is a JSON object with a single top-level key that names its type:

    data_type_mappings = {
      "a_type": ATypeData,
      "b_type": BTypeData,
      "c_type": CTypeData,
      # ...
    }
    
    def deserialize_wired_json_str(json_str: str) -> DataType:
      json_data = json.loads(json_str)
      data_type_name, data_value = next(iter(json_data.items()))
      data_type = data_type_mappings[data_type_name]
    
      return data_type.model_validate(data_value)
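
For reference, here is a self-contained sketch of the dictionary approach that you can run as-is. It rests on a few assumptions that are not in the question: the fields `count` and `label` are hypothetical placeholders for the elided class bodies, the `_validate` helper is an illustrative addition that falls back to `parse_obj` when `model_validate` is unavailable (Pydantic v1), and the explicit `ValueError` checks are one possible way to handle malformed or unknown input rather than anything the original requires.

```python
import json
from typing import Any, Dict, Type

from pydantic import BaseModel


class DataType(BaseModel):
    pass


# Hypothetical fields: the question elides the real class bodies.
class ATypeData(DataType):
    count: int


class BTypeData(DataType):
    label: str


# Maps the single top-level JSON key to the model that validates its value.
DATA_TYPE_MAPPINGS: Dict[str, Type[DataType]] = {
    "a_type": ATypeData,
    "b_type": BTypeData,
}


def _validate(model_cls: Type[DataType], value: Any) -> DataType:
    """Validate already-parsed data on either Pydantic major version."""
    if hasattr(model_cls, "model_validate"):  # Pydantic v2
        return model_cls.model_validate(value)
    return model_cls.parse_obj(value)  # Pydantic v1


def deserialize_wired_json_str(json_str: str) -> DataType:
    json_data = json.loads(json_str)

    # Assumption: every message is an object with exactly one key, the type tag.
    if not isinstance(json_data, dict) or len(json_data) != 1:
        raise ValueError("expected a JSON object with exactly one top-level key")

    data_type_name, data_value = next(iter(json_data.items()))
    try:
        data_type = DATA_TYPE_MAPPINGS[data_type_name]
    except KeyError:
        raise ValueError(f"unknown data type tag: {data_type_name!r}") from None

    return _validate(data_type, data_value)


result = deserialize_wired_json_str('{"a_type": {"count": 3}}')
print(type(result).__name__, result.count)
```

Supporting a new variant then only takes one more entry in the mapping, and the JSON text is parsed exactly once per message.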