
Python Cerberus embed numeric config data in schema


I have a set of documents and schemas I am doing validation against (shocker).

These documents are JSON messages from different clients, each using its own format, so a schema is defined for each kind of document/message received from these clients.

I want to use a dispatcher (a dictionary with functions as values) to handle the mapping/formatting of a document after it has been validated against a matching schema. Once I know which schema a message is valid against, I can create the desired message payload for my various consumer services by calling the corresponding mapping function.

To this end I need a key in my dispatcher which uniquely maps to its respective mapping function for that schema. The key also needs to be used to identify a schema so the correct mapping function can be called.

My question is this: Is there a way to embed a config value like a numeric ID into a schema?

I want to take this schema:

schema = {
    "timestamp": {"type": "number"},
    "values": {
        "type": "list",
        "schema": {
            "type": "dict",
            "schema": {
                "id": {"required": True, "type": "string"},
                "v": {"required": True, "type": "number"},
                "q": {"type": "boolean"},
                "t": {"required": True, "type": "number"},
            },
        },
    },
}

And add a schema_id like this:

schema = {
    "schema_id": 1,
    "timestamp": {"type": "number"},
    "values": {
        "type": "list",
        "schema": {
            "type": "dict",
            "schema": {
                "id": {"required": True, "type": "string"},
                "v": {"required": True, "type": "number"},
                "q": {"type": "boolean"},
                "t": {"required": True, "type": "number"},
            },
        },
    },
}

That way, after successful validation, the schema_id links the message/document to its schema and, through the dispatcher, to the matching mapping function.

Something like this:

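from cerberus import Validator

# One entry per schema_id; map_function_1, map_function_2 are the mapping functions.
mapping_dispatcher = {1: map_function_1, 2: map_function_2}

validator = Validator()  # validate() is an instance method
if validator.validate(document, schema):
    schema_id = schema["schema_id"]  # avoid shadowing the built-in id()
    formatted_message = mapping_dispatcher[schema_id](document)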

A last-ditch effort could be to simply stringify the JSON schemas and use those as keys, but I'm not sure how I feel about that (it feels clever but wrong)...

I could also be going about this all wrong, and there may be a smarter way to do it.

Thanks!

Small update

I've hacked around it by stringifying the schema, converting to bytes, then hex, then adding the integer values together like so:

import codecs

schema_id = 0
bytes_schema = str(schema).encode()  # stringify the schema, then encode to bytes
hex_schema = codecs.encode(bytes_schema, "hex")
for char in hex_schema:  # iterating over bytes yields integers
    schema_id += char
>>> schema_id
36832
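One caveat with summing the byte values: the sum does not depend on the order of the characters, so two different schemas can end up with the same ID. If a derived key is still wanted, a digest of a canonical JSON dump is one possible alternative. The following is only a sketch under that assumption; `schema_key` and the three sample schemas are hypothetical names made up for illustration, and it assumes the schema is JSON-serializable.

```python
import hashlib
import json
from typing import Any, Dict


def schema_key(schema: Dict[str, Any]) -> str:
    """Return a stable hex digest for a JSON-serializable schema (hypothetical helper)."""
    # sort_keys makes the dump independent of dict insertion order.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative schemas, not taken from any real client.
schema_a = {"timestamp": {"type": "number"}, "q": {"type": "boolean"}}
schema_b = {"q": {"type": "boolean"}, "timestamp": {"type": "number"}}  # same rules, other order
schema_c = {"timestamp": {"type": "string"}, "q": {"type": "boolean"}}  # different rule

key_a = schema_key(schema_a)
key_b = schema_key(schema_b)
key_c = schema_key(schema_c)
```

Here `key_a` and `key_b` are equal because the dump is canonicalized, while `key_c` differs. The digest is still an opaque key, so it is harder to read in a dispatcher than an explicit ID.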

Solution

  • So instead of using a hash function, I just wrapped each schema in another JSON object that holds the schema_id alongside it, like so:

    [
        {
            "schema_id": "3",
            "schema": {
                "deviceName": {
                    "type": "string"
                },
                "tagName": {
                    "required": true,
                    "type": "string"
                },
                "deviceID": {
                    "type": "string"
                },
                "success": {
                    "type": "boolean"
                },
                "datatype": {
                    "type": "string"
                },
                "timestamp": {
                    "required": true,
                    "type": "number"
                },
                "value": {
                    "required": true,
                    "type": "number"
                },
                "registerId": {
                    "type": "string"
                },
                "description": {
                    "type": "string"
                }
            }
        }
    ]
    

    Was overthinking it I guess.
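For illustration, the wrapped layout above could be tied to the dispatcher roughly as follows. This is a minimal sketch, not the poster's actual code: `find_schema_id`, `map_tag_message`, the sample document, and the trimmed-down schema entry are hypothetical, and `required_fields_present` is a stand-in that only checks required fields so the example runs without Cerberus installed.

```python
from typing import Any, Callable, Dict, List, Optional

Document = Dict[str, Any]
Schema = Dict[str, Any]


def required_fields_present(document: Document, schema: Schema) -> bool:
    """Stand-in validator (assumption): only checks that required fields exist."""
    return all(
        field in document
        for field, rules in schema.items()
        if rules.get("required", False)
    )


def find_schema_id(
    document: Document,
    schema_entries: List[Dict[str, Any]],
    is_valid: Callable[[Document, Schema], bool],
) -> Optional[str]:
    """Return the schema_id of the first entry the document validates against."""
    for entry in schema_entries:
        if is_valid(document, entry["schema"]):
            return entry["schema_id"]
    return None


def map_tag_message(document: Document) -> Dict[str, Any]:
    """Hypothetical mapping function for schema "3"."""
    return {
        "tag": document["tagName"],
        "value": document["value"],
        "ts": document["timestamp"],
    }


# Shortened version of the wrapped entry shown above.
schema_entries = [
    {
        "schema_id": "3",
        "schema": {
            "tagName": {"required": True, "type": "string"},
            "timestamp": {"required": True, "type": "number"},
            "value": {"required": True, "type": "number"},
            "description": {"type": "string"},
        },
    }
]

mapping_dispatcher = {"3": map_tag_message}

document = {"tagName": "pump_1", "timestamp": 1700000000, "value": 4.2}
schema_id = find_schema_id(document, schema_entries, required_fields_present)
formatted_message = (
    mapping_dispatcher[schema_id](document) if schema_id is not None else None
)
```

With Cerberus, a callable such as `lambda doc, schema: Validator().validate(doc, schema)` could presumably be passed as `is_valid` in place of the stand-in. Keeping the ID outside the schema also means the schema itself contains nothing but field rules.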