AVRO schema for JSON


I have JSON that is generated like this, and I would like to know what the Avro schema for it would be. The number of key-value entries in the array is not fixed. There are related posts, but they reference fixed keys that do not change. In my case, the names of the variable keys keep changing.



"fixedKey": [
                {
                    "variableKey1": 2
                },
                {
                    "variableKey2": 1
                },
                {
                    "variableKey3": 3
                },
                .....
                {
                    "variableKeyN" : 10
                }
            ]

Solution

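  • Since the key names vary, each array element can be modeled as an Avro map (string keys, int values). The schema should be something like this: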

    {
        "type": "record",
        "name": "test",
        "fields": [
            {
                "name": "fixedKey",
                "type": {
                    "type": "array",
                    "items": [
                        {"type": "map", "values": "int"}
                    ]
                }
            }
        ]
    }
    

    Here's an example of serializing and deserializing your sample data with fastavro:

    from io import BytesIO
    from fastavro import writer, reader
    
    
    schema = {
        "type": "record",
        "name": "test",
        "fields": [
            {
                "name": "fixedKey",
                "type": {
                    "type": "array",
                    "items": [
                        {"type": "map", "values": "int"},
                    ],
                },
            }
        ],
    }
    
    records = [
        {
            "fixedKey": [
                {
                    "variableKey1": 1,
                },
                {
                    "variableKey2": 2,
                },
                {
                    "variableKey3": 3,
                },
            ]
        }
    ]
    
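    # Write the records to an in-memory buffer in Avro container format.
    bio = BytesIO()
    writer(bio, schema, records)
    
    # Rewind and read the records back; the reader uses the schema stored in the file.
    bio.seek(0)
    for record in reader(bio):
        print(record)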
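
If the JSON arrives as text, it may help to check that it fits the array-of-maps shape before handing it to the writer. The following is only a rough sketch using the standard library: the names `raw` and `fits_array_of_int_maps` are made up for illustration, and `raw` assumes the fragment from the question is wrapped in braces so that it parses as a JSON object. It relies on two facts from the Avro specification: map keys are always strings, and `int` is a 32-bit signed integer.

```python
import json

# Hypothetical input: the question's fragment wrapped in braces so it is valid JSON.
raw = '{"fixedKey": [{"variableKey1": 2}, {"variableKey2": 1}, {"variableKey3": 3}]}'

# Avro "int" is a 32-bit signed integer.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def fits_array_of_int_maps(record, field="fixedKey"):
    """Return True if record[field] is a list of dicts with str keys and 32-bit int values."""
    items = record.get(field)
    if not isinstance(items, list):
        return False
    for item in items:
        if not isinstance(item, dict):
            return False
        for key, value in item.items():
            if not isinstance(key, str):
                return False
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                return False
            if not INT_MIN <= value <= INT_MAX:
                return False
    return True


record = json.loads(raw)
print(fits_array_of_int_maps(record))
```

A record that passes this check should be usable as an element of `records` in the example above. If values can exceed the 32-bit range, `"long"` is probably a better choice for `values` in the schema.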