
How to define avro schema for complex json document?


I have a JSON document that I would like to convert to Avro, so I need to define an Avro schema for it. Here is the document:

{
 "uid": 29153333,
 "somefield": "somevalue",
 "options": [
   {
     "item1_lvl2": "a",
     "item2_lvl2": [
       {
         "item1_lvl3": "x1",
         "item2_lvl3": "y1"
       },
       {
         "item1_lvl3": "x2",
         "item2_lvl3": "y2"
       }
     ]
   }
 ]
}

I'm able to define the schema for the primitive fields, but not for the nested "options" field:

{
  "namespace" : "my.com.ns",
  "type" :  "record",
  "fields" : [
     {"name": "uid", "type": "int"},
     {"name": "somefield", "type": "string"},
     {"name": "options", "type": .....}
  ]
}

Thanks for the help!


Solution

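  • You need to use Avro's complex types, specifically arrays and records, and nest them: "options" is an array of records, and each of those records contains "item2_lvl2", which is itself an array of records: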

    {
      "namespace" : "my.com.ns",
      "name": "myrecord",
      "type" :  "record",
      "fields" : [
         {"name": "uid", "type": "int"},
         {"name": "somefield", "type": "string"},
         {"name": "options", "type": {
            "type": "array",
            "items": {
                "type": "record",
                "name": "lvl2_record",
                "fields": [
                    {"name": "item1_lvl2", "type": "string"},
                    {"name": "item2_lvl2", "type": {
                        "type": "array",
                        "items": {
                            "type": "record",
                            "name": "lvl3_record",
                            "fields": [
                                {"name": "item1_lvl3", "type": "string"},
                                {"name": "item2_lvl3", "type": "string"}
                            ]
                        }
                    }}
                ]
            }
         }}
      ]
    }
    

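    Also, to improve readability, you can split the schema into multiple files: define each nested record in its own file and refer to it by its full name (for example my.com.ns.lvl3_record), as long as your tooling parses the referenced files first.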
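To sanity-check the nested schema, and to show what the split-file variant looks like, here is a minimal stdlib-only Python sketch. It does not use the Avro library: the three JSON strings stand in for hypothetical separate .avsc files, load_named and validate are helper names made up for this example, and validate only understands the four types this schema uses (record, array, int, string). Each nested record is referenced by its full name, which matches the names the inline schema above gets through namespace inheritance.

```python
import json

# Hypothetical split: each named record lives in its own string, standing in
# for a separate .avsc file. Every part declares the namespace explicitly.
LVL3_AVSC = """{"type": "record", "name": "lvl3_record", "namespace": "my.com.ns",
 "fields": [{"name": "item1_lvl3", "type": "string"},
            {"name": "item2_lvl3", "type": "string"}]}"""

LVL2_AVSC = """{"type": "record", "name": "lvl2_record", "namespace": "my.com.ns",
 "fields": [{"name": "item1_lvl2", "type": "string"},
            {"name": "item2_lvl2",
             "type": {"type": "array", "items": "my.com.ns.lvl3_record"}}]}"""

TOP_AVSC = """{"type": "record", "name": "myrecord", "namespace": "my.com.ns",
 "fields": [{"name": "uid", "type": "int"},
            {"name": "somefield", "type": "string"},
            {"name": "options",
             "type": {"type": "array", "items": "my.com.ns.lvl2_record"}}]}"""


def load_named(*sources):
    """Parse each schema part and register it under its full name."""
    named = {}
    for src in sources:
        schema = json.loads(src)
        named[schema["namespace"] + "." + schema["name"]] = schema
    return named


def validate(datum, schema, named):
    """Check datum against the subset of Avro used here: record, array, int, string."""
    if isinstance(schema, str):
        if schema == "string":
            return isinstance(datum, str)
        if schema == "int":
            # Avro int is a signed 32-bit integer; Python bools are rejected.
            return (isinstance(datum, int) and not isinstance(datum, bool)
                    and -2**31 <= datum < 2**31)
        # Any other string is treated as a reference to a named record.
        return validate(datum, named[schema], named)
    if schema["type"] == "array":
        return isinstance(datum, list) and all(
            validate(item, schema["items"], named) for item in datum)
    if schema["type"] == "record":
        return isinstance(datum, dict) and all(
            field["name"] in datum
            and validate(datum[field["name"]], field["type"], named)
            for field in schema["fields"])
    raise ValueError("type not handled by this sketch: %r" % (schema,))


named = load_named(LVL3_AVSC, LVL2_AVSC, TOP_AVSC)
doc = json.loads("""{"uid": 29153333, "somefield": "somevalue",
 "options": [{"item1_lvl2": "a",
              "item2_lvl2": [{"item1_lvl3": "x1", "item2_lvl3": "y1"},
                             {"item1_lvl3": "x2", "item2_lvl3": "y2"}]}]}""")
print(validate(doc, "my.com.ns.myrecord", named))  # True
```

With the Java library, a common way to do the same split is to parse the part files with a single Schema.Parser instance, dependencies first, so that later files can refer to types defined by earlier ones.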