Tags: json, hadoop, avro

Create an Avro schema from complex JSON containing a map (key-value pairs)


I have a JSON document and want to create an Avro schema for both serializing and deserializing the data.

I have written an Avro schema for the JSON document below, but when I serialize the JSON data according to the schema, Avro's schema parser throws a SchemaParseException. Although I have read a lot about Avro and its data types, I am unable to overcome the problem.

Below are the JSON document, the Avro schema, and the exception thrown by the schema parser.

1) JSON Document

{
  "category": "test",
  "values": [
    {
      "subscriberid": 87392,
      "simserialnumber": 923,
      "MCC": 33,
      "MNC": [
        {
          "mn": {"key1": "kunal", "key2": "gupta"},
          "mc": 44
        }
      ],
      "countryiso": "IN",
      "operatorname": "vodadone"
    }
  ]
}

2) Avro Schema

{
  "type": "record",
  "namespace": "testavro.schema",
  "name": "test",
  "fields": [
    {
      "type": "string",
      "name": "data_version"
    },
    {
      "type": "string",
      "name": "ip_address"
    },
    {
      "type": "string",
      "name": "category"
    },
    {
      "type": {
        "items": {
          "fields": [
            {
              "type": "int",
              "name": "simserialnumber"
            },
            {
              "type": "string",
              "name": "countryiso"
            },
            {
              "type": "int",
              "name": "MCC"
            },
            {
              "type": "int",
              "name": "subscriberid"
            },
            {
              "type": {
                "items": {
                  "fields": [
                    {
                      "fields": [
                        {
                          "type": "string",
                          "name": "key2"
                        },
                        {
                          "type": "string",
                          "name": "key1"
                        }
                      ],
                      "type": "record",
                      "name": "mn"
                    },
                    {
                      "type": "int",
                      "name": "mc"
                    }
                  ],
                  "type": "record",
                  "name": "MNC_records"
                },
                "type": "array"
              },
              "name": "MNC"
            },
            {
              "type": "string",
              "name": "operatorname"
            }
          ],
          "type": "record",
          "name": "values_records"
        },
        "type": "array"
      },
      "name": "values"
    }
  ]
}

3) SchemaParseException

SchemaParseException: Type property "{u'items': {u'fields': [{u'type': u'int', u'name': u'simserialnumber'}, {u'type': u'string', u'name': u'countryiso'}, {u'type': u'int', u'name': u'MCC'}, {u'type': u'int', u'name': u'subscriberid'}, {u'type': {u'items': {u'fields': [{u'fields': [{u'type': u'string', u'name': u'key2'}, {u'type': u'string', u'name': u'key1'}], u'type': u'record', u'name': u'mn'}, {u'type': u'int', u'name': u'mc'}], u'type': u'record', u'name': u'MNC_records'}, u'type': u'array'}, u'name': u'MNC'}, {u'type': u'string', u'name': u'operatorname'}], u'type': u'record', u'name': u'values_records'}, u'type': u'array'}" not a valid Avro schema: Items schema ({u'fields': [{u'type': u'int', u'name': u'simserialnumber'}, {u'type': u'string', u'name': u'countryiso'}, {u'type': u'int', u'name': u'MCC'}, {u'type': u'int', u'name': u'subscriberid'}, {u'type': {u'items': {u'fields': [{u'fields': [{u'type': u'string', u'name': u'key2'}, {u'type': u'string', u'name': u'key1'}], u'type': u'record', u'name': u'mn'}, {u'type': u'int', u'name': u'mc'}], u'type': u'record', u'name': u'MNC_records'}, u'type': u'array'}, u'name': u'MNC'}, {u'type': u'string', u'name': u'operatorname'}], u'type': u'record', u'name': u'values_records'}) not a valid Avro schema: Type property "{u'items': {u'fields': [{u'fields': [{u'type': u'string', u'name': u'key2'}, {u'type': u'string', u'name': u'key1'}], u'type': u'record', u'name': u'mn'}, {u'type': u'int', u'name': u'mc'}], u'type': u'record', u'name': u'MNC_records'}, u'type': u'array'}" not a valid Avro schema: Items schema ({u'fields': [{u'fields': [{u'type': u'string', u'name': u'key2'}, {u'type': u'string', u'name': u'key1'}], u'type': u'record', u'name': u'mn'}, {u'type': u'int', u'name': u'mc'}], u'type': u'record', u'name': u'MNC_records'}) not a valid Avro schema: Type property "record" not a valid Avro schema: Could not make an Avro Schema object from record. 
(known names: [u'testavro.schema.MNC_records', u'testavro.schema.test', u'testavro.schema.values_records']) (known names: [u'testavro.schema.MNC_records', u'testavro.schema.test', u'testavro.schema.values_records'])
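The last clause of that message, 'Type property "record" not a valid Avro schema: Could not make an Avro Schema object from record', points at the actual cause: the mn field's "type" is the bare string "record", and a bare string used as a type must be either a primitive type or the name of a type defined earlier (the "known names" listed are exactly the three records defined up to that point). The following is a simplified, stdlib-only model of that lookup. `find_unresolved_types` is a hypothetical helper, not part of the avro library, and it ignores namespaces and aliases.

```python
# Avro primitive type names. Any other bare string used as a "type" must be
# the name of a record/enum/fixed type that has already been defined.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def find_unresolved_types(schema, known=None, path="$"):
    """Return (path, type_name) pairs for bare type strings that are neither
    primitives nor previously defined names.

    Simplified model of Avro name resolution: namespaces and aliases are
    ignored, and only record, array, map, enum, fixed and unions are walked.
    """
    if known is None:
        known = set()
    problems = []
    if isinstance(schema, str):
        if schema not in PRIMITIVES and schema not in known:
            problems.append((path, schema))
    elif isinstance(schema, list):  # a union is a JSON array of schemas
        for i, branch in enumerate(schema):
            problems += find_unresolved_types(branch, known, f"{path}[{i}]")
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in ("record", "enum", "fixed"):
            # Register the name first so a record can refer to itself.
            known.add(schema["name"])
        if kind == "record":
            for field in schema.get("fields", []):
                problems += find_unresolved_types(
                    field["type"], known, f"{path}.{field['name']}")
        elif kind == "array":
            problems += find_unresolved_types(schema["items"], known, f"{path}[]")
        elif kind == "map":
            problems += find_unresolved_types(schema["values"], known, f"{path}{{}}")
        elif kind not in ("enum", "fixed"):
            # Wrapped form such as {"type": "int"}.
            problems += find_unresolved_types(kind, known, path)
    return problems


# The MNC_records item type from the question, reduced to the failing part:
# "record" sits directly on the field, so it is read as a type *name*.
broken = {"type": "record", "name": "MNC_records", "fields": [
    {"name": "mn", "type": "record",
     "fields": [{"name": "key2", "type": "string"}, {"name": "key1", "type": "string"}]},
    {"name": "mc", "type": "int"},
]}

# Nesting the record definition under the field's "type" resolves it.
fixed = {"type": "record", "name": "MNC_records", "fields": [
    {"name": "mn", "type": {"type": "record", "name": "Mn",
                            "fields": [{"name": "key2", "type": "string"},
                                       {"name": "key1", "type": "string"}]}},
    {"name": "mc", "type": "int"},
]}

print(find_unresolved_types(broken))  # [('$.mn', 'record')]
print(find_unresolved_types(fixed))   # []
```

Note that the "fields" list attached to the broken mn field is simply ignored, because a field object is not itself a schema.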

Please help me overcome this problem. I have spent a whole day on this JSON document and Avro schema without success.


Solution

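  • It looks like the problem is in the items type of the MNC field in your values_records record. The mn field has "type": "record" directly on the field, so the parser treats "record" as the name of a type, and no type with that name exists (hence "Could not make an Avro Schema object from record"). A field's "type" has to be a complete schema, so wrapping mn's record definition in a nested object, with its own record name (Mn below), works: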

    {
      "type": "record",
      "namespace": "testavro.schema",
      "name": "test",
      "fields": [
        {
          "type": "string",
          "name": "data_version"
        },
        {
          "type": "string",
          "name": "ip_address"
        },
        {
          "type": "string",
          "name": "category"
        },
        {
          "type": {
            "items": {
              "fields": [
                {
                  "type": "int",
                  "name": "simserialnumber"
                },
                {
                  "type": "string",
                  "name": "countryiso"
                },
                {
                  "type": "int",
                  "name": "MCC"
                },
                {
                  "type": "int",
                  "name": "subscriberid"
                },
                {
                  "type": {
                    "items": {
                      "fields": [
                        {
                          "type": {
                            "type": "record",
                            "fields": [
                              {
                                "type": "string",
                                "name": "key2"
                              },
                              {
                                "type": "string",
                                "name": "key1"
                              }
                            ],
                            "name": "Mn"
                          },
                          "name": "mn"
                        },
                        {
                          "type": "int",
                          "name": "mc"
                        }
                      ],
                      "type": "record",
                      "name": "MNC_records"
                    },
                    "type": "array"
                  },
                  "name": "MNC"
                },
                {
                  "type": "string",
                  "name": "operatorname"
                }
              ],
              "type": "record",
              "name": "values_records"
            },
            "type": "array"
          },
          "name": "values"
        }
      ]
    }
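Once the schema parses, the sample document from the question still does not fit it: data_version and ip_address are declared as plain "string" fields with no default, and the document has neither. The following is a simplified, stdlib-only validator for checking that. `validate`, `make_schema`, `MN_RECORD` and `MN_MAP` are hypothetical names, not the avro library's API. It treats a missing key as None, which as far as I know matches how the Python avro writer looks up fields (`datum.get(name)`); per the Avro specification, a field's "default" is only used when reading. It also lets you compare the nested record for mn with an Avro map of strings, which may be the closer fit if the keys under mn are not a fixed set.

```python
import json

# Checks for Avro primitive types against Python values (simplified).
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31,
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**63 <= v < 2**63,
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "bytes": lambda v: isinstance(v, bytes),
    "string": lambda v: isinstance(v, str),
}


def validate(schema, datum, path="$", named=None):
    """Return a list of problems; an empty list means the datum fits.

    Hypothetical helper: supports primitives, record, array, map, unions and
    references to records defined earlier. Missing record keys are read as
    None, since "default" only applies when reading, not when writing.
    """
    named = {} if named is None else named
    if isinstance(schema, str):
        if schema in PRIMITIVE_CHECKS:
            if PRIMITIVE_CHECKS[schema](datum):
                return []
            return [f"{path}: expected {schema}, got {datum!r}"]
        return validate(named[schema], datum, path, named)  # named-type reference
    if isinstance(schema, list):  # union: some branch must accept the datum
        if any(not validate(branch, datum, path, named) for branch in schema):
            return []
        return [f"{path}: {datum!r} matches no union branch"]
    kind = schema["type"]
    if kind == "record":
        named[schema["name"]] = schema
        if not isinstance(datum, dict):
            return [f"{path}: expected record {schema['name']}, got {datum!r}"]
        problems = []
        for field in schema["fields"]:
            problems += validate(field["type"], datum.get(field["name"]),
                                 f"{path}.{field['name']}", named)
        return problems
    if kind == "array":
        if not isinstance(datum, list):
            return [f"{path}: expected array, got {datum!r}"]
        return [p for i, item in enumerate(datum)
                for p in validate(schema["items"], item, f"{path}[{i}]", named)]
    if kind == "map":
        if not isinstance(datum, dict):
            return [f"{path}: expected map, got {datum!r}"]
        return [p for key, value in datum.items()
                for p in validate(schema["values"], value, f"{path}.{key}", named)]
    return validate(kind, datum, path, named)  # wrapped form such as {"type": "int"}


def make_schema(mn_type):
    """The corrected schema above, with the type of "mn" as a parameter."""
    mnc = {"type": "record", "name": "MNC_records", "fields": [
        {"name": "mn", "type": mn_type}, {"name": "mc", "type": "int"}]}
    values = {"type": "record", "name": "values_records", "fields": [
        {"name": "simserialnumber", "type": "int"},
        {"name": "countryiso", "type": "string"},
        {"name": "MCC", "type": "int"},
        {"name": "subscriberid", "type": "int"},
        {"name": "MNC", "type": {"type": "array", "items": mnc}},
        {"name": "operatorname", "type": "string"}]}
    return {"type": "record", "namespace": "testavro.schema", "name": "test", "fields": [
        {"name": "data_version", "type": "string"},
        {"name": "ip_address", "type": "string"},
        {"name": "category", "type": "string"},
        {"name": "values", "type": {"type": "array", "items": values}}]}


MN_RECORD = {"type": "record", "name": "Mn", "fields": [
    {"name": "key2", "type": "string"}, {"name": "key1", "type": "string"}]}
MN_MAP = {"type": "map", "values": "string"}  # alternative: arbitrary string keys

document = json.loads("""{"category": "test", "values": [{"subscriberid": 87392,
  "simserialnumber": 923, "MCC": 33,
  "MNC": [{"mn": {"key1": "kunal", "key2": "gupta"}, "mc": 44}],
  "countryiso": "IN", "operatorname": "vodadone"}]}""")

print(validate(make_schema(MN_RECORD), document))
# ['$.data_version: expected string, got None', '$.ip_address: expected string, got None']
```

So either send data_version and ip_address with every record, or, if they are optional, declare them as a union such as ["null", "string"] with "default": null. Choosing between the record and the map for mn comes down to whether key1/key2 are fixed fields (record) or arbitrary keys (map).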