python-3.x apache-kafka avro kafka-python fastavro

ValidationError while validating data against schema FastAvro


After multiple attempts, I am not able to decode the error thrown by the fastavro library when validating data against the schema. Below is what I am getting:

  File "fastavro\\_validation.pyx", line 301, in fastavro._validation.validate
  File "fastavro\\_validation.pyx", line 311, in fastavro._validation.validate
  File "fastavro\\_validation.pyx", line 296, in fastavro._validation._validate
fastavro._validate_common.ValidationError: [
  " is <[{'id': '123 Drive Street', 'address': [{'address_line1': 'no'}]}]> of type <class 'list'> expected {'type': 'record', 'name': 'example.avro.Person', 'fields': [{'name': 'id', 'type': 'string'}, {'name': 'address', 'type': {'type': 'array', 'items': {'type': 'record', 'name': 'example.avro.Address', 'fields': [{'name': 'address_line1', 'type': 'string'}]}}}], '__fastavro_parsed': True, '__named_schemas': {'example.avro.Person': {'type': 'record', 'name': 'example.avro.Person', 'fields': [{'name': 'id', 'type': 'string'}, {'name': 'address', 'type': {'type': 'array', 'items': {'type': 'record', 'name': 'example.avro.Address', 'fields': [{'name': 'address_line1', 'type': 'string'}]}}}]}, 'example.avro.Address': {'type': 'record', 'name': 'example.avro.Address', 'fields': [{'name': 'address_line1', 'type': 'string'}]}}}"
]

This is part of a larger Kafka project I am trying to implement, in which I have an Avro schema with a nested structure. Below is the code:

import fastavro
from fastavro import parse_schema

household_schema = {
  "namespace": "example.avro",
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },

    {
      "name": "address",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "Address",
          "fields": [
            {
              "name": "address_line1",
              "type": "string"
            }

          ]
        }
      }
    }
  ]
}

records = [
    {
        "id": "123 Drive Street",
        "address": [
            {
                "address_line1": "no"
            }
        ]
    }
]


parsed_schema = parse_schema(household_schema)
fastavro.validate(records, parsed_schema)

I generated sample data based on the schema using a PyCharm plugin, AVRO Random Generator:

{
  "id": "vrfofjyifppdyucdtx",
  "address": [
    {
      "address_line1": "no"
    }
  ]
}

This is what I am trying to do in the code too, but without success. I looked at a post about a similar issue, "Handling nested schemas of AVRO with Python3", but that did not help either.

I have also looked at the fastavro docs, where an example of a nested structure with sample data is shown, and tried it the same way, but no luck:

https://fastavro.readthedocs.io/en/latest/writer.html

I have been struggling with this issue for the past 2 days and have not been able to resolve it. Could someone please help me out?


Solution

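  • fastavro.validate expects a single record, not a list of records. In your code, records is a list, so the validator compares the whole list against the Person record schema. That is why the error reports a value of type <class 'list'> where a record was expected. Change the last line to fastavro.validate(records[0], parsed_schema).

    If you want to validate more than one record at once, import validate_many with from fastavro.validation import validate_many, and make your last line validate_many(records, parsed_schema).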