
GCP Pub/Sub message validation against nested record AVRO schema fails


I have a fairly complex AVRO schema that passes schema validation in Pub/Sub, but the messages I test against it are rejected as incompatible with the schema I defined.

Since then, I have tried narrowing down the schema to understand why this happens, and it seems that nested records in an AVRO schema don't work well in Pub/Sub.

For example, consider the following AVRO schema:

{
 "type" : "record",
 "name" : "Avro",
 "fields" : [
   {
     "name" : "foo",
     "type" : {
        "name": "foo_1",
        "type": "int"
     }
   }
 ]
}

and the sample message:

{
  "foo": {
    "foo_1": 42
  }
}

The sample schema above passes the schema validation in Pub/Sub, but when I test the message above, it fails.

I found a similar post here, but it doesn't help much.

Perhaps Pub/Sub has an issue with nested records?

Thanks a bunch.

UPDATE

Thanks to Kamal, the original message now works. How would I handle the union ["null", NestedRecord] case? For example:

{
    "type": "record",
    "name": "Avro",
    "fields": [
        {
            "name": "foo",
            "type": [
                "null",
                {
                    "type": "record",
                    "name": "NestedRecord",
                    "fields": [
                        {
                            "name": "foo_1",
                            "type": "int"
                        }
                    ]
                }
            ]
        }
    ]
}

Is this not a valid schema? My expectation is that this schema should handle both

{"foo": null}

and

{
  "foo": {
    "foo_1": 42
  }
}

but it only works for the null input, not for the nested input. This seems like a very basic question, but it has been frustrating because the Pub/Sub validation error isn't very informative.


Solution

  • The schema you have provided does not specify a nested record: the type of foo has no "record" type with its own "fields" list. The schema that matches the message you have specified is:

    {
        "type": "record",
        "name": "Avro",
        "fields": [
            {
                "name": "foo",
                "type": {
                    "type": "record",
                    "name": "NestedRecord",
                    "fields": [
                        {
                            "name": "foo_1",
                            "type": "int"
                        }
                    ]
                }
            }
        ]
    }
    

    The schema you specified defines a record with a single int field named foo. Because the inner object's "type" is the primitive "int", its "name" attribute is essentially ignored. This is a message that is valid against the schema you provided:

    {
      "foo": 42
    }
    

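    For the case of a union with null, you need to follow Avro's JSON encoding rules for a union: a null value is written as a plain null, and any other value is wrapped in a JSON object with a single key naming the branch type (here the record name, NestedRecord). In this case, the message would be: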

    {
        "foo": {
            "NestedRecord": {
                "foo_1": 42
            }
        }
    }
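
The rules above can be checked by hand with a small sketch that uses only the Python standard library. This is not Pub/Sub's validator and not an Avro library: `matches` and `type_name` are hypothetical helpers written for illustration, they cover only null, int, string, record, and union, and they assume schemas without namespaces. Under those assumptions, the sketch shows which of the messages discussed above fit which schema.

```python
# Minimal, illustrative checker for Avro's JSON encoding (subset only).
PRIMITIVES = {
    "null": lambda v: v is None,
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def type_name(schema):
    """Return the branch name used when a union value is JSON-encoded."""
    if isinstance(schema, str):
        return schema
    if schema["type"] == "record":
        return schema["name"]  # assumption: no namespace, so the bare name
    # {"name": "foo_1", "type": "int"} is just "int"; "name" is ignored.
    return type_name(schema["type"])


def matches(schema, datum):
    """Check a JSON-decoded datum against a schema."""
    if isinstance(schema, list):  # union
        if datum is None:
            return any(type_name(s) == "null" for s in schema)
        # Non-null union values must look like {"<branch name>": value}.
        if not (isinstance(datum, dict) and len(datum) == 1):
            return False
        branch, value = next(iter(datum.items()))
        return any(
            type_name(s) == branch and branch != "null" and matches(s, value)
            for s in schema
        )
    if isinstance(schema, str):  # primitive
        return PRIMITIVES[schema](datum)
    if schema["type"] == "record":
        fields = schema["fields"]
        return (
            isinstance(datum, dict)
            and set(datum) == {f["name"] for f in fields}
            and all(matches(f["type"], datum[f["name"]]) for f in fields)
        )
    # Object form of a non-record type: unwrap and ignore extra attributes.
    return matches(schema["type"], datum)


NESTED_RECORD = {
    "type": "record",
    "name": "NestedRecord",
    "fields": [{"name": "foo_1", "type": "int"}],
}

# The schema from the question: foo is effectively a plain int.
ORIGINAL = {
    "type": "record",
    "name": "Avro",
    "fields": [{"name": "foo", "type": {"name": "foo_1", "type": "int"}}],
}
# The corrected schema: foo is a nested record.
NESTED = {
    "type": "record",
    "name": "Avro",
    "fields": [{"name": "foo", "type": NESTED_RECORD}],
}
# The schema from the update: foo is a union of null and the nested record.
UNION = {
    "type": "record",
    "name": "Avro",
    "fields": [{"name": "foo", "type": ["null", NESTED_RECORD]}],
}

print(matches(ORIGINAL, {"foo": 42}))                             # True
print(matches(ORIGINAL, {"foo": {"foo_1": 42}}))                  # False
print(matches(NESTED, {"foo": {"foo_1": 42}}))                    # True
print(matches(UNION, {"foo": None}))                              # True
print(matches(UNION, {"foo": {"foo_1": 42}}))                     # False
print(matches(UNION, {"foo": {"NestedRecord": {"foo_1": 42}}}))   # True
```

The union case is the only one where the message shape differs from the "natural" JSON: the extra NestedRecord key tells the decoder which branch of the union the value belongs to.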