
PubSub Subscription error with REPEATED Column Type - Avro Schema


I am trying to use a PubSub subscription of type "Write to BigQuery", but I am running into an issue with the REPEATED column mode. The error I get when updating the subscription is:

Incompatible schema mode for field 'Values': field is REQUIRED in the topic schema, but REPEATED in the BigQuery table schema

My Avro Schema is:

    {
      "type": "record",
      "name": "Avro",
      "fields": [
        {
          "name": "ItemID",
          "type": "string"
        },
        {
          "name": "UserType",
          "type": "string"
        },
        {
          "name": "Values",
          "type": [
            {
              "type": "record",
              "name": "Values",
              "fields": [
                {
                  "name": "AttributeID",
                  "type": "string"
                },
                {
                  "name": "AttributeValue",
                  "type": "string"
                }
              ]
            }
          ]
        }
      ]
    }

Input JSON that "matches" the schema:

    {
      "ItemID": "Item_1234",
      "UserType": "Item",
      "Values": {
        "AttributeID": "TEST_ID_1",
        "AttributeValue": "Value_1"
      }
    }

My table looks like:

    ItemID           | STRING | NULLABLE
    UserType         | STRING | NULLABLE
    Values           | RECORD | REPEATED
      AttributeID    | STRING | NULLABLE
      AttributeValue | STRING | NULLABLE

I am able to run "Test" and "Validate Schema", and both come back successful. So my question is: what am I missing in the Avro schema for the Values field to make it REPEATED rather than REQUIRED, so that the subscription can be created?
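Why the mode comes out as REQUIRED: in Avro's JSON schema syntax, a JSON array used as a field's "type" declares a union, not an array. So "Values" in the schema above is a union with a single record branch. The Python sketch here approximates how that field's mode is derived. The helper avro_field_mode is hypothetical and encodes an assumed simplification of the mapping rules (Avro array maps to REPEATED, a union containing "null" maps to NULLABLE, anything else maps to REQUIRED); it is not the actual PubSub implementation.

```python
# Hypothetical helper (assumed simplification, not the PubSub implementation):
# approximates the BigQuery column mode derived from an Avro field type.
def avro_field_mode(avro_type):
    if isinstance(avro_type, list):
        # A JSON array in an Avro schema is a UNION, not an Avro array.
        # Only a union that includes "null" is treated as optional.
        return "NULLABLE" if "null" in avro_type else "REQUIRED"
    if isinstance(avro_type, dict) and avro_type.get("type") == "array":
        # An explicit {"type": "array", "items": ...} is a real Avro array.
        return "REPEATED"
    # Primitive types and bare records are mandatory.
    return "REQUIRED"


values_record = {
    "type": "record",
    "name": "Values",
    "fields": [
        {"name": "AttributeID", "type": "string"},
        {"name": "AttributeValue", "type": "string"},
    ],
}

# The question's schema: "type": [ {record} ] is a single-branch union.
question_values_type = [values_record]
# The fix: wrap the record in an explicit Avro array.
array_values_type = {"type": "array", "items": values_record}

print(avro_field_mode(question_values_type))  # REQUIRED
print(avro_field_mode(array_values_type))     # REPEATED
print(avro_field_mode(["null", "string"]))    # NULLABLE
```

The REQUIRED result for the single-branch union matches the mode reported in the error message, while the explicit array form yields the REPEATED mode the table expects.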


Solution

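  • Per Kamal's comment above, this schema works. The key change is the type of the Values field: in Avro's JSON schema syntax, the original "type": [ { ...record... } ] declares a union with a single record branch, which maps to a REQUIRED field. Declaring it as { "type": "array", "items": { ...record... } } makes it an Avro array, which maps to a REPEATED column. The item field names are kept identical to the table's AttributeID and AttributeValue columns, since the subscription matches schema fields to table columns by name:

    {
      "type": "record",
      "name": "Avro",
      "fields": [
        {
          "name": "ItemID",
          "type": "string"
        },
        {
          "name": "UserType",
          "type": "string"
        },
        {
          "name": "Values",
          "type": {
            "type": "array",
            "items": {
              "name": "NameDetails",
              "type": "record",
              "fields": [
                {
                  "name": "AttributeID",
                  "type": "string"
                },
                {
                  "name": "AttributeValue",
                  "type": "string"
                }
              ]
            }
          }
        }
      ]
    }

    The payload must then send Values as a JSON array in which each element carries both required fields:

    {
      "ItemID": "Item_1234",
      "UserType": "Item",
      "Values": [
        { "AttributeID": "TEST_ID_1", "AttributeValue": "Value_1" }
      ]
    }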
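To see which payload shapes fit an array-of-records field, here is a simplified, stdlib-only Python check. It is not a full Avro validator, and it assumes the array items use the same field names as the BigQuery table columns (AttributeID and AttributeValue). The function values_are_valid is hypothetical. It shows that a single object is not an array, and that splitting the two required fields across separate elements also fails, because every element is a complete record.

```python
import json

# Assumption: the array items use the table's column names as field names.
REQUIRED_ITEM_FIELDS = ("AttributeID", "AttributeValue")


def values_are_valid(payload: str) -> bool:
    """Hypothetical, simplified check (not a full Avro validator)."""
    message = json.loads(payload)
    values = message.get("Values")
    if not isinstance(values, list):
        # A single JSON object does not satisfy an Avro array type.
        return False
    # Every element must be one record containing BOTH required strings.
    return all(
        isinstance(item, dict)
        and all(isinstance(item.get(f), str) for f in REQUIRED_ITEM_FIELDS)
        for item in values
    )


one_record_per_element = """{"ItemID": "Item_1234", "UserType": "Item",
  "Values": [{"AttributeID": "TEST_ID_1", "AttributeValue": "Value_1"}]}"""
fields_split_across_elements = """{"ItemID": "Item_1234", "UserType": "Item",
  "Values": [{"AttributeID": "TEST_ID_1"}, {"AttributeValue": "Value_1"}]}"""
single_object = """{"ItemID": "Item_1234", "UserType": "Item",
  "Values": {"AttributeID": "TEST_ID_1", "AttributeValue": "Value_1"}}"""

print(values_are_valid(one_record_per_element))        # True
print(values_are_valid(fields_split_across_elements))  # False
print(values_are_valid(single_object))                 # False
```

Note that an empty array ("Values": []) passes this check, which is consistent with a REPEATED column holding zero rows.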