Tags: apache-spark, pyspark, avro, spark-structured-streaming, spark-streaming-kafka

How to merge the data types of an Avro union type so the value field shows a single type instead of member0 and member1


I have the following Avro schema:

{
  "name": "MyClass",
  "type": "record",
  "namespace": "com.acme.avro",
  "fields": [
    {
      "name": "data",
      "type": {
        "type": "map",
        "values": ["int", "string"]
      }
    }
  ]
}

However, when I stream events from Kafka to Spark with this schema, the streaming DataFrame represents each value in the data field as a struct with one member per data type in the union, as shown in the image below.

[Image: schema format of the DataFrame]

Is it possible to merge the members so that only the value of each key is shown, rather than having it split into multiple members? For example:

number : -64

rather than

number : { member0 : -64, member1 : null }


Solution

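  • As far as I know, there isn't any way to merge them when the data is read. Spark's Avro support converts a union of several non-null types (here int and string) into a struct with one field per branch (member0, member1, ...), because a Spark column must have a single data type. You might have to live with it.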
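
The struct itself is what Spark produces for this union, but if a single type for every value is acceptable, the members could be collapsed after deserialization. Below is a minimal sketch, not a definitive fix. It assumes the column is called data as in the schema above, that member0 is the int branch and member1 the string branch, and that turning every value into a string is fine. The helper names collapse_members and collapse_data_column are made up for illustration. The first function only models the idea on plain Python dicts; the second is the Spark equivalent and assumes PySpark 3.1 or later for transform_values.

```python
from typing import Any, Dict, Optional


def collapse_members(value: Optional[Dict[str, Any]]) -> Optional[str]:
    """Plain-Python model: return the first non-null union branch as a string.

    `value` mimics one map value as Spark shows it,
    e.g. {"member0": -64, "member1": None}.
    """
    if value is None:
        return None
    for branch in value.values():  # branches in order: member0, member1, ...
        if branch is not None:
            return str(branch)
    return None


def collapse_data_column(df, column: str = "data"):
    """Spark equivalent (hypothetical helper, assumes PySpark >= 3.1).

    Rewrites every value of the map column from
    struct<member0:int, member1:string> to a single string.
    """
    # Imported here so the plain-Python model above works without PySpark.
    from pyspark.sql import functions as F

    return df.withColumn(
        column,
        F.transform_values(
            column,
            # Cast the int branch to string so both branches share one type.
            lambda _key, v: F.coalesce(v["member0"].cast("string"), v["member1"]),
        ),
    )


if __name__ == "__main__":
    print(collapse_members({"member0": -64, "member1": None}))
```

With something like this, number would show up as a single string value rather than a two-member struct. The trade-off is that the original int/string distinction is lost, so it only helps when downstream code does not need to know which branch a value came from.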