Tags: apache-kafka, db2, apache-kafka-connect, avro

Creating a Kafka sink connector to a table with special characters in column names


I am creating a Kafka sink connector for a very old database, the schema for which I cannot alter.

Unfortunately, this database has a few columns with special characters that do not work with Avro out of the box. The corresponding Avro schema would look something like this:

{
    "type": "record",
    "name": "db2_topic",
    "fields": [
        {"name": "PERSON", "type": "string"},
        {"name": "DATE", "type": "long"},
        {"name": "PRICE", "type": "string"},
        {"name": "PRICE$", "type": "string"}
    ]
}

According to the Avro specification, a "name" must match the regex [A-Za-z_][A-Za-z0-9_]*. The build fails because the name "PRICE$" contains a character outside that pattern.
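For instance, a field entry like the following satisfies the pattern, but then it no longer matches the actual column name (PRICE_USD is just a stand-in):

    {"name": "PRICE_USD", "type": "string"}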

Does anyone have experience working around this with Avro, or is there another compatible serializer I could use?


Solution

  • Data isn't immediately serialized to Avro (and Avro is not the only format Kafka Connect can use). Records first pass through Kafka Connect's internal Struct representation, which can be modified with Single Message Transforms (SMTs) such as ReplaceField$Value; you could use it to rename the field, to PRICE_USD, perhaps.
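
As a rough sketch, the rename can be done declaratively in the sink connector's configuration. Assuming the Confluent JDBC sink connector and a topic field named PRICE_USD that should map back to the PRICE$ column (the connector class, transform alias, and connection details below are illustrative), it would look something like:

{
    "name": "db2-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "db2_topic",
        "connection.url": "jdbc:db2://db2-host:50000/MYDB",
        "transforms": "restoreColumnName",
        "transforms.restoreColumnName.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
        "transforms.restoreColumnName.renames": "PRICE_USD:PRICE$"
    }
}

ReplaceField$Value and its renames property (a list of old:new pairs) are standard Kafka Connect transforms. With this in place, the topic's Avro schema only ever contains PRICE_USD, and the $ character appears only inside Connect's Struct on its way to the table.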