Tags: java, apache-kafka, avro

NullPointerException when attempting to serialize Avro GenericRecord containing array


I am trying to publish an Avro record to Kafka and get a NullPointerException when writing the Avro object with the BinaryEncoder.

Here is the abbreviated stacktrace:

java.lang.NullPointerException: null of array of com.mycode.DeeplyNestedObject of array of com.mycode.NestedObject of union of com.mycode.ParentObject
    at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60) ~[avro-1.8.1.jar:1.8.1]
    at com.mycode.KafkaAvroPublisher.send(KafkaAvroPublisher.java:61) ~[classes/:na]
    ....
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:112) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:87) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105) ~[avro-1.8.1.jar:1.8.1]
    ... 55 common frames omitted

Here is the send method in my code where the exception occurs:

private static final EncoderFactory ENCODER_FACTORY = EncoderFactory.get();
private static final SpecificDatumWriter<ParentObject> PARENT_OBJECT_WRITER = new SpecificDatumWriter<>(ParentObject.SCHEMA$);
private BinaryEncoder binaryEncoder;  // reused across calls; EncoderFactory re-initializes it

public void send(ParentObject parentObject) {
    try {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        binaryEncoder = ENCODER_FACTORY.binaryEncoder(stream, binaryEncoder);
        PARENT_OBJECT_WRITER.write(parentObject, binaryEncoder);  // Exception HERE
        binaryEncoder.flush();
        producer.send(new ProducerRecord<>(topic, stream.toByteArray()));
    } catch (IOException ioe) {
        logger.debug("Problem publishing message to Kafka.", ioe);
    }
}

In the schema, the NestedObject contains an array of DeeplyNestedObject. I've done enough debugging to confirm that the NestedObject does, in fact, contain either an array of DeeplyNestedObject instances or an empty array when none are present. Here is the relevant part of the schema:

[ { "namespace": "com.mycode.avro"
  , "type": "record"
  , "name": "NestedObject"
  , "fields":
    [ { "name": "timestamp", "type": "long", "doc": "Instant in time (milliseconds since epoch)." }
    , { "name": "objs", "type": { "type": "array", "items": "DeeplyNestedObject" }, "doc": "Elided." }
    ]
  }
]
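Note that the `objs` field is declared as a bare `array`, not a union with `null`, so Avro will reject a null value at write time rather than at construction time. The same contract can be made to fail fast at construction instead. A minimal sketch in plain Java (the class here is a hypothetical stand-in for the generated Avro class, and `List<String>` stands in for `List<DeeplyNestedObject>`):

```java
import java.util.List;
import java.util.Objects;

// Hypothetical mirror of the generated class: because the schema
// declares "objs" as a bare array (no union with null), a null value
// is invalid and would otherwise only fail deep inside the serializer.
class NestedObject {
    final long timestamp;
    final List<String> objs;  // stands in for List<DeeplyNestedObject>

    NestedObject(long timestamp, List<String> objs) {
        this.timestamp = timestamp;
        // Enforcing the schema's non-null contract up front turns a
        // confusing serializer NPE into an immediate, local failure.
        this.objs = Objects.requireNonNull(objs,
                "objs must not be null; use an empty list instead");
    }
}
```

Constructing with an empty list succeeds; constructing with `null` throws immediately at the call site, which is far easier to diagnose than the writer's nested-path message.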

Solution

  • The stacktrace coming out of Avro is misleading. The problem is likely one level deeper than the class the Exception message indicates.

    The message "null of array of com.mycode.DeeplyNestedObject of array of com.mycode.NestedObject of union of com.mycode.ParentObject" reads from the failing field outward: a field inside a DeeplyNestedObject was expected to be an array but was found to be null; that DeeplyNestedObject sits in an array inside a NestedObject, which in turn is part of a union in ParentObject. It is easy to misread this as saying the DeeplyNestedObject itself is null inside the NestedObject.

    You'll need to inspect the fields of DeeplyNestedObject and figure out which array field is null rather than empty. The problem is most likely located where the DeeplyNestedObject is created: it has an array-typed field that isn't being populated in every code path before the send method is called.
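A common fix, wherever the DeeplyNestedObject is built, is to guarantee an empty list instead of null for any field the schema declares as a bare array. A minimal sketch in plain Java (the helper and any setter names are hypothetical, not part of Avro's API):

```java
import java.util.Collections;
import java.util.List;

final class NullSafety {
    private NullSafety() {}

    // Substitute an empty immutable list for null so that a field the
    // Avro schema declares as a bare array is never null at write time.
    static <T> List<T> orEmpty(List<T> maybeNull) {
        return maybeNull == null ? Collections.emptyList() : maybeNull;
    }
}
```

Applied at construction time, e.g. `deeplyNestedObject.setItems(NullSafety.orEmpty(sourceItems))` (assuming a hypothetical `items` field), this ensures the writer never encounters a null where the schema demands an array.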