hadoop, apache-pig, avro

Apache pig ERROR org.apache.pig.backend.hadoop.executionengine.Launcher - Error: org.apache.avro.file.DataFileWriter$AppendWriteException:


I am trying to load some data, filter it by a certain field, and store the output to HDFS. My code looks like this:

data = LOAD '$inputPath' using AvroStorage();
data = FILTER data by condition;
STORE data INTO '$outputPath' using AvroStorage('schema', '$SCHEMA');

But I am getting an error saying:

 ERROR org.apache.pig.backend.hadoop.executionengine.Launcher - Error: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of int of int in field id of com.stackoverflow.id

Can someone suggest what could be wrong? My guess is that some of the fields read from HDFS are null and AvroStorage does not allow that. Thanks for any suggestions!


Solution

  • Your Avro schema defines a field that doesn't allow null (going by the error message, the int field id), but your data contains a null in that field.
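
One way to act on this is to drop the offending records before they reach the writer. The following is a minimal sketch, assuming that id is the only field holding nulls (the error message names only that one) and that discarding those rows is acceptable. The alias clean is made up for illustration; everything else is taken from the question.

```pig
data = LOAD '$inputPath' using AvroStorage();
data = FILTER data by condition;

-- Assumption: id is the only non-nullable field that actually contains nulls.
-- Records with a null id are dropped so they are never written out.
clean = FILTER data by id IS NOT NULL;

STORE clean INTO '$outputPath' using AvroStorage('schema', '$SCHEMA');
```

If the null ids need to be kept, the alternative is to change the schema passed in $SCHEMA so that id is declared as a union with null, that is "type": ["null", "int"] instead of "type": "int". Avro then accepts a null for that field.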