I have a Pig script that reads data from a directory on HDFS. The data are stored as Avro files. The file structure looks like:
In the Pig script I am simply doing a load, a filter, and a store. It looks like:
items = LOAD path USING AvroStorage();
items = FILTER items BY some property;
STORE items INTO outputDirectory USING AvroStorage();
The problem is that Pig writes many empty files to the output directory. Is there a way to avoid creating those files, or to remove them? Thanks!
For Pig 0.13 and later, you can set pig.output.lazy=true to avoid creating empty output files (see https://issues.apache.org/jira/browse/PIG-3299).
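As a rough sketch of how that could look in the script from the question, assuming Pig 0.13 or later: the `$input` and `$output` parameters and the `some_field IS NOT NULL` condition are placeholders I made up for illustration, not part of the original script. I have not verified how the lazy output setting behaves with every storage function, so check the result against your AvroStorage output.

```pig
-- Assumption: Pig 0.13+; ask Pig to create output files lazily,
-- i.e. only when a task actually writes a record (PIG-3299).
SET pig.output.lazy true;

-- '$input' and '$output' are hypothetical parameters standing in for the real paths.
items = LOAD '$input' USING AvroStorage();

-- Placeholder condition; replace with the actual filter on your data.
filtered = FILTER items BY some_field IS NOT NULL;

STORE filtered INTO '$output' USING AvroStorage();
```

The same property can presumably also be passed as a Java property when launching the script, e.g. `pig -Dpig.output.lazy=true -param input=... -param output=... script.pig`, which avoids editing the script itself.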