Last night we had a disk-full issue, and today I am seeing this error in my Flume log:
22 Feb 2017 10:24:56,180 ERROR [pool-6-thread-1] (org.apache.flume.client.avro.ReliableSpoolingFileEventReader.openFile:504) - Exception opening file: /.../flume_spool/data.../data_2017-02-21_17-15-00_8189
java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
at org.apache.avro.file.DataFileWriter.appendTo(DataFileWriter.java:160)
at org.apache.avro.file.DataFileWriter.appendTo(DataFileWriter.java:149)
at org.apache.flume.serialization.DurablePositionTracker.<init>(DurablePositionTracker.java:141)
at org.apache.flume.serialization.DurablePositionTracker.getInstance(DurablePositionTracker.java:76)
at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.openFile(ReliableSpoolingFileEventReader.java:478)
at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.getNextFile(ReliableSpoolingFileEventReader.java:459)
at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:229)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:227)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Flume version: 1.5.2
The java.io.IOException: Not a data file exception was caused by the temporary directory that holds metadata used for processing.
This directory is controlled by the trackerDir directive in the definition of the spooldir source in flume.conf (by default .flumespool in the spooldir).
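For reference, a minimal sketch of the relevant piece of flume.conf; the agent and source names (agent, spool) and the paths are my assumptions, only the type, spoolDir and trackerDir keys matter here:

```properties
# Hypothetical agent/source names (agent, spool); paths are illustrative.
agent.sources = spool
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /path/to/flume_spool/data
# Directory for position-tracking metadata; relative paths are resolved
# inside spoolDir. Defaults to .flumespool if omitted.
agent.sources.spool.trackerDir = .flumespool
```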
We ended up with empty metadata files, which lacked the magic header bytes that Avro (we are using an Avro sink) expected to see. There was actually nothing wrong with the data file itself, only with the metadata file.
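A quick way to see why Avro rejects the tracker file: a valid Avro container starts with the magic bytes 'O' 'b' 'j' followed by the version byte 0x01, and an empty file has none of them. A small check, assuming the default tracker location:

```shell
# Dump the first 4 bytes of the tracker file; a healthy Avro container
# shows: O b j 001. An empty (truncated) file prints no data bytes.
head -c 4 .flumespool/.flumespool-main.meta | od -c
```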
The solution was thus to delete the empty file under .flumespool, and the issue resolved itself (after freeing up some disk space, of course).
cd /.../flume_spool/data...
find . -type f -empty
# output:
.flumespool/.flumespool-main.meta
rm .flumespool/.flumespool-main.meta
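The whole diagnosis can be reproduced in a scratch directory (paths here are illustrative, not the real spool directory):

```shell
#!/bin/sh
set -e
# Simulate the broken state: a spool dir whose tracker file was left
# empty by the disk-full incident.
SPOOL=$(mktemp -d)
mkdir "$SPOOL/.flumespool"
: > "$SPOOL/.flumespool/.flumespool-main.meta"

cd "$SPOOL"
find . -type f -empty     # reports ./.flumespool/.flumespool-main.meta

# The fix: remove the empty tracker file so Flume can recreate it.
rm .flumespool/.flumespool-main.meta
find . -type f -empty     # now prints nothing
```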