agent.sinks=hpd
agent.sinks.hpd.type=hdfs
agent.sinks.hpd.channel=memoryChannel
agent.sinks.hpd.hdfs.path=hdfs://master:9000/user/hduser/gde
agent.sinks.hpd.hdfs.fileType=DataStream
agent.sinks.hpd.hdfs.writeFormat=Text
agent.sinks.hpd.hdfs.rollSize=0
agent.sinks.hpd.hdfs.batchSize=1000
agent.sinks.hpd.hdfs.fileSuffix=.i
agent.sinks.hpd.hdfs.rollCount=1000
agent.sinks.hpd.hdfs.rollInterval=0
I'm trying to use the HDFS sink to write events to HDFS. I have tried size-, count- and time-based rolling, but none of them works as expected: the sink generates too many small files in HDFS, for example:
-rw-r--r-- 2 hduser supergroup 11617 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832879.i
-rw-r--r-- 2 hduser supergroup 1381 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832880.i
-rw-r--r-- 2 hduser supergroup 553 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832881.i
-rw-r--r-- 2 hduser supergroup 2212 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832882.i
-rw-r--r-- 2 hduser supergroup 1379 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832883.i
-rw-r--r-- 2 hduser supergroup 2762 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832884.i.tmp
Please help me resolve this problem. I'm using Flume 1.6.0.
~Thanks
My configuration was correct. The cause of this behavior was HDFS: I had two DataNodes, one of which was down, so the files were not reaching the minimum required replication. The Flume logs also show the following warning:
"Block Under-replication detected. Rotating file."
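To resolve this problem, you can choose either of the following solutions:
1. Bring the failed DataNode back up so that blocks reach the required replication.
2. Set the HDFS sink property hdfs.minBlockReplicas accordingly.
~Thanks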
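As a minimal sketch of the second option, applied to the hpd sink from the question: setting hdfs.minBlockReplicas to 1 tells the sink that a single replica per block is acceptable, so it should stop rotating files because of under-replication and fall back to the configured roll settings (here rollCount=1000). The value 1 is an assumption for this two-node cluster with one node down; pick a value that matches your own cluster.

```properties
# Only the new line is shown; keep the rest of the hpd sink configuration unchanged.
# Assumption: one live replica per block is acceptable while the second DataNode is down.
agent.sinks.hpd.hdfs.minBlockReplicas=1
```

This only suppresses the early rotation. The blocks remain under-replicated until the second DataNode is available again, so restoring it is still the better long-term fix.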