Tags: hadoop, hdfs, flume, flume-ng

Flume + HDFS-200 append


The page https://cwiki.apache.org/confluence/display/FLUME/Getting+Started says that the HDFS sink supports appending, but I haven't been able to find any information on how to enable it; all the examples I've seen use rolling files. So I would appreciate any information on how to make Flume append to an existing file, if that's possible at all.

Update

One can set all rolling properties to 0, which makes Flume write into a single file, but it doesn't close the file, and new records are not visible to other processes. There is a topic similar to mine, Flume NG and HDFS, where Dmitry says that Flume doesn't support appending, but that answer is a year old and the documentation says the opposite, so I thought maybe Flume has been improved or I'm misunderstanding something. Any clues would be appreciated.
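
For reference, this is the kind of configuration I mean (the agent and sink names here are placeholders); with all three roll triggers set to 0, the HDFS sink keeps writing to a single open file:

```properties
# Sketch of an HDFS sink with rolling disabled (agent/sink names are made up)
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = /flume/events
agent.sinks.hdfsSink.hdfs.rollInterval = 0   # never roll based on elapsed time
agent.sinks.hdfsSink.hdfs.rollSize = 0       # never roll based on file size
agent.sinks.hdfsSink.hdfs.rollCount = 0      # never roll based on event count
```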

Update

I realized that the question is unclear, so let me describe what I'm trying to achieve: I want logs to be written to one file, and I want to be able to read them as soon as they're ingested into HDFS. At the moment I'm using Cloudera Impala to perform search queries, and it doesn't see new events even though Flume is configured to flush them to disk immediately, or at least that's what I believe. My investigation shows that people usually use HBase for these purposes, but as far as I understand it's not effective for ad hoc search queries unless one uses external indexing such as Solr. The problem is that I need a solution ASAP, so I was hoping it could be done more easily. For example, Fluentd can append to an existing file, but it works only with plain-text files and I would prefer some binary format.
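
To be concrete about "flush immediately" and "binary format", these are the sink settings I have in mind (sink name is a placeholder); `hdfs.batchSize` controls how many events are written before a flush, and `hdfs.fileType` selects the container format:

```properties
# Sketch: flush after every event and use a binary container (names are made up)
agent.sinks.hdfsSink.hdfs.batchSize = 1           # flush to HDFS after each event
agent.sinks.hdfsSink.hdfs.fileType = SequenceFile # binary format instead of DataStream (plain text)
```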


Solution

  • I haven't been able to find a way to make Flume do what I wanted, so eventually I decided to use Cloudera Search for log streaming, specifically Solr for both ingestion and retrieval. It seems that Flume doesn't have the capability for real-time ingestion into HDFS without creating lots of relatively small files; I hope they fix that in the future.
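
    For anyone going a similar route, a rough sketch of the Flume side of that setup might look like the following, using the morphline Solr sink that Cloudera Search integrates with (agent, channel, and file paths here are placeholders, not a tested configuration):

    ```properties
    # Sketch: send Flume events to Solr via the morphline sink (names/paths are made up)
    agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
    agent.sinks.solrSink.channel = memoryChannel
    # morphline file defines how raw events are parsed into Solr documents
    agent.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
    ```

    With this approach, events become searchable in Solr shortly after ingestion, which sidesteps the HDFS small-files problem entirely.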