Tags: hadoop, flume, flume-ng

Flume - Would a source accept events even when the sink is non-operational?


New to Flume.

Let's say I have an agent with a single Avro source, a single HDFS sink, and a single file channel.

Suppose at some point the sink fails to write to HDFS. Will the source continue to accept events until the channel fills up?

Or would the source stop accepting events even though the file channel is not full?
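For reference, the topology in the question could be sketched as a minimal Flume agent configuration. All names, ports, and paths here are placeholder assumptions, not taken from the question:

```properties
# Hypothetical agent matching the question's topology:
# one Avro source -> one file channel -> one HDFS sink.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Avro source listening for incoming events
agent1.sources.src1.type = avro
agent1.sources.src1.bind = 0.0.0.0
agent1.sources.src1.port = 4141
agent1.sources.src1.channels = ch1

# Durable file channel; capacity bounds how many events
# can queue up while the sink is down
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
agent1.channels.ch1.dataDirs = /var/flume/data
agent1.channels.ch1.capacity = 1000000

# HDFS sink draining the channel
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.sink1.channel = ch1
```

Once the file channel reaches its configured `capacity`, channel puts fail and the source starts rejecting events, which is the situation the question is asking about.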


Solution

  • I have tested this scenario fairly extensively, and you will have a hard time with it. When the sink fails, Flume starts throwing exceptions, and depending on the velocity of your stream, the channel will fill up as well, causing more exceptions. The best way to control for failure is to use a failover sink processor and configure a sink group: if one sink fails, a backup sink takes over with very minimal data loss. In my experience, I have set up an Avro sink that forwards to a second Flume agent hop in my topology; if that agent goes down, my failover sinks write the Flume events via the HDFS sink to one of two different Hadoop clusters. You then have to backfill these events, and I have found the netcat source to be effective for that.
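The failover setup described above could look roughly like the following fragment. The sink names, priorities, and penalty value are illustrative assumptions layered on top of an agent that already defines `avroSink` and `hdfsSink`:

```properties
# Hypothetical failover sink group: avroSink is primary
# (higher priority wins); hdfsSink takes over on failure.
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = avroSink hdfsSink
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.avroSink = 10
agent1.sinkgroups.g1.processor.priority.hdfsSink = 5
# Max time (ms) a failed sink is penalized before retry
agent1.sinkgroups.g1.processor.maxpenalty = 10000
```

With this in place, a failure of the primary Avro sink routes events to the HDFS sink instead of letting the channel fill up, and the failed sink is retried after its penalty period expires.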