hadoop, flume, flume-ng

Where to run the flume agent that writes to HDFS?


I have 20-25 agents sending data to a couple of collector agents, and these collector agents then have to write it to HDFS.

Where should these collector agents run? On the DataNodes of the Hadoop cluster, or outside the cluster? What are the pros and cons of each, and how are people currently running them?


Solution

  • The tier 2 Flume agents (the collectors) use the HDFS sink to write directly to HDFS. In addition, the tier 1 agents can use a failover sink group, so that if one of the tier 2 agents goes down, events are sent to another collector instead.
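
As a rough illustration of the two-tier layout described above, the configuration might look something like the sketch below. The agent names (agent1, collector), component names, host names, port, and HDFS path are all hypothetical placeholders, and the tier 1 source is left out because the question does not say where the data comes from.

```properties
# ---- Tier 1 agent (hypothetical name: agent1) ----
# The source definition is omitted; it depends on where the data originates.
agent1.channels = ch1
agent1.sinks = k1 k2

agent1.channels.ch1.type = memory

# Failover sink group: k1 is preferred, k2 takes over if k1 fails.
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 5
agent1.sinkgroups.g1.processor.maxpenalty = 10000

# Avro sinks pointing at the two tier 2 collectors (placeholder hosts).
agent1.sinks.k1.type = avro
agent1.sinks.k1.channel = ch1
agent1.sinks.k1.hostname = collector1.example.com
agent1.sinks.k1.port = 4545

agent1.sinks.k2.type = avro
agent1.sinks.k2.channel = ch1
agent1.sinks.k2.hostname = collector2.example.com
agent1.sinks.k2.port = 4545

# ---- Tier 2 agent (hypothetical name: collector) ----
collector.sources = avroIn
collector.channels = ch1
collector.sinks = hdfsOut

# Avro source receiving events from the tier 1 agents.
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.channels = ch1

collector.channels.ch1.type = file

# HDFS sink writing directly to HDFS (placeholder NameNode and path).
collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.channel = ch1
collector.sinks.hdfsOut.hdfs.path = hdfs://namenode.example.com:8020/flume/events/%Y-%m-%d
collector.sinks.hdfsOut.hdfs.fileType = DataStream
collector.sinks.hdfsOut.hdfs.useLocalTimeStamp = true
```

With a failover processor, the sink with the highest priority receives all events while it is healthy; the lower-priority sink is used only after the preferred one fails. Each agent would be started with its own name, for example `flume-ng agent --name collector --conf-file <file>`.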