I'm trying to test log aggregation with Flume.
I installed CDH3u3 (name node, secondary name node, job tracker, data node, task tracker) and Flume (flume, flume-master, and flume-node) on an Ubuntu machine, host1. For the Flume installation I followed https://ccp.cloudera.com/display/CDHDOC/Flume+Installation .
I want to run the Flume master, a collector node, and an agent node on the same machine. When I use the plain flume command, I can run all three services successfully (reference: http://ankitasblogger.blogspot.com/2011/05/installing-flume-in-cluster-complete.html ):
$ flume master
$ flume node -n flume-collector
$ flume node -n flume-agent
However, I can't run two nodes using flume-node:
$ sudo /etc/init.d/flume-master start
$ sudo /etc/init.d/flume-node start
I can't pass a node name to the flume-node command, so it just creates a single node named host1.host.com.
Should I use flume instead of flume-master and flume-node if I want multiple nodes on the same machine? I think flume-master and flume-node are more convenient because they tell you the log path, while flume logs to stdout.
You don't have to run two node processes on the same machine. You can configure logical nodes on flume-master: the collector and the agent can run as different logical nodes, and those nodes just use different ports.
The difference between a collector and an agent is what they do, not where they run; both use flume-node.
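As a rough sketch of what that configuration might look like, the script below only writes a file of Flume shell commands; it does not talk to the master. The `exec map` and `exec config` lines follow the syntax I remember from the Flume 0.9.x user guide, so check them against your installed version. The output file name, the tailed log path, the `console` sink, and the port 35853 are assumptions for illustration, not values from the question.

```shell
#!/bin/sh
# Sketch only: generates Flume shell commands that map two logical nodes
# onto the single physical node started by /etc/init.d/flume-node.
set -eu

PHYSICAL_NODE="host1.host.com"     # physical node name reported in the question
COLLECTOR_PORT=35853               # assumed collector port; adjust to your setup
SCRIPT_FILE="logical-nodes.flume"  # hypothetical output file name

# Two mappings (logical -> physical), then a source and sink for each logical node.
# The tail path and the console sink are placeholders for illustration.
cat > "$SCRIPT_FILE" <<EOF
exec map ${PHYSICAL_NODE} flume-collector
exec map ${PHYSICAL_NODE} flume-agent
exec config flume-collector 'collectorSource(${COLLECTOR_PORT})' 'console'
exec config flume-agent 'tail("/var/log/syslog")' 'agentSink("localhost",${COLLECTOR_PORT})'
EOF

# Print the generated commands for review.
cat "$SCRIPT_FILE"
```

If the generated commands look right, you can enter them in `flume shell` after connecting to the master (for example with `connect localhost`), and then check on the master's status page that both logical nodes appear under host1.host.com.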