
Flume - Tiering data flows using the Avro Source and Sink


I'm attempting to set up a simple tiered data flow using an Avro source/sink between two agents on different machines.

The first agent on the vm-host-01 node (called "agent") has a netcat source, a memory channel, and an avro sink.

The second agent on the vm-host-02 node (called "collector") has an avro source, a memory channel, and an hdfs sink.

Here is the config for the first agent "agent".

agent.sources=s1
agent.channels=c1
agent.sinks=k1

agent.sources.s1.type=netcat
agent.sources.s1.channels=c1
agent.sources.s1.bind=vm-host-01
agent.sources.s1.port=12345

agent.channels.c1.type=memory

agent.sinks.k1.type=avro
agent.sinks.k1.channel=c1
agent.sinks.k1.hostname=vm-host-02
agent.sinks.k1.port=42424

Here is the config for the second agent "collector" on the second machine:

collector.sources=av1
collector.channels=c1
collector.sinks=k1

collector.sources.av1.type=avro
collector.sources.av1.bind=vm-host-02
collector.sources.av1.port=42424
collector.sources.av1.channels=c1

collector.channels.c1.type=memory

collecor.sinks.k1.type=hdfs
collecor.sinks.k1.hdfs.path=/user/root/flume/mydata
collecor.sinks.k1.hdfs.fileType=DataStream
collecor.sinks.k1.hdfs.writeType=text
collecor.sinks.k1.hdfs.filePrefix=Hello
collecor.sinks.k1.hdfs.fileSuffix=.txt
collecor.sinks.k1.channel=c1

Now when I telnet into the first host (vm-host-01) and enter some strings, the command prompt for the first agent doesn't even change. (Neither does the command prompt for the second host).

If I edit the config for "agent" and change its sink to hdfs, I can telnet in, enter a string, see the command prompt acknowledge it, and the string is written to HDFS.

Just adding the avro sink seems to disable its netcat source from accepting input.


Solution

  • Oops, I misspelled "collector" as "collecor" in the sink properties of the second agent's config.
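
For reference, here is a sketch of what the sink section of the second agent's config would look like with the prefix corrected. The property names and values are copied unchanged from the question; only the agent-name prefix differs, so this assumes nothing else in the original config needs to change.

```properties
# Sink properties for the agent named "collector".
# Every key must start with the agent name, i.e. "collector."
collector.sinks.k1.type=hdfs
collector.sinks.k1.hdfs.path=/user/root/flume/mydata
collector.sinks.k1.hdfs.fileType=DataStream
collector.sinks.k1.hdfs.writeType=text
collector.sinks.k1.hdfs.filePrefix=Hello
collector.sinks.k1.hdfs.fileSuffix=.txt
collector.sinks.k1.channel=c1
```

The prefix has to match the agent name declared in the first lines of the file (collector.sources, collector.channels, collector.sinks), otherwise these properties do not belong to that agent.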