Search code examples
hadoopapache-kafkaflumeflume-ng

Why is my Flume agent not starting?


I'm trying to set up a basic Kafka-Flume-HDFS pipeline. Kafka is up and running but when I start the flume agent via

bin/flume-ng agent -n flume1 -c conf -f conf/flume-conf.properties -D flume.root.logger=INFO,console

it seems like the agent isn't coming up as the only console log I get is:

Info: Sourcing environment configuration script /opt/hadoop/flume/conf/flume-env.sh
Info: Including Hive libraries found via () for Hive access
+ exec /opt/jdk1.8.0_111/bin/java -Xmx20m -D -cp '/opt/hadoop/flume/conf:/opt/hadoop/flume/lib/*:/opt/hadoop/flume/lib/:/lib/*' -Djava.library.path= org.apache.flume.node.Application -n flume1 -f conf/flume-conf.properties flume.root.logger=INFO,console
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/flume/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/flume/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

The flume config file:

flume1.sources = kafka-source-1
flume1.channels = hdfs-channel-1
flume1.sinks = hdfs-sink-1
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.zookeeperConnect = localhost:2181
flume1.sources.kafka-source-1.topic = twitter_topic
flume1.sources.kafka-source-1.batchSize = 100
flume1.sources.kafka-source-1.channels = hdfs-channel-1

flume1.channels.hdfs-channel-1.type = memory
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = /tmp/kafka/twitter_topic/%y-%m-%d
flume1.sinks.hdfs-sink-1.hdfs.rollCount= 100
flume1.sinks.hdfs-sink-1.hdfs.rollSize= 0
flume1.channels.hdfs-channel-1.capacity = 10000
flume1.channels.hdfs-channel-1.transactionCapacity = 1000

Is this a configuration problem in flume-conf.properties or am I missing something important?

EDIT

After restarting everything it seems to work better than before, Flume is actually doing something now (it seems like the order is important when starting hdfs, zookeeper, kafka, flume and my streaming application). I now get an exception from flume

java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocalFileSystem.isFileClosed(org.apache.hadoop.fs.path)
...

Solution

  • Edit the hdfs.path value with the full HDFS URI,

    flume1.sinks.hdfs-sink-1.hdfs.path = hdfs://namenode_host:port/tmp/kafka/twitter_topic/%y-%m-%d
    

    For the logs: The logs are not being printed on the console, remove the whitespace between -D and flume.root.logger=INFO,console.

    Try,

    bin/flume-ng agent -n flume1 -c conf -f conf/flume-conf.properties -Dflume.root.logger=INFO,console
    

    Or access the logs from $FLUME_HOME/logs/ directory.