I'm trying to set up a basic Kafka-Flume-HDFS pipeline. Kafka is up and running but when I start the flume agent via
bin/flume-ng agent -n flume1 -c conf -f conf/flume-conf.properties -D flume.root.logger=INFO,console
it seems like the agent isn't coming up as the only console log I get is:
Info: Sourcing environment configuration script /opt/hadoop/flume/conf/flume-env.sh
Info: Including Hive libraries found via () for Hive access
+ exec /opt/jdk1.8.0_111/bin/java -Xmx20m -D -cp '/opt/hadoop/flume/conf:/opt/hadoop/flume/lib/*:/opt/hadoop/flume/lib/:/lib/*' -Djava.library.path= org.apache.flume.node.Application -n flume1 -f conf/flume-conf.properties flume.root.logger=INFO,console
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/flume/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/flume/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
The flume config file:
flume1.sources = kafka-source-1
flume1.channels = hdfs-channel-1
flume1.sinks = hdfs-sink-1
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.zookeeperConnect = localhost:2181
flume1.sources.kafka-source-1.topic = twitter_topic
flume1.sources.kafka-source-1.batchSize = 100
flume1.sources.kafka-source-1.channels = hdfs-channel-1
flume1.channels.hdfs-channel-1.type = memory
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = /tmp/kafka/twitter_topic/%y-%m-%d
flume1.sinks.hdfs-sink-1.hdfs.rollCount= 100
flume1.sinks.hdfs-sink-1.hdfs.rollSize= 0
flume1.channels.hdfs-channel-1.capacity = 10000
flume1.channels.hdfs-channel-1.transactionCapacity = 1000
Is this a configuration problem in flume-conf.properties
or am I missing something important?
EDIT
After restarting everything it seems to work better than before, Flume is actually doing something now (it seems like the order is important when starting hdfs, zookeeper, kafka, flume and my streaming application). I now get an exception from flume
java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocalFileSystem.isFileClosed(org.apache.hadoop.fs.path)
...
Edit the hdfs.path
value with the full HDFS URI,
flume1.sinks.hdfs-sink-1.hdfs.path = hdfs://namenode_host:port/tmp/kafka/twitter_topic/%y-%m-%d
For the logs:
The logs are not being printed on the console, remove the whitespace between -D
and flume.root.logger=INFO,console
.
Try,
bin/flume-ng agent -n flume1 -c conf -f conf/flume-conf.properties -Dflume.root.logger=INFO,console
Or access the logs from $FLUME_HOME/logs/
directory.