I can store Flume data (from Kafka) in HDFS correctly, but I have had no luck getting it stored in HBase. The platform is Cloudera 5.10.1.
My Flume configuration is:
tier1.sources = source1
tier1.channels = channel1
#tier1.sinks = hdfs1
tier1.sinks = hbase1
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect = master3d.localdomain:2181
tier1.sources.source1.topics.regex = application.data.*
tier1.sources.source1.channels = channel1
tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = timestamp
tier1.sources.source1.kafka.consumer.timeout.ms = 100
tier1.sources.source1.kafka.consumer.group.id = flume
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000
tier1.sinks.hbase1.type = hbase
tier1.sinks.hbase1.table = application_data
tier1.sinks.hbase1.columnFamily = json
tier1.sinks.hbase1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
tier1.sinks.hbase1.channel = channel1
#tier1.sinks.hdfs1.type = hdfs
#tier1.sinks.hdfs1.hdfs.path = /tmp/kafka/%{topic}/%y-%m-%d
#tier1.sinks.hdfs1.hdfs.rollInterval = 5
#tier1.sinks.hdfs1.hdfs.rollSize = 0
#tier1.sinks.hdfs1.hdfs.rollCount = 0
#tier1.sinks.hdfs1.hdfs.fileType = DataStream
#tier1.sinks.hdfs1.channel = channel1
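Because the source and channel are identical in the working HDFS setup, one quick sanity check is that the active HBase sink is bound to a declared channel and has a type. The following is a minimal Python sketch under stated assumptions: `parse_flume_conf` and `check_wiring` are hypothetical helpers (not part of Flume), and the parser only understands the simple `key = value` lines and `#` comments used in the configuration above, not every Java `.properties` feature.

```python
# Hypothetical helpers (not part of Flume) for sanity-checking agent wiring.
# Assumption: only simple "key = value" lines and "#" comments are handled,
# which is all the configuration above uses.
from __future__ import annotations


def parse_flume_conf(text: str, agent: str) -> dict[str, str]:
    """Collect one agent's properties with the "<agent>." prefix stripped."""
    props: dict[str, str] = {}
    prefix = agent + "."
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out lines (e.g. the hdfs1 sink) are inactive
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line
        key = key.strip()
        if key.startswith(prefix):
            props[key[len(prefix):]] = value.strip()
    return props


def check_wiring(props: dict[str, str]) -> list[str]:
    """Report sources/sinks bound to missing or undeclared channels."""
    problems: list[str] = []
    channels = set(props.get("channels", "").split())
    # Sources use the plural "channels" property (space-separated list).
    for src in props.get("sources", "").split():
        bound = props.get(f"sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} has no channels")
        for ch in bound:
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    # Sinks use the singular "channel" property and also need a "type".
    for sink in props.get("sinks", "").split():
        ch = props.get(f"sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink} has no channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
        if f"sinks.{sink}.type" not in props:
            problems.append(f"sink {sink} has no type")
    return problems
```

Running `check_wiring(parse_flume_conf(text, "tier1"))` on the configuration above should return an empty list; a typo such as `tier1.sinks.hbase1.channel = chanel1` would instead be reported as an undeclared channel.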
I created the HBase table as follows:
hbase(main):005:0> create 'application_data', 'json'
0 row(s) in 1.2250 seconds
But a scan of this table always returns:
hbase(main):021:0> scan 'application_data'
ROW COLUMN+CELL
0 row(s) in 0.0100 seconds
I have set both Flume and the HBase Master to DEBUG logging, but I see no errors or warnings. I can see the Flume user get a connection to HBase and check that the table exists. There is no Kerberos authentication on HBase. The Kafka topics do have data: I have just double-checked with the console consumer and with the HDFS sink.
Can anybody see an error here or point me in the right direction? I don't think I'm doing anything unusual.
Thank you!
Since there are no errors in your Flume log, make sure that all sinks, sources, and channels are actually initialized. Those startup messages are easy to miss in the log, and when a component is not initialized, no exception or error is reported.
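Rather than scrolling through a DEBUG log by hand, this check can be scripted. The following is a minimal Python sketch, assuming each started component logs an INFO line of the form `Component type: SINK, name: hbase1 started` (the message Flume's `MonitoredCounterGroup` emits in the versions I know of; confirm the exact wording in your own log). `started_components` and `missing_components` are hypothetical helpers, not part of Flume.

```python
# Hypothetical helpers (not part of Flume) for spotting components that never started.
from __future__ import annotations

import re

# Assumption: a started source/channel/sink produces a line such as
#   Component type: SINK, name: hbase1 started
# The ", name" right after the type keeps SINK_PROCESSOR and
# CHANNEL_PROCESSOR lines from matching.
STARTED = re.compile(r"Component type: (SOURCE|CHANNEL|SINK), name: (\S+) started")


def started_components(log_text: str) -> set[tuple[str, str]]:
    """Return the (type, name) pairs that logged a 'started' message."""
    return {(m.group(1), m.group(2)) for m in STARTED.finditer(log_text)}


def missing_components(log_text: str, expected: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return, sorted, the expected (type, name) pairs that never reported starting."""
    return sorted(set(expected) - started_components(log_text))
```

For the configuration in the question, `expected` would be `[("SOURCE", "source1"), ("CHANNEL", "channel1"), ("SINK", "hbase1")]`. If `missing_components(open(log_path).read(), expected)` (with `log_path` pointing at your agent's log) returns `[("SINK", "hbase1")]`, the HBase sink never started, and the lines just before that point in the log are the place to look.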