I have a Flume agent writing tweets to an HBase sink.
After a few seconds, transactions to the sink start failing, and every 8-10 seconds the Flume agent log shows the error below.
The strange thing is that some tweets still get through and end up in the HBase table. What could be causing this? This is running on a single-node Cloudera Quickstart VM; could it be a resource problem?
This is the agent log:
9:20:44.618 PM ERROR org.apache.flume.SinkRunner
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Could not write events to Hbase. Transaction failed, and rolled back.
at org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:245)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
9:20:53.883 PM ERROR org.apache.flume.SinkRunner
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Could not write events to Hbase. Transaction failed, and rolled back.
at org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:245)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
These are some strange entries in the debug log. Maybe they are related?
2014-03-06 09:39:12,069 DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient: Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
2014-03-06 09:39:12,298 DEBUG org.apache.zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x144965080900029 : Unable to read additional data from server sessionid 0x144965080900029, likely server has closed socket
This is my agent configuration:
TwitterAgent.sinks.HBaseTweet.channel = MemChannel
TwitterAgent.sinks.HBaseTweet.type = org.apache.flume.sink.hbase.AsyncHBaseSink
TwitterAgent.sinks.HBaseTweet.table = tweets
TwitterAgent.sinks.HBaseTweet.columnFamily = tweet
TwitterAgent.sinks.HBaseTweet.batchSize = 100
TwitterAgent.sinks.HBaseTweet.serializer = flume_hdfs.hbase.util.AsyncHbaseTwitterEventSerializer
TwitterAgent.sinks.HBaseTweet.serializer.columns = tweet:id,tweet:created_at,tweet:source,tweet:favourited,tweet:text
TwitterAgent.sinks.HBaseTweet.serializer.delimiter = \\t
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 200
TwitterAgent.channels.MemChannel.transactionCapacity = 100
Some metrics from the log when stopping the agent, which might be interesting:
Component type: CHANNEL, name: MemChannel stopped
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.start.time == 1394093630078
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.stop.time == 1394093894804
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.capacity == 200
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.current.size == 125
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.put.attempt == 220
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.put.success == 209
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.take.attempt == 3059
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.take.success == 9
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Could not write events to Hbase. Transaction failed, and rolled back.
at org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:245)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
Component type: SINK, name: HBaseTweet stopped
Shutdown Metric for type: SINK, name: HBaseTweet. sink.start.time == 1394093630407
Shutdown Metric for type: SINK, name: HBaseTweet. sink.stop.time == 1394093894833
Shutdown Metric for type: SINK, name: HBaseTweet. sink.batch.complete == 27
Shutdown Metric for type: SINK, name: HBaseTweet. sink.batch.empty == 0
Shutdown Metric for type: SINK, name: HBaseTweet. sink.batch.underflow == 7
Shutdown Metric for type: SINK, name: HBaseTweet. sink.connection.closed.count == 1
Shutdown Metric for type: SINK, name: HBaseTweet. sink.connection.creation.count == 1
Shutdown Metric for type: SINK, name: HBaseTweet. sink.connection.failed.count == 0
Shutdown Metric for type: SINK, name: HBaseTweet. sink.event.drain.attempt == 3053
Shutdown Metric for type: SINK, name: HBaseTweet. sink.event.drain.sucess == 9
HBase RegionServer error:
2014-03-08 09:37:44,371 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family retweet does not exist in region tweets,,1394029330397.953f602dd0790637df8106720396f219. in table 'tweets', {NAME => 'entities', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'retweeted_status', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'tweet', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:5475)
at org.apache.hadoop.hbase.regionserver.HRegion.checkFamilies(HRegion.java:3022)
at org.apache.hadoop.hbase.regionserver.HRegion.internalPut(HRegion.java:2900)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:2083)
at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:2239)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:323)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
The error message from the HBase log suggests a schema mismatch: the agent expects a column family named retweet, whereas the schema actually defines retweeted_status.
The solution is either to recompile the agent to use the correct column family name, or to change the schema to use the name the agent expects. I don't know which fix is more correct. If you defined this schema yourself, you can most likely just change the column family name. But if the schema was defined externally (e.g. by a script or by following instructions from somewhere), renaming a column family may break something else that depends on the name being retweeted_status. In that case, the source code of Twitter_HBase_Impala should be fixed to use the correct name.
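
To find this kind of mismatch before starting the agent, one option is to compare the column families the serializer writes to against the families the table actually has. The following is only a minimal sketch in plain Java with no HBase dependency. The class name ColumnFamilyCheck, the method missingFamilies, and the sample column list containing retweet:id are all hypothetical; the table families are copied from the NoSuchColumnFamilyException in the question.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class ColumnFamilyCheck {

    /**
     * Returns the column families referenced in a comma-separated
     * "family:qualifier" list that are not present in tableFamilies.
     */
    static Set<String> missingFamilies(String columns, Set<String> tableFamilies) {
        Set<String> missing = new TreeSet<>();
        for (String column : columns.split(",")) {
            String trimmed = column.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Everything before the first ':' is the family; a bare name is treated as a family.
            int sep = trimmed.indexOf(':');
            String family = sep < 0 ? trimmed : trimmed.substring(0, sep);
            if (!tableFamilies.contains(family)) {
                missing.add(family);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Families reported for table 'tweets' in the RegionServer exception.
        Set<String> tableFamilies = new LinkedHashSet<>(
                Arrays.asList("entities", "retweeted_status", "tweet", "user"));

        // Hypothetical column list: "retweet:id" stands in for whatever the
        // custom serializer really writes; it is not taken from the question.
        String columns = "tweet:id,tweet:created_at,tweet:text,retweet:id";

        System.out.println("Missing families: " + missingFamilies(columns, tableFamilies));
    }
}
```

With the sample input this prints "Missing families: [retweet]". Note that the serializer.columns setting in the question only lists tweet: columns, so the retweet family is presumably hard-coded in the serializer class; in that case the list passed to the check would have to be collected from the serializer source rather than from the agent configuration.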