
Can I customize the row key pattern when using HBaseSink in Flume NG?


I'm trying to collect logs from a text file into HBase using Flume NG. Each line in the log file is JSON text, and I'm inserting the lines as values in an HBase table. The problem is that I have no idea how to customize the row key pattern. For example, when I count the rows in a table:

hbase(main):001:0> count 'flume-ng-test', 100000
Current count: 100000, row: default32e473e0-4f54-48b5-8081-c3f845b38456         
Current count: 200000, row: default65b0cc3d-5421-4bb1-87e2-b21c2841fcd6         
Current count: 300000, row: default98be85e3-bb9f-402e-8f36-0db74cb8ab44         
Current count: 400000, row: defaultcbf888dc-e2bb-492f-ab17-63f5e0327344         
Current count: 500000, row: defaultfedc40e5-04b4-49a4-8734-655f43956d6e         
502224 row(s) in 8.1540 seconds

The row keys are default + UUID_like_string. If I want to change the row key pattern to use the current timestamp (ascending or descending), what should I do?

Thanks for any comments.


Solution

  • In Flume NG's HBase sink, the HbaseEventSerializer implementation is responsible for generating row keys. The default implementation, org.apache.flume.sink.hbase.SimpleHbaseEventSerializer, already supports timestamp row keys in the format prefix + current timestamp. To use it, modify your Flume configuration accordingly:

    hbase-agent.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
    hbase-agent.sinks.sink1.channel = ch1
    hbase-agent.sinks.sink1.table = demo
    hbase-agent.sinks.sink1.columnFamily = cf
    hbase-agent.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
    hbase-agent.sinks.sink1.serializer.payloadColumn = col1
    hbase-agent.sinks.sink1.serializer.suffix = timestamp
    

    If the built-in timestamp-based key generation is not what you are after, you'll need to provide Flume with a custom HbaseEventSerializer implementation, which requires you to:

    1. Create your own row key generator class (the default one is org.apache.flume.sink.hbase.SimpleRowKeyGenerator).
    2. Create your own implementation of the HbaseEventSerializer interface (the default implementation is org.apache.flume.sink.hbase.SimpleHbaseEventSerializer) that uses the custom row key generator you created in the first step.
    3. Modify your Flume HBase sink configuration to use the custom HbaseEventSerializer implementation.
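
As a rough illustration of step 1, here is a minimal sketch of a row key generator that can produce ascending or descending timestamp keys. The class name TimestampRowKeyGenerator and its methods are hypothetical, not part of Flume. The sketch assumes non-negative millisecond timestamps and zero-pads them to a fixed width so that HBase's byte-wise ordering of row keys matches numeric ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical row key generator (not a Flume class). A custom
 * HbaseEventSerializer could call it when building its row keys.
 */
final class TimestampRowKeyGenerator {

    private TimestampRowKeyGenerator() {
        // static helpers only
    }

    /** prefix + zero-padded millis: older events sort first. */
    static byte[] ascendingKey(String prefix, long timestampMillis) {
        return format(prefix, timestampMillis);
    }

    /** prefix + zero-padded (Long.MAX_VALUE - millis): newer events sort first. */
    static byte[] descendingKey(String prefix, long timestampMillis) {
        return format(prefix, Long.MAX_VALUE - timestampMillis);
    }

    private static byte[] format(String prefix, long value) {
        // Long.MAX_VALUE has 19 decimal digits, so a fixed width of 19
        // keeps lexicographic order identical to numeric order.
        String padded = String.format(Locale.ROOT, "%019d", value);
        return (prefix + padded).getBytes(StandardCharsets.UTF_8);
    }
}
```

A serializer would typically pass System.currentTimeMillis() as the timestamp. Note that two events arriving in the same millisecond would get the same key and overwrite each other in the same column, so you may want to append a counter or a short random suffix if that matters for your data.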