
How to deserialize the ProtoBuf serialized HBase columns in Hive?


I have used Protocol Buffers (protobuf) to serialize a class and store it in HBase columns. I want to reduce the number of MapReduce jobs for simple aggregations, so I need an SQL-like tool to query the data. If I use Hive, is it possible to extend the HBaseStorageHandler and write our own SerDe for each table? Or is there another good solution available?

Updated:

I created the HBase table as

create 'hive:users', 'i'

and inserted user data from the Java API:

public static final byte[] INFO_FAMILY = Bytes.toBytes("i");
private static final byte[] USER_COL = Bytes.toBytes(0);

public Put mkPut(User u) {
    Put p = new Put(Bytes.toBytes(u.userid));
    p.addColumn(INFO_FAMILY, USER_COL, UserConverter.fromDomainToProto(u).toByteArray());
    return p;
}

My scan gave the following results:

hbase(main):016:0> scan 'hive:users'
ROW                                COLUMN+CELL
 kim123                            column=i:\x00, timestamp=1521409843085, value=\x0A\x06kim123\x12\x06kimkim\x1A\[email protected]
1 row(s) in 0.0340 seconds
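For reference, the value shown by the scan is raw protobuf wire format: `\x0A` is the tag for field 1 with wire type 2 (length-delimited), `\x06` is the length, followed by the six bytes of `kim123`, then `\x12` begins field 2, and so on. A minimal, dependency-free Java sketch that decodes such length-delimited string fields (assuming, as the scan output suggests, that the message consists of short string fields):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class WireDecode {
    // Decodes consecutive length-delimited (wire type 2) fields from a
    // protobuf-encoded byte array. Assumes each field is a short string
    // (length < 128, so the varint length fits in one byte), which matches
    // the bytes shown in the HBase scan output above.
    static List<String> decodeStringFields(byte[] data) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int tag = data[i++] & 0xFF;    // (field number << 3) | wire type
            if ((tag & 0x07) != 2) break;  // only handle length-delimited fields
            int len = data[i++] & 0xFF;    // single-byte varint length
            fields.add(new String(data, i, len, StandardCharsets.UTF_8));
            i += len;
        }
        return fields;
    }

    public static void main(String[] args) {
        // \x0A\x06kim123\x12\x06kimkim — the prefix of the scanned value
        byte[] value = {0x0A, 0x06, 'k', 'i', 'm', '1', '2', '3',
                        0x12, 0x06, 'k', 'i', 'm', 'k', 'i', 'm'};
        System.out.println(decodeStringFields(value)); // [kim123, kimkim]
    }
}
```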

When I query the table in Hive, I don't see any records. Here is the command I used to create the table:

create external table users(userid binary, userobj binary) 
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
with serdeproperties("hbase.columns.mapping" = ":key, i:0", "hbase.table.default.storage.type" = "binary") 
tblproperties("hbase.table.name" = "hive:users");

When I query the Hive table, I don't see the record inserted from HBase.

Can you please tell me what is wrong here?


Solution

  • You could try writing a UDF that takes the binary protobuf and converts it to some readable structure (comma-separated values or JSON). You would have to make sure to map the values as binary data.
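A minimal sketch of the core of such a UDF is below. The class and field names are hypothetical; a real Hive UDF would extend `org.apache.hadoop.hive.ql.exec.UDF`, receive the cell as a `BytesWritable`, and ideally just call the generated `User.parseFrom(bytes)` plus a JSON-printing step. The Hive and protobuf-java dependencies are omitted here so the sketch stays self-contained, hand-parsing the string fields visible in the scan output instead:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the conversion logic inside a Hive UDF that turns a
// protobuf-serialized HBase cell into a readable JSON string. In a real
// UDF this would be the body of evaluate() in a class extending
// org.apache.hadoop.hive.ql.exec.UDF; with the generated protobuf
// classes on the classpath it would simply be User.parseFrom(bytes)
// followed by a toString/JSON step.
public class ProtobufToJsonUDF {
    public String evaluate(byte[] bytes) {
        if (bytes == null) return null;
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        int i = 0;
        while (i < bytes.length) {
            int tag = bytes[i++] & 0xFF;   // (field number << 3) | wire type
            int fieldNo = tag >>> 3;
            if ((tag & 0x07) != 2) break;  // only string/bytes fields handled
            int len = bytes[i++] & 0xFF;   // single-byte varint length
            String v = new String(bytes, i, len, StandardCharsets.UTF_8);
            i += len;
            if (!first) json.append(",");
            first = false;
            json.append("\"field").append(fieldNo).append("\":\"").append(v).append("\"");
        }
        return json.append("}").toString();
    }

    public static void main(String[] args) {
        // The prefix of the value shown in the HBase scan above
        byte[] cell = {0x0A, 0x06, 'k', 'i', 'm', '1', '2', '3',
                       0x12, 0x06, 'k', 'i', 'm', 'k', 'i', 'm'};
        System.out.println(new ProtobufToJsonUDF().evaluate(cell));
        // {"field1":"kim123","field2":"kimkim"}
    }
}
```

Once registered in Hive (e.g. with `create temporary function proto_to_json as ...`), it could be called as `select proto_to_json(userobj) from users;`, keeping the column mapped as binary as noted above.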