
HBase "two columns in one" feature


The following book implies that there is a way to put two columns into one without using column families. Is this an actual HBase feature, or is it just a developer hack along the lines of "concatenate the two values into one column before sending them to HBase, and remember that the column really holds two values"? If it is a feature, what is the syntax for it?

From "Hadoop Application Architectures" by Mark Grover, Ted Malaska, Jonathan Seidman, and Gwen Shapira (O’Reilly):

[When setting two columns foo and bar to a record,] each logical record in the HBase table will have two rows in the HBase HFile format. Here is the structure of such an HFile on disk:

|RowKey |TimeStamp  |Column |Value
|101    |1395531114 |F      |A1
|101    |1395531114 |B      |B1

The alternative choice is to have both the values from foo and bar in the same HBase column. This would apply to all records of the table and bears the following characteristics:

  • Both the columns would be retrieved at the same time. You may choose to disregard the value of the other column if you don’t need it.
  • Both the column values would need to be updated together since they are stored as a single entity (column).
  • Both the columns would age out together based on the last update.

Here is the structure of the HFile in such a case:

|RowKey |TimeStamp  |Column |Value
|101    |1395531114 |X      |A1|B1

I think this is different from storing multiple values in one column and reading them back as "versions" of the value, as described in "HBase storing data for a particular column with 2 or more values for the same row-key in Scala/Java API", because here the authors talk about foo and bar being two different columns with two different roles. I found no mention of such a feature in the HBase documentation: https://hbase.apache.org/book.html#schema.


Solution

  • I think you can do this by packing both values into the byte array that HBase stores as the cell value. After reading the value back, you split it and use the parts. I don't think there is any other way to store multiple values in a single column.
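
The packing and splitting described above could look roughly like the following. This is only a sketch under stated assumptions: the class name `PackedCell`, the helpers `pack` and `unpack`, and the length-prefixed layout are all made up for illustration and are not part of HBase or of the book. HBase only ever sees an opaque `byte[]`; the client calls are mentioned in comments only, so the snippet runs without an HBase dependency.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes two logical values (foo, bar) into one cell value.
final class PackedCell {

    // Layout (an assumption of this sketch): [4-byte length of foo][foo bytes][bar bytes].
    // A length prefix is used instead of a '|' delimiter so that values may contain any byte.
    static byte[] pack(String foo, String bar) {
        byte[] f = foo.getBytes(StandardCharsets.UTF_8);
        byte[] b = bar.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + f.length + b.length)
                .putInt(f.length)
                .put(f)
                .put(b)
                .array();
    }

    // Reverses pack(): returns {foo, bar}.
    static String[] unpack(byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        byte[] f = new byte[buf.getInt()];
        buf.get(f);
        byte[] b = new byte[buf.remaining()];
        buf.get(b);
        return new String[] {
            new String(f, StandardCharsets.UTF_8),
            new String(b, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        // With the HBase client, this array would be passed as the value argument of
        // Put.addColumn(family, qualifier, value) for the single column "X".
        byte[] cell = pack("A1", "B1");

        // On read, the array would come from Result.getValue(family, qualifier).
        String[] parts = unpack(cell);
        System.out.println("foo=" + parts[0] + ", bar=" + parts[1]);
    }
}
```

Because the two values live in one cell, they are always written, read, and expired together, which matches the characteristics listed in the book excerpt.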