Tags: python, hadoop, hbase, thrift

mutateRow() in HBase through Thrift requires undocumented fourth argument


When I try to insert or update a row in HBase via Thrift (from Python, specifically), mutateRow() requires a fourth argument, "attributes". The Thrift definition says that this argument is a string->string map. None of the examples or online discussions mention this fourth argument, and even the Thrift examples shipped with the exact same version of HBase don't use it.

If you can, please just include a full example of creating a table, defining a column family, inserting a row, and dumping the data.


Solution

  • No problem. Instead of dumping only the current value of the created column, the example below dumps the last three versions of the modified column, just because it's cool.

    For completeness, this is roughly what I did to get Thrift working:

    • Downloaded and built Thrift from SVN (2012-11-15/1429368).
    • Ran "thrift -gen py <thrift file>" from the directory in which I wanted the Python interface files created.
    • Installed the "thrift" package via pip.

    I ran the following code from the root of the generated files.

    from pprint import pprint
    from random import randrange
    
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TSocket, TTransport
    
    # Modules generated by "thrift -gen py"
    from hbase import Hbase
    from hbase.ttypes import AlreadyExists, ColumnDescriptor, Mutation
    
    # Connect to the HBase Thrift server.
    socket = TSocket.TSocket('localhost', 9090)
    transport = TTransport.TBufferedTransport(socket)
    transport.open()
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
    
    table_name = 'test_table'
    row_key = 'test_row1'
    colfamily1 = 'test_colfamily1'
    column1 = 'test_col1'
    fullcol1 = '%s:%s' % (colfamily1, column1)  # "family:qualifier"
    value = '%d' % randrange(1000, 9999)  # a new random value on every run
    
    num_versions = 3
    
    # Create the table with one column family; carry on if it already exists.
    try:
        desc = ColumnDescriptor(colfamily1)
        client.createTable(table_name, [desc])
    except AlreadyExists:
        pass
    
    # The trailing {} is the fourth "attributes" argument (a string->string map).
    client.mutateRow(table_name, row_key, [Mutation(column=fullcol1, value=value)], {})
    # Fetch up to num_versions of the most recent versions of the cell.
    results = client.getVer(table_name, row_key, fullcol1, num_versions, {})
    
    pprint(results)
    
    transport.close()
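
Because generated clients differ on whether mutateRow() takes the trailing "attributes" map, a small wrapper can inspect the method's signature and pass an empty map only when it is expected. This is a minimal sketch, not part of the original answer: mutate_row, accepts_attributes, and the two stand-in client classes are hypothetical names, and it assumes the generated method calls its fourth parameter "attributes", as reported in the question. It relies on inspect.signature, so it needs Python 3.

```python
import inspect


def accepts_attributes(method):
    """Return True if the method declares a parameter named 'attributes'."""
    return "attributes" in inspect.signature(method).parameters


def mutate_row(client, table, row, mutations, attributes=None):
    """Call client.mutateRow with or without the trailing attributes map."""
    if accepts_attributes(client.mutateRow):
        # Newer generated clients: pass an empty map when nothing is supplied.
        return client.mutateRow(table, row, mutations, attributes or {})
    # Older generated clients: three-argument form.
    return client.mutateRow(table, row, mutations)


# Stand-in clients for demonstration only (hypothetical, not the Thrift classes).
class OldStyleClient:
    def mutateRow(self, tableName, row, mutations):
        return ("old", tableName, row, mutations)


class NewStyleClient:
    def mutateRow(self, tableName, row, mutations, attributes):
        return ("new", tableName, row, mutations, attributes)


print(mutate_row(OldStyleClient(), "test_table", "test_row1", []))
print(mutate_row(NewStyleClient(), "test_table", "test_row1", []))
```

With a real connection, mutate_row(client, table_name, row_key, [Mutation(column=fullcol1, value=value)]) would stand in for the direct client.mutateRow(...) call.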
    

    Output of three consecutive runs of test.py; each run adds one version:

    $ python test.py 
    [TCell(timestamp=1357463438825L, value='9842')]
    $ python test.py 
    [TCell(timestamp=1357463439700L, value='9166'),
     TCell(timestamp=1357463438825L, value='9842')]
    $ python test.py 
    [TCell(timestamp=1357463440359L, value='2978'),
     TCell(timestamp=1357463439700L, value='9166'),
     TCell(timestamp=1357463438825L, value='9842')]
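
The timestamps in the TCell results are easier to read as dates. When a cell is written without an explicit timestamp, HBase stamps it with the server clock in milliseconds since the Unix epoch, so a value can be converted as in the sketch below. cell_time is a hypothetical helper name, and the sample values are copied from the output above; the trailing L there is only Python 2's long-integer suffix.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def cell_time(timestamp_ms):
    """Convert an HBase cell timestamp (milliseconds since the epoch) to UTC."""
    # Integer millisecond arithmetic avoids float rounding.
    return EPOCH + timedelta(milliseconds=timestamp_ms)


oldest = cell_time(1357463438825)
second = cell_time(1357463439700)

print(oldest.isoformat())  # 2013-01-06T09:10:38.825000+00:00
print(second - oldest)     # time between the first and second run
```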