
Bigtable: mutating multiple values into single column in a single column family


I tried to insert 100 values into the column test_col, in column family test_cf, under the row key test-123.

The insert itself appears to succeed, but not all 100 values end up in Bigtable.

The number of values in the test_col column in test_cf is less than 100, and which values are kept appears to be random.

The code I wrote is below.

from datetime import datetime

rows = []
values = ['123-123124-325324', '543-123-45324-123123', '292-123124-54324-234', '292-213123-123123123-3213']
 # ...  100 values in the list

row_key = 'test-123'.encode()
direct_row = table.direct_row(row_key)  # table: an existing Bigtable Table object
for val in values:
    # Same row, family and qualifier on every iteration; only the value changes
    direct_row.set_cell("test_cf",
                        "test_col".encode('utf-8'),
                        val,
                        datetime.utcnow())

    rows.append(direct_row)

rtn = table.mutate_rows(rows)

for i, status in enumerate(rtn):
    if status.code != 0:
        print('ERROR')

The weird thing is that the response status is always 0 for all 100 mutations.


Solution

  • This code is not doing what you intended. Each set_cell call writes to the same row ("test-123"), the same column family ("test_cf") and the same column qualifier ("test_col"). The value is different each time, but the timestamp attached to each value is the current time, which can be identical across multiple set_cell calls, especially since Bigtable stores timestamps at millisecond granularity. Because a single cell in Bigtable is indexed by the (row, family, column, timestamp) tuple, this code can overwrite data it wrote earlier in the loop.

    So it is entirely possible that the first 3 set_cell calls look like this:

    row:       "test-123"
    family:    "test_cf"
    column:    "test_col"
    value:     "123-123124-325324"
    timestamp: t0
    
    row:       "test-123"
    family:    "test_cf"
    column:    "test_col"
    value:     "543-123-45324-123123"
    timestamp: t0
    
    row:       "test-123"
    family:    "test_cf"
    column:    "test_col"
    value:     "292-123124-54324-234"
    timestamp: t1
    

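    In this case the second entry overwrites the first, because their (row, family, column, timestamp) tuples are identical. The third entry has a different timestamp (t1), so it is stored as a separate cell.

    The status code will be 0 for successful calls, so that is working as expected: every write succeeds, and an overwrite is not reported as an error.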
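
One way to avoid the collision is to give every value its own timestamp. The following is a minimal sketch, not a definitive fix. The helper names distinct_timestamps, simulate_cells and write_values are made up for illustration, and simulate_cells only models the (row, family, column, timestamp) key with a plain dict; it does not talk to Bigtable. It also assumes the column family's garbage-collection policy keeps enough versions; with a policy such as "max versions = 1", older cells would still be removed eventually.

```python
from datetime import datetime, timedelta, timezone


def distinct_timestamps(count, start=None):
    """Return `count` datetimes spaced exactly one millisecond apart."""
    if start is None:
        start = datetime.now(timezone.utc)
    # Drop sub-millisecond precision so the spacing survives millisecond truncation.
    start = start.replace(microsecond=(start.microsecond // 1000) * 1000)
    return [start + timedelta(milliseconds=i) for i in range(count)]


def simulate_cells(values, timestamps, row=b"test-123",
                   family="test_cf", column=b"test_col"):
    """Toy model of cell storage: a dict keyed by (row, family, column, timestamp)."""
    cells = {}
    for value, ts in zip(values, timestamps):
        # An identical key replaces the earlier value, as described above.
        cells[(row, family, column, ts)] = value
    return cells


def write_values(table, values, row_key=b"test-123"):
    """Write each value as its own cell version in one row.

    `table` is assumed to be a google.cloud.bigtable Table instance.
    """
    direct_row = table.direct_row(row_key)
    for value, ts in zip(values, distinct_timestamps(len(values))):
        direct_row.set_cell("test_cf", b"test_col", value, timestamp=ts)
    # All mutations belong to the same row, so a single commit is enough.
    return direct_row.commit()
```

Calling simulate_cells with the same timestamp for every value leaves a single entry, while the timestamps from distinct_timestamps keep one entry per value. If the values are not really versions of one cell, writing each one under its own column qualifier (for example test_col_0, test_col_1, and so on) is another option.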