Sometimes, when I write multiple versions to the same row key across multiple column families using multiple batched mutations (each version is batched together with several other writes), I end up with extra versions on the row.
Is this expected behavior due to data compaction? Will the extra versions be removed over time?
The issue here is that you're putting the two columns in two separate entries in the batch, which means that even if they share the same row key they won't be applied atomically.
Batch entries can succeed or fail individually, and the client will then retry just the failed entries. If, for example, one entry succeeds and the other times out but later succeeds silently, a retry of the "failed" entry can lead to the partial write results you're seeing.
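For contrast, this is roughly the pattern that causes the problem. The variable names are just illustrative, and `table` / `column_family_id` are assumed to be set up as in the hello-world sample below:

# Problematic pattern: two separate entries in one batch that happen to
# share a row key. Each entry can be retried independently, so the two
# cells are not guaranteed to be written atomically.
bad_row_1 = table.row(b'greeting0')
bad_row_1.set_cell(column_family_id, b'greeting1', b'Hello World!')
bad_row_2 = table.row(b'greeting0')  # same row key, but a separate batch entry
bad_row_2.set_cell(column_family_id, b'greeting2', b'Hello World!')
table.mutate_rows([bad_row_1, bad_row_2])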
In Python, you should instead do something like the following (adapted from cloud.google.com/bigtable/docs/samples-python-hello):
# Assumes `table` and `column_family_id` are already set up as in the
# full hello-world sample linked above.
import datetime

print('Writing some greetings to the table.')
greetings = ['Hello World!', 'Hello Cloud Bigtable!', 'Hello Python!']
rows = []
column1 = 'greeting1'.encode()
column2 = 'greeting2'.encode()
for i, value in enumerate(greetings):
    # Note: This example uses sequential numeric IDs for simplicity,
    # but this can result in poor performance in a production
    # application. Since rows are stored in sorted order by key,
    # sequential keys can result in poor distribution of operations
    # across nodes.
    #
    # For more information about how to design a Bigtable schema for
    # the best performance, see the documentation:
    #
    # https://cloud.google.com/bigtable/docs/schema-design
    row_key = 'greeting{}'.format(i).encode()
    row = table.row(row_key)
    # Multiple calls to 'set_cell()' are allowed on the same batch
    # entry and are applied atomically. A separate 'row' object in the
    # same batch is applied independently, even if it shares its row
    # key with another entry.
    row.set_cell(column_family_id,
                 column1,
                 value,
                 timestamp=datetime.datetime.utcnow())
    row.set_cell(column_family_id,
                 column2,
                 value,
                 timestamp=datetime.datetime.utcnow())
    rows.append(row)
table.mutate_rows(rows)
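If you also want to confirm that every entry was applied, capture the return value of the `mutate_rows` call; it returns one status per entry. A small sketch of that check, mirroring the pattern in Google's Bigtable batch-write samples:

response = table.mutate_rows(rows)
for i, status in enumerate(response):
    if status.code != 0:
        # A non-zero status code means this entry failed.
        print('Failed to write row {}: {}'.format(i, status.message))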