python, cassandra, pycassa

pycassa - Remove multiple rows by their secondary index?


I have a column family with a secondary index on the 'pointer' column. How do I remove all rows that share the same 'pointer' value (e.g. 'abc')?

The only option I know is:

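from pycassa.index import create_index_clause, create_index_expression

# cassandra_cf is an existing pycassa ColumnFamily
expr = create_index_expression('pointer', 'abc')
clause = create_index_clause([expr])
for key, user in cassandra_cf.get_indexed_slices(clause):
    cassandra_cf.remove(key)  # one round trip per row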

but I know this is very inefficient and can take a long time when thousands of rows share the same 'pointer' value. Are there any other options?


Solution

  • You can remove multiple rows at once:

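    from pycassa.index import create_index_clause, create_index_expression

    # cassandra_cf is an existing pycassa ColumnFamily
    expr = create_index_expression('pointer', 'abc')
    clause = create_index_clause([expr])
    with cassandra_cf.batch() as b:
        for key, user in cassandra_cf.get_indexed_slices(clause):
            b.remove(key)  # queued; sent in groups rather than one at a time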

    This groups the removes into batches of 100 (the default). When the batch object is used as a context manager, as it is here, any mutations still queued are sent automatically when the with block exits.

    You can read more about this in the pycassa.batch API docs.
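
To make the batching behaviour described above concrete, here is a minimal sketch that runs without a Cassandra cluster. ToyBatch and remove_in_batches are hypothetical stand-ins written for illustration only; they are not pycassa's implementation. The sketch assumes only what the answer states: removes are queued, sent once the queue reaches a fixed size (100 by default), and whatever is left over is sent when the with block exits.

```python
class ToyBatch(object):
    """Hypothetical stand-in for a batch mutator (not pycassa code)."""

    def __init__(self, send, queue_size=100):
        self._send = send              # callable that receives a list of row keys
        self._queue_size = queue_size  # how many removes to queue before sending
        self._pending = []

    def remove(self, key):
        # Queue the remove; send the queue once it reaches queue_size.
        self._pending.append(key)
        if len(self._pending) >= self._queue_size:
            self.send()

    def send(self):
        # Send whatever is queued, if anything, then start a fresh queue.
        if self._pending:
            self._send(list(self._pending))
            self._pending = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with block sends the remaining queued removes.
        self.send()
        return False  # do not swallow exceptions


def remove_in_batches(keys, send, queue_size=100):
    """Queue a remove for every key, sending them in groups of queue_size."""
    with ToyBatch(send, queue_size) as b:
        for key in keys:
            b.remove(key)


sent = []  # each element is one simulated request to the server
remove_in_batches(("row%d" % i for i in range(250)), sent.append)
print([len(chunk) for chunk in sent])  # [100, 100, 50]
```

With 250 keys, the stand-in issues three requests instead of 250. In the real call, the batch size should be adjustable via the queue_size argument of cassandra_cf.batch(). Also, as far as I know, create_index_clause takes a count argument that defaults to 100 and caps how many rows get_indexed_slices returns, so it is worth checking that value when more rows than that match.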