Tags: python, cassandra, scylla

Different batch limits for Cassandra and Scylla, Err - 2200, Batch too large


I used the standard 'cassandra-driver' to access Cassandra and Scylla for inserting/updating data via BatchStatement. I inserted data in a batch (~500 rows) and got this error only with Cassandra (everything works fine with Scylla):

Error from server: code=2200 [Invalid query] message="Batch too large"

Python code:

import pandas

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.cluster import ExecutionProfile
from cassandra.cluster import EXEC_PROFILE_DEFAULT
from cassandra.query import BatchStatement

...
cluster = Cluster(contact_points=["localhost"],
                  port=9042,
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile},
                  control_connection_timeout=60,
                  idle_heartbeat_interval=60,
                  connect_timeout=60)
session = cluster.connect(keyspace="catalog")
...
insert_statement = session.prepare(f"INSERT INTO rn6.t01 ({columns}) VALUES ({items})")
batch = BatchStatement(consistency_level=ConsistencyLevel.ONE)

data_frm = pandas.DataFrame(generator.integers(999999, 
                size=(run_setup.bulk_row, run_setup.bulk_col)),
                columns=[f"fn{i}" for i in range(run_setup.bulk_col)])

# prepare data
for row in data_frm.values:
    batch.add(insert_statement, row)

session.execute(batch)

It seems that Cassandra and Scylla have different default limits for batch statements. Do you know these limits?


Solution

  • Both Scylla and Cassandra have a configurable batch-size limit, batch_size_fail_threshold_in_kb, but its default value differs: Scylla's default is 1024 (KB), whereas Cassandra's default is 50 (KB).

    You can try increasing it in Cassandra's configuration (cassandra.yaml) - or use smaller batches, as in the sketch below.
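
    For example, a minimal sketch (not from the original answer) that builds on the question's code: chunk the DataFrame rows so each batch stays well under Cassandra's 50 KB default. BATCH_ROWS is an assumed chunk size - tune it to your row width.

    # split the ~500 rows into smaller batches to stay under the size threshold
    BATCH_ROWS = 50  # assumed chunk size, adjust to your row width

    for start in range(0, len(data_frm), BATCH_ROWS):
        batch = BatchStatement(consistency_level=ConsistencyLevel.ONE)
        for row in data_frm.values[start:start + BATCH_ROWS]:
            batch.add(insert_statement, row)
        session.execute(batch)

    Note that the threshold applies to the serialized size of the batch (in KB), not the row count, so the right chunk size depends on how wide your rows are.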