python cassandra cassandra-driver

Approach to resolve Cassandra Coordinator node timeouts on writes


I have a simple single-node Cassandra cluster with a basic keyspace configuration that has replication_factor=1.

In this keyspace, we have about 230 tables, each with roughly 40 columns. Once a day, we write to these tables at a rate of roughly 30k writes in five minutes. I have about 6 Python worker scripts that each write to one table at a time, and they keep writing until all 230 tables have been written to for the day. The scripts use the Python cassandra-driver with a simple session to make these writes. A lot of the values being written are nulls.

Effectively, if I am right, this can be thought of as 6 concurrent connections making 30k+ entries in five minutes, once per day.

I understand how Cassandra writes and deletes work and am familiar with coordinator nodes, etc. I am intermittently seeing the traceback below:

"cassandra/cluster.py", line 2030, in cassandra.cluster.Session.execute (cassandra/cluster.c:38536)
app_nstablebuilder.1.69j772led82k@swarm-worker-gg37    |   File "cassandra/cluster.py", line 3844, in cassandra.cluster.ResponseFuture.result (cassandra/cluster.c:80834)
app_nstablebuilder.1.69j772led82k@swarm-worker-gg37    | cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0}

My question is how to approach solving this problem. I am unable to verify whether the problem lies in my worker scripts or in the Cassandra cluster itself. Should I be slowing down my workers' writes? Should I run some sort of diagnostic to improve Cassandra performance?

All the solutions I have read so far deal with multi-node clusters, and I couldn't find one for a single-node cluster.

I feel like our cluster is unhealthy and that my efforts should be targeted at fixes there. If so, I'm unsure of where to begin. Could anyone point me in the right direction?

If there's any further information I could provide to help, do let me know.


Solution

  • Inserting nulls creates tombstones; leaving the null columns out of the query does not. You can read a little bit on that matter here. I'm not sure that inserting nulls is what causes this timeout, but avoiding null inserts (and the tombstones they would create) is definitely an improvement to take into account.
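
As a rough illustration of leaving null columns out of the query, here is a minimal sketch. The helper `build_insert`, the table name `readings`, and the example row are all hypothetical and not taken from the question. It assumes each row is available as a dict of column name to value, and it uses the `%s` placeholder style that the Python cassandra-driver accepts for non-prepared statements.

```python
def build_insert(table, row):
    """Return (cql, params) for an INSERT that skips columns whose value is None.

    `table` and the column names are interpolated directly into the CQL text,
    so they must come from trusted code, never from user input.
    """
    # Keep only the columns that actually carry a value.
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("row has no non-null columns to insert")

    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))
    cql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return cql, tuple(present.values())


# Hypothetical table and row, for illustration only.
cql, params = build_insert("readings", {"id": 1, "temp": 21.5, "note": None})

# With a connected driver session, the write would then be:
#     session.execute(cql, params)
```

Because the column list now varies from row to row, this sketch builds a simple statement per row rather than reusing one prepared statement; whether that trade-off is acceptable depends on the workload.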