We have a script which talks to Scylla (a Cassandra drop-in replacement). The script is supposed to run against a few thousand systems, and it issues a few thousand queries to fetch the data it needs. However, after some time the script crashes with this error:
2021-09-29 12:13:48 Could not execute query because of : errors={'x.x.x.x': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=x.x.x.x
2021-09-29 12:13:48 Trying for : 4th time
Traceback (most recent call last):
File ".../db_base.py", line 92, in db_base
ret_val = SESSION.execute(query)
File "cassandra/cluster.py", line 2171, in cassandra.cluster.Session.execute
File "cassandra/cluster.py", line 4062, in cassandra.cluster.ResponseFuture.result
cassandra.OperationTimedOut: errors={'x.x.x.x': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=x.x.x.x
The DB Connection code:
def db_base(current_keyspace, query, try_for_times, current_IPs, port):
    global SESSION
    if SESSION is None:
        # Retry connecting to the cluster up to try_for_times times
        for i in range(try_for_times):
            try:
                cluster = Cluster(contact_points=current_IPs, port=port)
                session = cluster.connect()  # NoHostAvailable can be raised here
                break
            except NoHostAvailable:
                print("No Host Available! Trying for : " + str(i) + "th time")
                if i == try_for_times - 1:
                    # shutting down cluster
                    cluster.shutdown()
                    raise db_connection_error("Could not connect to the cluster even in " + str(try_for_times) + " tries! Exiting")
        SESSION = session
    # Retry the actual query up to try_for_times times
    for i in range(try_for_times):
        try:
            # setting keyspace
            SESSION.set_keyspace(current_keyspace)
            # execute actual query - OperationTimedOut can be raised here
            ret_val = SESSION.execute(query)
            break
        except Exception as e:
            print("Could not execute query because of : " + str(e))
            print("Trying for : " + str(i) + "th time")
            if i == try_for_times - 1:
                # shut down via the session's cluster (this also closes the
                # session itself); the local `cluster`/`session` names are not
                # defined here when SESSION survived from an earlier call
                SESSION.cluster.shutdown()
                SESSION = None
                raise db_connection_error("Could not execute query even in " + str(try_for_times) + " tries! Exiting")
    return ret_val
How can this code be improved so that it can sustain this large number of queries? Or should we look into other tools / approaches to get this data? Thank you
The client request timeout indicates that the driver is timing out before the server does, or - should the server be overloaded - that Scylla hasn't reported the timeout back to the driver. There are a couple of ways to narrow this down:
1 - Ensure that your driver-side default_timeout is higher than the timeouts Scylla enforces in /etc/scylla/scylla.yaml
2 - Check the Scylla logs for any sign of overload. If there is one, consider throttling your requests to find a balanced sweet spot at which they no longer fail. If the overload continues, consider resizing your instances.
In addition to these, it is worth mentioning that your sample code uses neither PreparedStatements nor token-aware routing nor the other best practices mentioned under https://docs.datastax.com/en/developer/python-driver/3.19/api/cassandra/policies/ - adopting them will certainly improve your overall throughput down the road.
You can find further information in the Scylla docs: https://docs.scylladb.com/using-scylla/drivers/cql-drivers/scylla-python-driver/ and at Scylla University: https://university.scylladb.com/courses/using-scylla-drivers/lessons/coding-with-python/