We have a script which talks to Scylla (a Cassandra drop-in replacement). The script is supposed to run against a few thousand systems, and it issues a few thousand queries to fetch the data it needs. However, after some time the script crashes with this error:
2021-09-29 12:13:48 Could not execute query because of : errors={'x.x.x.x': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=x.x.x.x
2021-09-29 12:13:48 Trying for : 4th time
Traceback (most recent call last):
File ".../db_base.py", line 92, in db_base
ret_val = SESSION.execute(query)
File "cassandra/cluster.py", line 2171, in cassandra.cluster.Session.execute
File "cassandra/cluster.py", line 4062, in cassandra.cluster.ResponseFuture.result
cassandra.OperationTimedOut: errors={'x.x.x.x': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=x.x.x.x
The DB Connection code:
def db_base(current_keyspace, query, try_for_times, current_IPs, port):
    global SESSION
    if SESSION is None:
        # Retry connecting to the cluster up to try_for_times times
        for i in range(try_for_times):
            try:
                cluster = Cluster(contact_points=current_IPs, port=port)
                session = cluster.connect()  # NoHostAvailable can be raised here
                break
            except NoHostAvailable:
                print("No Host Available! Trying for : " + str(i) + "th time")
                if i == try_for_times - 1:
                    # shutting down cluster
                    cluster.shutdown()
                    raise db_connection_error("Could not connect to the cluster even in " + str(try_for_times) + " tries! Exiting")
        SESSION = session
    # Retry the actual query up to try_for_times times
    for i in range(try_for_times):
        try:
            # setting keyspace
            SESSION.set_keyspace(current_keyspace)
            # execute actual query - OperationTimedOut can be raised here
            ret_val = SESSION.execute(query)
            break
        except Exception as e:
            print("Could not execute query because of : " + str(e))
            print("Trying for : " + str(i) + "th time")
            if i == try_for_times - 1:
                # shut down via the session's cluster (this also closes the
                # session itself); the local `cluster`/`session` names are not
                # defined here when SESSION survived from an earlier call
                SESSION.cluster.shutdown()
                SESSION = None
                raise db_connection_error("Could not execute query even in " + str(try_for_times) + " tries! Exiting")
    return ret_val
How can this code be improved so that it can sustain this large number of queries? Or should we look into other tools / approaches to get this data? Thank you
The client request timeout indicates that the driver is timing out before the server does, or - should the server be overloaded - that Scylla hasn't reported the timeout back to the driver. There are a couple of ways to narrow this down:
1 - Ensure that your driver-side default_timeout is higher than the timeouts Scylla enforces in /etc/scylla/scylla.yaml
2 - Check the Scylla logs for any sign of overload. If there is one, consider throttling your requests to find a balanced sweet spot at which they no longer fail. If the overload continues, consider resizing your instances.
In addition to these, it is worth mentioning that your sample code uses neither PreparedStatements nor token-aware routing nor the other best practices mentioned under https://docs.datastax.com/en/developer/python-driver/3.19/api/cassandra/policies/ - adopting them will certainly improve your overall throughput down the road.
You can find further information in the Scylla docs: https://docs.scylladb.com/using-scylla/drivers/cql-drivers/scylla-python-driver/ and at Scylla University: https://university.scylladb.com/courses/using-scylla-drivers/lessons/coding-with-python/