I am using Cassandra 2.0 with python CQL.
I have created a column family as follows:
CREATE KEYSPACE Identification
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
  'DC1' : 1 };
USE Identification;
CREATE TABLE entitylookup (
  name varchar,
  value varchar,
  entity_id uuid,
  PRIMARY KEY ((name, value), entity_id));
I then try to count the number of records in this CF as follows:
#!/usr/bin/env python
import argparse
import sys
import traceback

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement


def count(host, cf):
    keyspace = "identification"
    cluster = Cluster([host], port=9042, control_connection_timeout=600000000)
    session = cluster.connect(keyspace)
    st = SimpleStatement("SELECT count(*) FROM %s" % cf,
                         consistency_level=ConsistencyLevel.ALL)
    for row in session.execute(st, timeout=600000000):
        print "count for cf %s = %s " % (cf, str(row))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-cf", "--column-family", default="entitylookup", help="Column Family to query")
    parser.add_argument("-H", "--host", default="localhost", help="Cassandra host")
    args = parser.parse_args()
    count(args.host, args.column_family)
    print "fim"
The count itself is not that useful to me; it is just a test with an operation that takes a long time to complete.
Although I have set the timeout to 600000000 seconds, I get the following error after less than 30 seconds:
./count_entity_lookup.py -H localhost -cf entitylookup
Traceback (most recent call last):
  File "./count_entity_lookup.py", line 27, in <module>
    count(args.host, args.column_family)
  File "./count_entity_lookup.py", line 16, in count
    for row in session.execute(st, timeout=None):
  File "/home/mvalle/pyenv0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 1026, in execute
    result = future.result(timeout)
  File "/home/mvalle/pyenv0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 2300, in result
    raise self._final_exception
cassandra.ReadTimeout: code=1200 [Timeout during read request] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'data_retrieved': True, 'required_responses': 2, 'consistency': 5}
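The numeric 'consistency' field in that error is the consistency-level code used by the native protocol, which the Python driver's ConsistencyLevel constants mirror (ConsistencyLevel.ALL is 5). A minimal sketch for decoding the info dict, assuming that numbering; the helper name explain_read_timeout is hypothetical:

```python
# Consistency-level codes as numbered in the CQL native protocol
# (the same values as the Python driver's ConsistencyLevel constants).
CONSISTENCY_NAMES = {
    0x0000: "ANY",
    0x0001: "ONE",
    0x0002: "TWO",
    0x0003: "THREE",
    0x0004: "QUORUM",
    0x0005: "ALL",
    0x0006: "LOCAL_QUORUM",
    0x0007: "EACH_QUORUM",
    0x0008: "SERIAL",
    0x0009: "LOCAL_SERIAL",
    0x000A: "LOCAL_ONE",
}


def explain_read_timeout(info):
    """Summarize the info dict attached to a ReadTimeout (hypothetical helper)."""
    level = CONSISTENCY_NAMES.get(info["consistency"], "UNKNOWN")
    missing = info["required_responses"] - info["received_responses"]
    return "%s needed %d replica responses, got %d (%d missing)" % (
        level, info["required_responses"], info["received_responses"], missing)


# The info dict copied from the error message.
info = {'received_responses': 1, 'data_retrieved': True,
        'required_responses': 2, 'consistency': 5}
print(explain_read_timeout(info))
# ALL needed 2 replica responses, got 1 (1 missing)
```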
It seems the answer was found on just one replica, but this doesn't make sense to me. Shouldn't Cassandra be able to answer the query anyway?
In the image below, you can see that the number of requests to the cluster was very low, and so was the latency. I am not sure why this is happening.
From the response:
'received_responses': 1, 'data_retrieved': True, 'required_responses': 2
Only one replica responded (with data), while the query requires consistency level ALL, which here means two responses. Cassandra was not able to fulfill that request and timed out.
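The timeout passed to session.execute() only bounds how long the client waits on the response future; the traceback shows future.result(timeout) simply re-raising self._final_exception. The ReadTimeout itself is produced by the coordinator node, which stops waiting for replicas after its own server-side limit set in cassandra.yaml (for example read_request_timeout_in_ms, or range_request_timeout_in_ms for full scans such as this count). The toy simulation that follows uses concurrent.futures, hypothetical names, and a shortened 0.2 s "server" timeout; it is not the driver's actual code, only an illustration that a huge client timeout cannot outlast a server-side error:

```python
import concurrent.futures
import time


class ReadTimeout(Exception):
    """Hypothetical stand-in for cassandra.ReadTimeout (illustration only)."""


# Hypothetical coordinator-side read timeout, shortened for the demo.
SERVER_READ_TIMEOUT = 0.2


def coordinator_read(received, required):
    """Simulate a coordinator waiting for enough replica responses."""
    if received < required:
        # Too few replicas answered: the coordinator gives up after its own
        # timeout, regardless of how long the client is willing to wait.
        time.sleep(SERVER_READ_TIMEOUT)
        raise ReadTimeout("received only %d responses" % received)
    return ["row"]


def execute(received, required, client_timeout):
    """Submit the read and wait up to client_timeout seconds for the result."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(coordinator_read, received, required)
        # result() re-raises the worker's exception as soon as it is set;
        # client_timeout only caps how long we would otherwise wait.
        return future.result(timeout=client_timeout)


start = time.monotonic()
try:
    execute(received=1, required=2, client_timeout=600.0)
except ReadTimeout as exc:
    print("failed after %.1f s despite a 600 s client timeout: %s"
          % (time.monotonic() - start, exc))
```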
You may write with consistency level ALL if every replica must have the data.
Reads can then be satisfied at a consistency level lower than ALL, because the write itself already guaranteed that all replicas hold the data; the trade-off is that writes will fail if any replica is offline.
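This is the usual overlap rule: if a read contacts R replicas and the write was acknowledged by W replicas, at least one replica in every read has the latest write whenever R + W > RF. A minimal sketch of that check, with a hypothetical function name and ignoring failed or partial writes:

```python
def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True if every read replica set must overlap every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 2  # assumed; consistent with required_responses=2 under ALL in the error

# Writing at ALL (W = RF) lets a read at ONE (R = 1) see the write.
print(reads_see_latest_write(1, RF, RF))  # True
# Writing at ONE and reading at ONE gives no such guarantee.
print(reads_see_latest_write(1, 1, RF))   # False
```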
See the documentation for an explanation of what each consistency level means.
is what would be used to ensure that a majority of replicas, relative to the replication factor, are contacted within a DC.
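Cassandra sizes a majority as floor(RF / 2) + 1 replicas. For a level scoped to one data center (LOCAL_QUORUM), RF is that DC's own entry in the NetworkTopologyStrategy map, whereas ALL needs every replica in every DC. A minimal sketch with hypothetical helper names:

```python
def local_majority(dc_replication, local_dc):
    """Replicas that must answer for a majority within one data center."""
    return dc_replication[local_dc] // 2 + 1


def all_replicas(dc_replication):
    """Replicas that must answer for ALL: every replica in every DC."""
    return sum(dc_replication.values())


# The keyspace in the question: NetworkTopologyStrategy with 'DC1' : 1.
schema_rf = {"DC1": 1}
print(local_majority(schema_rf, "DC1"), all_replicas(schema_rf))  # 1 1

# A hypothetical two-DC layout, for comparison.
two_dcs = {"DC1": 3, "DC2": 3}
print(local_majority(two_dcs, "DC1"), all_replicas(two_dcs))  # 2 6
```

With the schema exactly as shown, an ALL read would need only one response, so the required_responses of 2 in the error suggests the keyspace actually queried had a replication factor of 2.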