I have a two-node Cassandra cluster with the data fully replicated on both nodes. When I query a table that contains 30 rows, the cqlsh trace seems to show some rows being fetched from one server and some from the other.
So even though every row is available on both nodes, the query takes 250 ms or more, compared with 1 ms for other queries.
I've already set the consistency level to ONE at the protocol level. What else do I need to do to make the query use only one node?
Trace of select * from organisation:
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+--------------+--------------+----------------
execute_cql3_query | 04:21:03,641 | 10.1.0.84 | 0
Parsing select * from organisation LIMIT 10000; | 04:21:03,641 | 10.1.0.84 | 68
Preparing statement | 04:21:03,641 | 10.1.0.84 | 174
Determining replicas to query | 04:21:03,642 | 10.1.0.84 | 307
Enqueuing request to /10.1.0.85 | 04:21:03,642 | 10.1.0.84 | 1034
Sending message to /10.1.0.85 | 04:21:03,643 | 10.1.0.84 | 1402
Message received from /10.1.0.84 | 04:21:03,644 | 10.1.0.85 | 47
Executing seq scan across 0 sstables for [min(-9223372036854775808), min(-9223372036854775808)] | 04:21:03,644 | 10.1.0.85 | 461
Read 1 live and 0 tombstoned cells | 04:21:03,644 | 10.1.0.85 | 560
Read 1 live and 0 tombstoned cells | 04:21:03,644 | 10.1.0.85 | 611
... etc. ...
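One way to check which nodes take part in a traced query is to pull the distinct values out of the source column. Here is a minimal Python sketch, assuming the trace has been copied out of cqlsh as pipe-delimited text with the four columns shown above. The function name sources_in_trace and the SAMPLE string (three rows copied from the trace above) are only for illustration.

```python
def sources_in_trace(trace_text):
    """Return the distinct 'source' values from a pasted cqlsh trace, in first-seen order."""
    sources = []
    for line in trace_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Data rows have exactly four columns: activity, timestamp, source, source_elapsed.
        # The dashed separator line contains no "|" and is skipped here.
        if len(parts) != 4:
            continue
        source = parts[2]
        if source == "source":  # skip the header row
            continue
        if source not in sources:
            sources.append(source)
    return sources


# Three rows copied from the trace in the question.
SAMPLE = """\
execute_cql3_query | 04:21:03,641 | 10.1.0.84 | 0
Determining replicas to query | 04:21:03,642 | 10.1.0.84 | 307
Message received from /10.1.0.84 | 04:21:03,644 | 10.1.0.85 | 47
"""

print(sources_in_trace(SAMPLE))
```

If more than one address comes back, the coordinator forwarded at least part of the read to another node, which is what the trace above shows.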
It turns out that there was a bug in Cassandra versions 2.0.5-2.0.9 that made Cassandra more likely to request data from two nodes when it only needed to talk to one.
Upgrading to 2.0.10 or later resolves the problem.
See CASSANDRA-7535.
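To check whether a node falls in the affected range, its release version can be compared against the bounds quoted above. The following is a rough Python sketch. is_affected is a hypothetical helper, the 2.0.5 to 2.0.9 range is taken from this answer rather than verified against the ticket, and the version string is assumed to come from something like nodetool version or SELECT release_version FROM system.local in cqlsh.

```python
def is_affected(version):
    """True if version lies within 2.0.5..2.0.9 inclusive (the range quoted in this answer)."""
    # Drop any suffix such as "-SNAPSHOT", then compare numerically so that
    # 2.0.10 sorts after 2.0.9 (a plain string comparison would get this wrong).
    numeric = version.split("-")[0]
    parts = tuple(int(p) for p in numeric.split(".")[:3])
    return (2, 0, 5) <= parts <= (2, 0, 9)


for v in ["2.0.4", "2.0.9", "2.0.10"]:
    print(v, is_affected(v))
```

In a multi-node cluster each node reports its own version, so it is worth checking every node.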