Query Cassandra with Both Primary Key and Secondary Key Constraints

I have a table in Cassandra defined as

CREATE TABLE foo ("A" text, "B" text, "C" text,
    "D" text, "E" text, "F" text,
    PRMIARY KEY ("A", "B"),
    INDEX ("C"))

I inserted billions of records into this table. And now I want to query the table with CQL

SELECT * FROM foo WHERE "A"='abc' AND "B"='def' AND "C"='ghi'

I keep receiving 1200 error saying that

ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

After googling, I suspect the reason of this error is that the query is directed to some partitions that does not hold any data.

My questions are

Is there any constraint querying CQL with both primary key and secondary key specified?
If I specified the partition key in my CQL, here "A"='abc' (correct me if wrong), why C* still tries other partition that apparently does not hold the data?
Any hints to solve this timeout problem?

Thank you!

Solution

Note: For my examples, I got rid of the double-quotes around the column names. It really doesn't do anything other than preserve case in the column names (not the values) and only just serves to muck-up the works.

Is there any constraint querying CQL with both primary key and secondary key specified?

First of all, I need to clear-up what, exactly, your "primary key" and "secondary key" are. If you are referring to C as a "secondary key," then "yes" you can, with some restrictions. If you mean your partition key (A) and your cluster key (B), then yes, you can.

Querying by your partition and clustering keys (or even just your partition key(s) works:

aploetz@cqlsh:stackoverflow2> SELECT * FROM foo WHERe A='abc' AND B='def';

 a   | b   | c   | d   | e   | f
-----+-----+-----+-----+-----+-----
 abc | def | ghi | jkl | mno | pqr

(1 rows)
aploetz@cqlsh:stackoverflow2> SELECT * FROM foo WHERe A='abc';

 a   | b   | c   | d   | e   | f
-----+-----+-----+-----+-----+-----
 abc | ddd | ghi | jkl | mno | pqr
 abc | def | ghi | jkl | mno | pqr

(2 rows)

When I create your table and index, insert a few rows, and run your query:

aploetz@cqlsh:stackoverflow2> SELECT * FROM foo WHERE A='abc' AND B='def' AND C='ghi';

 a   | b   | c   | d   | e   | f
-----+-----+-----+-----+-----+-----
 abc | def | ghi | jkl | mno | pqr

(1 rows)

That works.

If I specified the partition key in my CQL, here "A"='abc' (correct me if wrong), why C* still tries other partition that apparently does not hold the data?

I don't believe that is the problem. You are restricting it to a single partition, so it should only query data off of the abc partition.

I inserted billions of records into this table.

What you are seeing, is the reason that secondary index usage is considered to be an "anti-pattern" in Cassandra. Secondary indexes do not work the same way that they do in the relational world. They just do not scale well to large clusters or data sets.

Any hints to solve this timeout problem?

Yes. Recreate your table with C as a second clustering key. And do not create an index on C.

CREATE TABLE foo (A text, B text, C text, D text, E text, F text,
  PRMIARY KEY (A, B, C));

Reload your data, and then this should work for you:

aploetz@cqlsh:stackoverflow2> SELECT * FROM foo WHERE A='abc' AND B='def' AND C='ghi';

Not only should it work, but it should not timeout and it should be fast.