Search code examples
cassandraamazon-keyspacesdatastax-python-driver

Why does my query to AWS Keyspaces using the Cassandra Python driver return an empty list?


I am trying to use a prepared statement to query keyspaces via python's cassandra driver.

This is the query and it's preparation...

from cassandra.cluster import Cluster

cluster = Cluster(
    **conn_details
)
            
session = cluster.connect("mykeyspace")

query = ("SELECT timestamp "  
        "FROM mykeyspace.mytable "
        "WHERE t_id='123' "
        "AND p_id='321' "
        "AND timestamp IN ? "
        )

prepared_statement = session.prepare(query)

session.execute(prepared_statement, parameters=[ (1677145736507, 1677145728972) ]).current_rows

The output is an empty list. There is some issue with the statement binding as I'm able to run the CQL IN with success in three scenarios below... i.e. if I run the below raw query via session.execute(<raw query string>) I can get a response..

SELECT timestamp
FROM mykeyspace.mytable 
WHERE t_id='123'
AND p_id='321' 
AND collection_event_timestamp IN (1677145728972, 1677145736507) 

If I run inside the keyspaces query editor on AWS I get the expected response

enter image description here

The only way I can get IN to work with any parameterisation is via string formatting..

id_tuples = (1677145736507, 1677145728972)

query = "SELECT timestamp FROM mykeyspace.mytable WHERE t_id='123' AND p_id='321' AND timestamp IN %s  "

session.execute(query, parameters=[ValueSequence(id_tuples)]).current_rows

enter image description here

Does anyone have any advice as to what is going wrong here? Why is the prepared statement approach not working?


Solution

  • When creating a prepared statement in Python using the IN operator, you'll need to define your variable as a dictionary instead of a tuple:

    pStatement = session.prepare("""
        SELECT * FROM sales.emp WHERE empid IN ?;
    """)
    
    employees = {99, 68}
    
    rows = session.execute(pStatement,[employees])
    print("Employee data:\n")
    for row in rows:
        print(row)
    
    
    % python testDBIN.py
    
    Employee data:
    
    Row(empid=99, first_name='Wayne', last_name='Gretzky')
    Row(empid=68, first_name='Jaromir', last_name='Jagr')
    

    Does IN only work on partition columns? I'm trying to filter on a clustering column

    Ok, so that means that your're trying to run a query which the table wasn't built to support. The IN operator does work on both partition and clustering columns.

    But, it only works on clustering columns IF all preceeding keys are also specified. I don't see your table definition above, but I do see that you're filtering on t_id and p_id. No idea if those are the preceeding keys in the PRIMARY KEY definition or not.

    The other possibility, is that perhaps this is one of those areas where Amazon Keyspaces != Apache Cassandra. Seeing syntax and programatic approaches that work in Cassandra but fail in Keyspaces is quite common. Unfortunately, the two are not 100% compatible.