Say I have:
cur.execute("CREATE TABLE data_by_year ( device_id int, \
site_id text, year_id int, event_time timestamp, value float, \
PRIMARY KEY ((device_id, site_id, year_id),event_time))")
And I want to query all devices for years 2014 and 2013.
result=cur.execute("select distinct device_id, site_id, year_id,\
from data_by_year where device_id IN (324535, 32453l),\
and site_id in and year_id IN (2014)")
Obvously this statement has many issues but it's the best example I could come up with. My beef is with the "where device_id IN (324535, 32453l)". In reality I will not know all the various devices so I want to grab them "ALL". How do I do this?
I'm dealing with time series minute data so I felt that one year was a reasonable partition.
knifewine's answer is correct, but if you're going to be executing this query frequently (and want good performance), I suggest using a second table:
CREATE TABLE all_device_data_by_year (
site_id text,
year_id int,
device_id int,
event_time timestamp,
value float,
PRIMARY KEY ((site_id, year_id), device_id, event_time)
)
You might want to partition by day/month instead of year, depending on the number of devices.
Regarding automatic query paging support in the python driver, it's available right now in the 2.0 branch. I should have a 2.0-beta release ready soon.