I am trying to model a cassandra data set for a time series where I capture the sum of bytes over a minute for a given user, using a counter column. Using CQL3 I have this create table syntax:
CREATE TABLE minute_usr (
min varchar,
usr varchar,
bytes counter,
PRIMARY KEY (min, usr)
)
I am able to update the column using this query:
UPDATE minute_usr SET bytes = bytes + 200 WHERE usr = 'testuser' AND min = '1369448220';
however I would like to now get back the value of 'bytes' for a range of minutes where usr = 'username':
select min, bytes from minute_usr WHERE usr = 'testuser' AND min >= '1369448160' and min <= '1369448220';
and I am getting back the error:
Bad Request: Only EQ and IN relation are supported on the partition key for random partitioners (unless you use the token() function)
I was under the assumption that CQL was creating a composite key out of 'min' and 'usr' when creating the table allowing me to get a range of composite keys. The cli is actually telling me that the key is actually just the value of 'min':
[default@data_use] list minute_usr;
Using default limit of 100
Using default column limit of 100
RowKey: 1369448220
=> (column=testuser:, value=, timestamp=1371066133370000)
=> (counter=testuser:bytes, value=1400)
RowKey: 1369448160
=> (column=testuser:, value=, timestamp=1371066138506000)
=> (counter=testuser:bytes, value=1600)
Am I simply limited to doing an individual query for each timestamp / username combination with this data model?
I have also tried modeling the data with the username as the first part of the key, which allows me to do range queries
CREATE TABLE usr_minute (
min varchar,
usr varchar,
bytes counter,
PRIMARY KEY (usr, min)
)
And then I could do a range query on the column names for minutes like so:
`select bytes from usr_minute WHERE usr = 'testuser' AND min >= '1369448160' and min <= '1369448220';`
however I know that these values are now being stored in a single row which has limitations in terms of number of columns that can be stored, and I would like to let this data set grow forever.
Not sure how to proceed.
If you want a composite partition key, you need extra brackets:
CREATE TABLE minute_usr (
min varchar,
usr varchar,
bytes counter,
PRIMARY KEY ((min, usr))
);
However, then you can't do range queries on min or usr. You can only do range queries on non-partition keys.
You can use your second model, with usr as the partition key, and shard on some time bucket to stop the rows growing too large. For example, you could have a partition per user per day:
CREATE TABLE usr_day_minute (
day varchar,
min varchar,
usr varchar,
bytes counter,
PRIMARY KEY ((usr, day), min)
);
but now you will need to do separate queries per day if you want results for multiple days. You could choose a larger time bucket, but at the cost of larger rows.
Note you probably want to use a numeric data type or timestamp for min so you do numeric comparisons rather than string comparison.