Basic partitioning in PlayORM

Assume that I have a CF in cassandra that has following schema:

TimeStamp
Device ID
Device Name
Device Owner

PKEY (TimeStamp, Device ID): This means partition is happening on TimeStamp.

Following are the queries I'm interested in:

Select * from schema where TimeStamp='..' Select * from schema where DeviceID='..'

First query returns 500K records, second query returns 50K records. For first query, the bottleneck is all fetch on a single node, so I want to distribute data on multiple nodes for a TimeStamp. The bottleneck for second query is that all the records may be distributed all over the disks on different nodes, resulting in several disk fetches.

Now assume that I want to create virtual partitions, such that entries for a particular TimeStamp are also distributed across cluster nodes. Is this possible in PlayORM? If yes, can you please provide code which can do so (or example which does such a thing)?

Another requirement that I have is to search all the records for a particular Device ID. Can I do virtual partitioning on 'Device ID' for the same CF? If yes, can you please provide code/link which tells how to do it?

I'd be glad if some one could provide the source code for doing some thing like this, because documentation are not so easy to comprehend, and writing code simply by reading the current documentation is turning out to be a nightmare. Without 'complete' code examples, evaluating PlayORM seems to be impossible.

Solution

Yes, you would need something like this in PlayOrm...(post a comment if I am missing anything and I can respond again).

https://github.com/deanhiller/playorm/blob/master/src/test/java/com/alvazan/test/db/PartitionedTrade.java

and also the query PARTITIONS t('account', :partId) SELECT t FROM TABLE as t INNER JOIN t.security as s WHERE s.securityType = :type and t.numShares = :shares"

'account' identifies the partition column and :partId is the id of the partition. In your case, you would have PARTITIONS t('deviceid', {actualDeviceId}) or t('time', {time}) where the first param is the column name and the second is the actual id of the partition for time or the id of the partition for device. Realize partitions should not have more than X millions of rows where X is probably around 3 million.

The package com.alvazan.test.db have tons of different examples and the com.alvazan.test shows how they are used. I am going to ask someone to tweak the docs based on your feedback to put links directly to the code in our codebase......

ps. if you download from github, run gradlew eclipse OR gradle eclipse(depending on OS), and then import into eclipse, all the tests work out of the box with an in-memory noSQL version(we use that for development). Then if you want to run against cassandra, in the docs it has how to change one line and all tests run against cassandra.

Speedup. PlayOrm does a wide-row using the composite-name pattern for each partition(index per partition). When you query, it reads this row in batches of 200(or the size you provide), and then sends out requests using the keys found in the index to all machines(ie. you get parallel throughput at this point). This is because every partition is spread across the cluster. In fact, all nodes end up with slices of nearly all partitions well depending on how many nodes you have and how many partitions(ie. 100 nodes and 32 partitions, not all nodes will have all partitions).

Under the covers playorm is doing something really really simple. All rows are written as if they were not partitioned at all!!! Then the index row is written(RF=3 means to 3 nodes) and the index row name is /TABLE/partition/column/partitionId. That is the row key for the index. With the command line tool, you can even read the index yourself and just the index or query the partition. USe playOrm's command line tool for this.

Lastly, since wide rows in cassandra are ordered, when you use a specific index like PARTITIONS d('deviceid', 'device1') select d from TABLE as d where d.time > Integer.MIN_INT

then the results come back in order of that index(ie. time in this case), or if you want reverse order, just call cursor.afterLast then cursor.previous, cursor.previous etc. etc.

To be clear, PlayOrm ignores cassandra partitioning on purpose here. It writes in your data just like there is no partition at all. It also writes in an index or two. Let's say you partition twice, once by time and once by device id. In this case it writes to the StringIndice or IntegerIndice table(BigInteger!!! not Integer) with row keys (and say your entity is called Devices). Let's also say in your entity, you have @NoSqlIndexed on the "name" column!!!!

/Devices/byDevice/device1/name = the wide row
/Devices/byTime/time56/name = the wide row

If you have more @NoSqlIndexed columns, there is more rows in the index tables. All rows however are spread across the cluster and do NOT care about partitioning.

Does this make sense? Feel free to give it a shot and try it out. Just post a new question on stackoverflow if you have any issues/questions on implementing it.