cassandra, analytics, cassandra-3.0

Cassandra for predictive analysis


We periodically collect system statistics and write them to Cassandra as a JSON blob in a single column, once every minute. The table has only one partition, and it will not grow beyond 100K entries.

This table works fine for writing the data and reading it back by timestamp. So far, so good.

We plan to run predictive analysis on these statistics. For example, every minute we would compare the current statistics with the history of the system statistics using our own logic (to be frank, we have not finished that logic yet).

The query

Select statisticsjson, timestamp from stattable where partitionid = 'stat' and timestamp > X

returns all the JSON we need.

Now, how do we analyse the history of the JSON data and warn the user when the current state of the system is dangerous? Which tool is best for running analytics on this historical JSON data?


Solution

  • A common way to analyse data stored in Cassandra is to use Apache Spark with the spark-cassandra-connector. This typically means co-locating a Spark worker with the Cassandra service on each of your Cassandra nodes. It lets you run analytics that Cassandra itself cannot do (no joins, limited aggregation, etc.). With Spark, you can read the JSON objects and perform any transformation you need, all in parallel.

    Depending on your business requirements, you might get away with writing a simple app that retrieves the data from Cassandra (given that it is limited in size) and performs the analysis on this limited data set.
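To make the "simple app" option concrete, here is a minimal Python sketch of one possible comparison. Since the question leaves the actual logic open, the z-score rule below is only an illustrative assumption, as are the function name `find_anomalies`, the `threshold` parameter, and the idea that each JSON blob is a flat object of numeric metrics such as `cpu` and `mem`. The input is assumed to be the JSON strings already fetched by the query in the question.

```python
import json
from statistics import mean, stdev


def find_anomalies(history_json, current_json, threshold=3.0):
    """Return {metric: z_score} for every metric whose current value lies
    more than `threshold` standard deviations from its historical mean.

    Hypothetical helper: assumes each blob is a flat JSON object of
    numeric metrics; the question does not specify the real layout.
    """
    history = [json.loads(doc) for doc in history_json]
    current = json.loads(current_json)

    anomalies = {}
    for metric, value in current.items():
        if not isinstance(value, (int, float)):
            continue  # skip non-numeric fields
        # Historical numeric values recorded for this metric
        past = [h[metric] for h in history
                if isinstance(h.get(metric), (int, float))]
        if len(past) < 2:
            continue  # not enough history to estimate spread
        sigma = stdev(past)
        if sigma == 0:
            continue  # constant history, z-score is undefined
        z = (value - mean(past)) / sigma
        if abs(z) > threshold:
            anomalies[metric] = z
    return anomalies


if __name__ == "__main__":
    # Stand-ins for the statisticsjson column values returned by the query
    history = ['{"cpu": 10, "mem": 50}',
               '{"cpu": 12, "mem": 50}',
               '{"cpu": 11, "mem": 50}']
    print(find_anomalies(history, '{"cpu": 90, "mem": 50}'))
```

With at most 100K small rows in one partition, this kind of check fits comfortably in a single process. Spark becomes worthwhile mainly if the history or the analysis grows beyond what one machine can handle.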