
Using Cassandra to count a big list of data


We are using Cassandra to count various analytics metrics, broken down by account and date, which seems to be working well:

SELECT COUNT(page_impressions) FROM analytics WHERE account='abc' and MINUTE > '2015-01-01 00:00:00';

We would like to further break down this data by domain, and that is where we run into a problem. For some accounts the number of distinct domains could run into the millions over the span of a month or so, and we are mostly interested in the 'top' domains, meaning that we would like to sort by the page_impressions field.

Does anybody have pointers for me on how to count by domain and sort by total page impressions?

Thanks!


Solution

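  • Cassandra supports counter columns, which could be useful for maintaining per-domain impression counts (the basis of a top-domain list) in a separate table. Note that a counter column cannot be part of the primary key, so Cassandra cannot sort rows by it; ranking the domains by count still has to happen client-side.

    You might also be interested in using an analytics engine such as Presto or Spark with Cassandra, because it's generally not very practical to adapt your data model for every different analytics use case.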
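A minimal CQL sketch of the counter-table idea. It assumes a hypothetical table `domain_impressions` with one partition per account and day and one row per domain; the table name, the `day` column and the per-day partitioning are illustrative choices, not something from the original question.

```sql
-- Hypothetical counter table: partition key (account, day), clustering column domain.
-- In a counter table every non-primary-key column must be a counter.
CREATE TABLE domain_impressions (
    account          text,
    day              date,
    domain           text,
    page_impressions counter,
    PRIMARY KEY ((account, day), domain)
);

-- Counters are only changed via UPDATE, incrementing or decrementing by a value.
-- Run this once per page impression (or batch increments client-side).
UPDATE domain_impressions
SET page_impressions = page_impressions + 1
WHERE account = 'abc' AND day = '2015-01-01' AND domain = 'example.com';

-- Read every domain for one account and day. Rows come back ordered by domain,
-- not by page_impressions: a counter cannot be a clustering column, so it cannot
-- be used in ORDER BY, and the top-N ranking must be done by the client.
SELECT domain, page_impressions
FROM domain_impressions
WHERE account = 'abc' AND day = '2015-01-01';
```

Since the rows arrive in domain order, the application would pick the top N (for example by keeping a min-heap of size N while paging through the result) and merge per-day results if it needs a monthly ranking.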