Search code examples
cassandraexpress-cassandra

Cassandra SASI index or materialized view - performance improvement


I'm currently optimizing my application, and I want to find out which way would be faster and better performance-wise in the following scenario.

I have a table with 4 fields - id, user, status, entryTime.

I'm doing write to this endpoint around 100 times every 10 seconds, so in average 10 writes per second.

The primary key is user, and the clustering key is entryTime and id.

I have an endpoint, where I need to retrieve all entries between a specific entryTime for a specific user, so, for example, for user with ID 1, where entryTime is greater than 2019-06-04T07:58:28.000Z and less than 2019-06-04T08:58:28.000Z.

Another endpoint is, where I have to retrieve with specific status for a specific user.

Is it better to create a materialize view for the 2nd endpoint (where I need to retrieve the status), with different keys, or to add SASI index?

Since the table is updated frequently as well, and written frequently, from what I've read, the writes take around 10% performance, but does it apply to all tables, which has frequent read/writes?

Are there any check points for future reference which I can follow to determine if I should go for materialized view or SASI index?


Solution

  • I don't have any experience with SASI indexes, however, I can tell you that with a STATUS column, and I'm assuming the status will change for the rows, if you create a MVIEW with that as the partition key (so you can filter by it), every time the status changes on the main table, the MVIEW will perform a DELETE followed by an INSERT (also using lookups). With your load (10 writes/second - not sure how many of them are UPDATES), that could be problematic on the MVIEW. We use MVIEWS, but the load is light. Queries work very well on them. Not sure this helps

    @JimWartnick that totally does. Thanks for clarifying that! Should materialized views be used on tables which is not updated frequently?

    I think it helps, but not a requirement. Just remember Cassandra first makes changes on the base table, then propagates it to the MVIEW. So the same replication concerns apply. Also, it is possible for the MVIEW to miss changes, causing it to be out of sync. The only way to fix it is to rebuild the MVIEW. One thing nice about the MVIEW is that it DOES allow you to have an updatable/changable column as part of the partition key, which is not allowed on a standalone table. The downside is that it does perform an extra operation (again, delete followed by insert). That could cause extra load