Search code examples
apache-sparkcassandradatastax

Is there way in cassandra system tables check the counts ? where we can check the meta data of latest inserts?


i am working on migration tool oracle to cassandra , where I want to maintain a validation table with columns oracle count and cassandra count , so that i can validate the migration job,in cassandra is there any way system maintains the recently executed/inserted query count ? total count of a particular table ? is there anywhere in cassandra system tables does it store? if so what is it ? if not please suggest some way to design validation framework of data migration.

Is there way in cassandra, get the latest query inserted record count and total count of table in any system tables from where we can read the counts instead of executing the count(*) query on the tables ? does cassandra maintains the of the counts anywhere internally ?If so where we can check the meta data of latest inserts i.e which system tables?


Solution

  • Cassandra is distributed system and there is no place where it will collect the counts per tables. You can get some estimates from system.size_estimates, but it will say only paritions count per range, and their sizes.

    For such framework as you're asking, you may need to develop custom Spark code (easiest way) that will perform counting of the rows, and other checks. Spark is highly optimized for effective data access and could be more preferable than writing the custom code.

    Also, during migration, consider using consistency level greater than ONE to make sure that at least several nodes confirmed writing of the data. Although, it depends on the amount of data & timing requirements for your migration jobs.