I have a Cassandra cluster of 5 nodes with writeConsistency: LOCAL_QUORUM. The cluster holds TBs of data. Now I need to migrate the data to a different keyspace in the same cluster. The requirement is as follows:
keyspace_1 --> read data --> transform --> insert in keyspace_2.
We could do that with some multi-instance microservices that read the data from keyspace_1, transform it, and then insert it into the target keyspace.
But is there a better approach? I have found an article, "How to migrate data from a Cassandra cluster of size N to a different cluster of size N+/-M", but there the SSTables are transferred as-is, without any transformation, whereas I need a data transformation step in between. Can anyone suggest a good approach here, or has someone done this type of activity before?
The simplest way will be to use Spark to load the data, perform the transformations, and save the data into the new table(s) - since Spark performs automatic parallelization of the data processing, it will be easier than rolling your own multi-instance microservice (e.g., with Spring Boot). Depending on your requirements, you can use either the Spark SQL API or the RDD API of the Spark Cassandra Connector. With the Spark SQL API it could look like this:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// read the source table
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "tab"))
  .load()

val dfTransformed = df // do transformation here, e.g., df.select(...), df.withColumn(...)

// write into the (pre-created) target table
dfTransformed.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks2", "table" -> "tab"))
  .mode("append")
  .save()
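If you prefer the RDD API instead, the same job could look roughly like this - a minimal sketch, assuming the target table ks2.tab already exists and that a hypothetical case class Entry(id, value) matches your column names:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// hypothetical row structure - adjust to your real schema
case class Entry(id: String, value: String)

val sc = new SparkContext(new SparkConf())

sc.cassandraTable[Entry]("ks", "tab")
  .map(e => e.copy(value = e.value.toUpperCase)) // your transformation
  .saveToCassandra("ks2", "tab")

In both cases you need the connector on the classpath, e.g. spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:<version matching your Spark/Scala versions>, and spark.cassandra.connection.host pointing at your cluster.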
If you won't use Spark, then you'll need to perform a full scan of all the data, transform what you read, and write it back - but this is a more complicated task, as you will need to parallelize the scan and handle failures yourself, etc. Also, reading data from Cassandra efficiently is not an easy task - you can look at this code example, but I suggest looking at Spark first.
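For illustration only, here is a minimal sketch of that manual approach with the DataStax Java driver 4.x (table and column names are made up); it relies on the driver's transparent paging and deliberately omits the token-range splitting, retries, and rate limiting a real migration would need:

import com.datastax.oss.driver.api.core.CqlSession
import scala.jdk.CollectionConverters._

val session = CqlSession.builder().build() // contact points come from application.conf

val insert = session.prepare("INSERT INTO ks2.tab (id, value) VALUES (?, ?)")

// full scan - the driver fetches further pages transparently as we iterate
for (row <- session.execute("SELECT id, value FROM ks.tab").asScala) {
  val transformed = row.getString("value").toUpperCase // your transformation
  session.execute(insert.bind(row.getString("id"), transformed))
}

session.close()

A production version would split the scan by token ranges across multiple workers and use asynchronous writes with backpressure - which is exactly the machinery Spark gives you for free.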