cassandra pentaho sqoop talend datastax

Cassandra and SQL Server replication


I'm looking for a way to replicate tables, possibly an entire database, from Microsoft SQL Server to Cassandra (DataStax). I don't need real-time replication; a latency of around 30 seconds is acceptable. So far my research hasn't turned up many good options. I was looking at using Talend or Pentaho to schedule the jobs, possibly with Sqoop as well, but then I would still need an ETL tool to do some transformations before loading the data into Cassandra.

So I would like to pull data from SQL Server, perform some Spark transformations on it, and then load it into Cassandra.
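A minimal sketch of that pull, transform, load flow, assuming in-memory stand-ins: a list of dicts plays the SQL Server table and a dict keyed by primary key plays the Cassandra table. The `email` column and the normalization step are hypothetical, and in a real setup the transform would run as a Spark job. The load step overwrites rows that share a primary key, which matches how a Cassandra write with an existing key behaves.

```python
def extract(source_rows):
    """Pull rows from the source; an in-memory list stands in for a SQL Server query."""
    return [dict(row) for row in source_rows]


def transform(row):
    """Hypothetical transformation: trim and lower-case the email column."""
    out = dict(row)
    out["email"] = out["email"].strip().lower()
    return out


def load(table, rows, key="id"):
    """Write rows keyed by primary key; an existing key is overwritten (upsert)."""
    for row in rows:
        table[row[key]] = row
    return table


def run_pipeline(source_rows, target_table):
    """Extract, transform each row, then load into the target table."""
    return load(target_table, [transform(row) for row in extract(source_rows)])


source = [{"id": 1, "email": "  A@X.COM "}, {"id": 2, "email": "b@y.org"}]
target = {1: {"id": 1, "email": "old@x.com"}}
run_pipeline(source, target)
# target is now {1: {'id': 1, 'email': 'a@x.com'}, 2: {'id': 2, 'email': 'b@y.org'}}
```

Because the load is an upsert, rerunning the same job on the same source leaves the target unchanged, which keeps a periodically scheduled job safe to repeat.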

So far, the only real-time replication I have seen involved a Flume plugin, but it writes to HDFS.


Solution

  • If you want to keep things simple, you can do the whole job with DSE. You can schedule Sqoop jobs with crontab to mirror the data into Cassandra; Sqoop supports incremental imports. Then you can schedule a Spark job to perform the ETL and persist the transformed data in the final Cassandra table. If your data is big, you should perform the ETL at scale with Spark rather than with Pentaho. IMHO.
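The scheduled cycle described above can be sketched as follows. This is a simulation under stated assumptions, not Sqoop or Spark: `incremental_import` models the idea behind Sqoop's append-mode incremental import (`--incremental append` with `--check-column` and `--last-value`), where only rows whose check column exceeds the last recorded value are pulled. The `staging` and `final` dicts stand in for the mirrored and final Cassandra tables, and the `amount_cents` column and its conversion are hypothetical.

```python
def incremental_import(source_rows, last_value, check_column="id"):
    """Return rows whose check column exceeds last_value, plus the new last value.

    Simulates append-mode incremental import; it is not Sqoop itself.
    """
    new_rows = [dict(r) for r in source_rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last


def mirror(staging, rows, key="id"):
    """Upsert imported rows into the staging (mirror) table."""
    for r in rows:
        staging[r[key]] = r


def etl(staging, final, key="id"):
    """Hypothetical stand-in for the Spark job: convert cents to units in the final table."""
    for k, r in staging.items():
        final[k] = {key: k, "amount": r["amount_cents"] / 100}


def scheduled_run(source_rows, staging, final, last_value):
    """One cron-triggered cycle: incremental import, mirror, then ETL. Returns the new watermark."""
    rows, new_last = incremental_import(source_rows, last_value)
    mirror(staging, rows)
    etl(staging, final)
    return new_last


source = [{"id": 1, "amount_cents": 250}, {"id": 2, "amount_cents": 100}]
staging, final = {}, {}
last = scheduled_run(source, staging, final, 0)     # imports ids 1 and 2; last == 2
source.append({"id": 3, "amount_cents": 999})
last = scheduled_run(source, staging, final, last)  # imports only id 3; last == 3
```

Append mode only picks up new rows with a larger check-column value; Sqoop also offers a `lastmodified` mode for rows that are updated in place. For simplicity, the ETL stand-in rescans the whole staging table on each run.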