hadoop cassandra hector

Hector's batch Mutation vs. using Hadoop jobs to load data into Cassandra?


Can someone highlight the pros and cons of using Hector's batch Mutation versus using Hadoop jobs to load data into Cassandra?

I know in Hector you can do something like the following:

mutator.addInsertion(...);
mutator.execute();

And in Hadoop you can use MapReduce (MR) jobs to load data into Cassandra.

I'm looking for the reasons to use or not to use each of them. Thanks!


Solution

  • If the data source is not already in Hadoop (or HBase), I would recommend a plain multi-threaded loader that uses Mutator as above. That keeps the number of moving parts down.

    This gist is dated, but the approach would be similar: https://gist.github.com/397574

    Let me know if you want more details.
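
For what it's worth, here is a minimal sketch of what such a multi-threaded loader could look like. It does not call Hector directly: `BatchSink` and `SinkFactory` are hypothetical interfaces invented for this example, standing in for a Mutator (`add` playing the role of `mutator.addInsertion(...)` and `flush` the role of `mutator.execute()`). The sketch assumes each worker thread gets its own sink and that the input fits in memory as a list of key/value pairs; the batch size and thread count are placeholders you would tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelLoader {

    /** Hypothetical stand-in for a batching writer such as Hector's Mutator. */
    interface BatchSink {
        void add(String rowKey, String value); // analogous to mutator.addInsertion(...)
        void flush();                          // analogous to mutator.execute()
    }

    /** Hypothetical factory so that every worker thread builds its own sink. */
    interface SinkFactory {
        BatchSink create();
    }

    /**
     * Splits the rows into one contiguous slice per thread and writes each
     * slice in batches of batchSize. Returns the number of rows flushed.
     * Assumes threads >= 1 and batchSize >= 1.
     */
    static int load(List<String[]> rows, int threads, int batchSize,
                    SinkFactory factory) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            int chunk = (rows.size() + threads - 1) / threads; // ceiling division
            for (int start = 0; start < rows.size(); start += chunk) {
                int end = Math.min(start + chunk, rows.size());
                List<String[]> slice = rows.subList(start, end);
                results.add(pool.submit(() -> {
                    BatchSink sink = factory.create(); // not shared across threads
                    int pending = 0;
                    int written = 0;
                    for (String[] row : slice) {
                        sink.add(row[0], row[1]);
                        pending++;
                        if (pending == batchSize) {
                            sink.flush();
                            written += pending;
                            pending = 0;
                        }
                    }
                    if (pending > 0) { // flush the final partial batch
                        sink.flush();
                        written += pending;
                    }
                    return written;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // rethrows any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real loader the `SinkFactory` would create a Mutator against your keyspace, and you would probably stream rows from the source rather than hold them all in a list. Retry handling on a failed flush is left out here on purpose.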