We have a 21-node Cassandra cluster with a counter table that has almost 2 billion rows. I tried to migrate this table once. First I did dual writes to both clusters, with code like this (in Go):
// Dual-write path: apply the increment to the new cluster; if the counter
// does not exist there yet, seed it with the value read from the old cluster.
counterDiff := incrementValue
_, err := newRepo.FindById(ctx, id)
if err != nil {
    if err == ErrRecordNotFound {
        record, err := oldRepo.FindById(ctx, id)
        if err != nil {
            // log and skip this update
            return
        }
        // not in the new cluster yet: use the old cluster's value as the increment
        counterDiff = record.Count
    } else {
        // log and skip this update
        return
    }
}
if err := newRepo.Update(ctx, id, counterDiff, false); err != nil {
    // log
}
In other words, I initialized new counters with the value from the old cluster.
Then I migrated the data with CQL queries, reading all rows from the old cluster and writing them one by one to the new cluster if the row/key did not already exist.
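Roughly, that copy step was equivalent to the sketch below (the keyspace, table, and column names and the gocql sessions are placeholders, not the real ones). Since counter tables support neither INSERT nor IF NOT EXISTS, the "write only if the row doesn't exist yet" check is an ordinary read followed by a counter UPDATE:

package migrate

import (
    "context"
    "log"

    "github.com/gocql/gocql"
)

// copyCounters scans the old counter table and, for every row that is still
// missing in the new cluster, seeds the new counter with the old value.
func copyCounters(ctx context.Context, oldSess, newSess *gocql.Session) error {
    iter := oldSess.Query(`SELECT id, count FROM old_ks.page_counters`).
        WithContext(ctx).Iter()

    var id string
    var oldCount int64
    for iter.Scan(&id, &oldCount) {
        var existing int64
        err := newSess.Query(`SELECT count FROM new_ks.page_counters WHERE id = ?`, id).
            WithContext(ctx).Scan(&existing)
        if err == nil {
            continue // already created by the dual-write path, leave it alone
        }
        if err != gocql.ErrNotFound {
            log.Printf("read %s from new cluster: %v", id, err)
            continue
        }
        // Counter columns only accept increments, so "write the old value"
        // means incrementing a not-yet-existing counter by oldCount.
        if err := newSess.Query(
            `UPDATE new_ks.page_counters SET count = count + ? WHERE id = ?`,
            oldCount, id).WithContext(ctx).Exec(); err != nil {
            log.Printf("seed %s in new cluster: %v", id, err)
        }
    }
    return iter.Close()
}

A real run over ~2 billion rows would need to split the scan by token ranges and parallelize it, but that doesn't change the per-row logic.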
But unfortunately, in the validation step I saw differences between the two clusters, and many of them (not all) were of the form newClusterValue == n * oldClusterValue.
Now I have 4 questions:

How does the sstableloader tool deal with counter columns? Can I use them for migration anyway?

You asked several questions; I'll try to answer some of them, and hopefully other people will come with answers to the others:
1: Indeed, your dual-write copy step has a problem with concurrent updates: if n concurrent updates all find the counter missing in the new cluster, each of them increments the new counter by the value of the old counter, so you end up incrementing the new counter by n * oldcounter, as you noticed.
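To make that interleaving concrete, here is a tiny self-contained Go simulation (no Cassandra involved, the numbers are invented): every writer reads the new cluster before any writer has updated it, so each one picks the old counter's value as its increment, and the blind counter increments then all apply.

package main

import "fmt"

func main() {
    const (
        oldCounter     = 100 // value already stored in the old cluster
        n              = 5   // updates arriving concurrently during dual-write
        incrementValue = 1
    )

    var newCounter int64 // stands in for the counter row in the new cluster

    // Step 1: the unlucky interleaving - every writer runs its "FindById on
    // the new cluster" before any writer has updated it, so all of them see
    // "record not found" and choose oldCounter as their increment.
    diffs := make([]int64, n)
    for i := range diffs {
        if newCounter == 0 { // plays the role of ErrRecordNotFound
            diffs[i] = oldCounter
        } else {
            diffs[i] = incrementValue
        }
    }

    // Step 2: counter updates are blind increments, so every diff is applied.
    for _, d := range diffs {
        newCounter += d
    }

    // Prints 500 (n * oldCounter) instead of the expected 105.
    fmt.Println("new counter:", newCounter)
}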
4: Another option besides a counter is LWT with "optimistic locking" (get the current count, set it to count+1 IF the current count is still equal to count, repeat otherwise). But this too is not idempotent, in the sense that if a transaction failed in an unclean way (e.g., network problem, reboot, etc.) you don't know whether you should retry it. Something you could perhaps do (I never tried this myself, perhaps someone else did?) is to have in your LWT batch two statements for the same partition: one updating the actual counter in a static column, and the other inserting a "unique id" clustering row keyed on a client-generated unique id, if not yet set. If the LWT update failed because the second part failed, it means the update already succeeded in the past and should no longer be retried. The unique id rows can be created with a short TTL (e.g., 1 hour) if it's enough for you that the idempotency only spans 1 hour (i.e., you don't expect a retry of the same query 2 hours later).
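Untested, but a rough sketch of that idea with gocql might look like this; the schema, the names, and the one-hour TTL are invented for illustration, and creating a partition for the first time is left out:

package counters

import (
    "context"

    "github.com/gocql/gocql"
)

// Hypothetical schema (not from the question):
//   CREATE TABLE new_ks.page_counts (
//       id        text,
//       update_id uuid,
//       count     int static,         -- the actual count: a plain int, not a counter
//       PRIMARY KEY (id, update_id)   -- one clustering row per applied update
//   );

// incrementOnce applies the increment identified by updateID at most once
// (within the TTL window), retrying the optimistic-locking loop on contention.
func incrementOnce(ctx context.Context, sess *gocql.Session, id string, updateID gocql.UUID) error {
    for {
        // Read the current value; this sketch assumes the partition was already
        // seeded during migration, so a missing partition is treated as an error.
        var current int
        if err := sess.Query(
            `SELECT count FROM new_ks.page_counts WHERE id = ? LIMIT 1`, id).
            WithContext(ctx).Scan(&current); err != nil {
            return err
        }

        // One conditional batch, both statements on the same partition: bump the
        // static count only if it is unchanged, and record the update id only if
        // it has never been recorded before.
        b := sess.NewBatch(gocql.LoggedBatch).WithContext(ctx)
        b.Query(`UPDATE new_ks.page_counts SET count = ? WHERE id = ? IF count = ?`,
            current+1, id, current)
        b.Query(`INSERT INTO new_ks.page_counts (id, update_id) VALUES (?, ?) IF NOT EXISTS USING TTL 3600`,
            id, updateID)

        applied, iter, err := sess.MapExecuteBatchCAS(b, map[string]interface{}{})
        if iter != nil {
            iter.Close()
        }
        if err != nil {
            return err
        }
        if applied {
            return nil
        }

        // Not applied: either another writer won the race on count (retry with a
        // fresh read), or our update_id row already exists, i.e. an earlier attempt
        // that looked like a failure actually succeeded (stop, don't double count).
        var seen gocql.UUID
        err = sess.Query(
            `SELECT update_id FROM new_ks.page_counts WHERE id = ? AND update_id = ?`,
            id, updateID).WithContext(ctx).Scan(&seen)
        if err == nil {
            return nil // already applied earlier
        }
        if err != gocql.ErrNotFound {
            return err
        }
        // plain contention on count; loop and try the CAS again
    }
}

The caller would generate updateID once per logical increment (e.g. with gocql.TimeUUID()) and reuse the same value on every retry of that increment; that reuse is what makes retrying after an unclean failure safe.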