We have a 21-node Cassandra cluster with a counter table that has almost 2 billion rows. I tried to migrate this table once. First I did dual writes to both clusters, with code like this (in Go):
// Dual-write path: apply the increment to the new cluster; if the counter
// does not exist there yet, seed it with the value read from the old cluster.
counterDiff := incrementValue
_, err := newRepo.FindById(ctx, id)
if err != nil {
    if err == ErrRecordNotFound {
        record, err := oldRepo.FindById(ctx, id)
        if err != nil {
            // log and skip this update
            return
        }
        // not in the new cluster yet: use the old cluster's value as the increment
        counterDiff = record.Count
    } else {
        // log and skip this update
        return
    }
}
if err := newRepo.Update(ctx, id, counterDiff, false); err != nil {
    // log
}
In other words, I initialized new counters with the value from the old cluster.
Then I migrated the data with CQL queries, reading all rows from the old cluster and writing them one by one to the new cluster if the row/key did not already exist.
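Roughly, that copy step was equivalent to the sketch below (the keyspace, table, and column names and the gocql sessions are placeholders, not the real ones). Since counter tables support neither INSERT nor IF NOT EXISTS, the "write only if the row doesn't exist yet" check is an ordinary read followed by a counter UPDATE:

package migrate

import (
    "context"
    "log"

    "github.com/gocql/gocql"
)

// copyCounters scans the old counter table and, for every row that is still
// missing in the new cluster, seeds the new counter with the old value.
func copyCounters(ctx context.Context, oldSess, newSess *gocql.Session) error {
    iter := oldSess.Query(`SELECT id, count FROM old_ks.page_counters`).
        WithContext(ctx).Iter()

    var id string
    var oldCount int64
    for iter.Scan(&id, &oldCount) {
        var existing int64
        err := newSess.Query(`SELECT count FROM new_ks.page_counters WHERE id = ?`, id).
            WithContext(ctx).Scan(&existing)
        if err == nil {
            continue // already created by the dual-write path, leave it alone
        }
        if err != gocql.ErrNotFound {
            log.Printf("read %s from new cluster: %v", id, err)
            continue
        }
        // Counter columns only accept increments, so "write the old value"
        // means incrementing a not-yet-existing counter by oldCount.
        if err := newSess.Query(
            `UPDATE new_ks.page_counters SET count = count + ? WHERE id = ?`,
            oldCount, id).WithContext(ctx).Exec(); err != nil {
            log.Printf("seed %s in new cluster: %v", id, err)
        }
    }
    return iter.Close()
}

A real run over ~2 billion rows would need to split the scan by token ranges and parallelize it, but that doesn't change the per-row logic.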
But unfortunately, in the validation step I saw differences between the two clusters, and many of them (not all) were of the form newClusterValue == n * oldClusterValue.
Now I have 4 questions:

How does the sstableloader tool deal with counter columns? Can I use them for migration anyway?

You asked several questions; I'll try to answer some of them, and hopefully other people will come with answers to the others:
1: Indeed, your dual-write copy step has a problem with concurrent updates: if n concurrent updates all find the counter missing in the new cluster, each of them increments the new counter by the value of the old counter, so you end up incrementing the new counter by n * oldcounter, as you noticed.
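To make that interleaving concrete, here is a tiny self-contained Go simulation (no Cassandra involved, the numbers are invented): every writer reads the new cluster before any writer has updated it, so each one picks the old counter's value as its increment, and the blind counter increments then all apply.

package main

import "fmt"

func main() {
    const (
        oldCounter     = 100 // value already stored in the old cluster
        n              = 5   // updates arriving concurrently during dual-write
        incrementValue = 1
    )

    var newCounter int64 // stands in for the counter row in the new cluster

    // Step 1: the unlucky interleaving - every writer runs its "FindById on
    // the new cluster" before any writer has updated it, so all of them see
    // "record not found" and choose oldCounter as their increment.
    diffs := make([]int64, n)
    for i := range diffs {
        if newCounter == 0 { // plays the role of ErrRecordNotFound
            diffs[i] = oldCounter
        } else {
            diffs[i] = incrementValue
        }
    }

    // Step 2: counter updates are blind increments, so every diff is applied.
    for _, d := range diffs {
        newCounter += d
    }

    // Prints 500 (n * oldCounter) instead of the expected 105.
    fmt.Println("new counter:", newCounter)
}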
4: Another option besides a counter is LWT with "optimistic locking" (get the current count, set it to count+1 IF the current count is still equal to count, repeat otherwise). But this too is not idempotent, in the sense that if a transaction failed in an unclean way (e.g., network problem, reboot, etc.) you don't know whether you should retry it. Something you could perhaps do (I never tried this myself, perhaps someone else did?) is to have in your LWT batch two statements for the same partition: one updating the actual counter in a static column, and the other inserting a "unique id" clustering row keyed on a client-generated unique id, if not yet set. If the LWT update failed because the second part failed, it means the update already succeeded in the past and should no longer be retried. The unique id rows can be created with a short TTL (e.g., 1 hour) if it's enough for you that the idempotency only spans 1 hour (i.e., you don't expect a retry of the same query 2 hours later).
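Untested, but a rough sketch of that idea with gocql might look like this; the schema, the names, and the one-hour TTL are invented for illustration, and creating a partition for the first time is left out:

package counters

import (
    "context"

    "github.com/gocql/gocql"
)

// Hypothetical schema (not from the question):
//   CREATE TABLE new_ks.page_counts (
//       id        text,
//       update_id uuid,
//       count     int static,         -- the actual count: a plain int, not a counter
//       PRIMARY KEY (id, update_id)   -- one clustering row per applied update
//   );

// incrementOnce applies the increment identified by updateID at most once
// (within the TTL window), retrying the optimistic-locking loop on contention.
func incrementOnce(ctx context.Context, sess *gocql.Session, id string, updateID gocql.UUID) error {
    for {
        // Read the current value; this sketch assumes the partition was already
        // seeded during migration, so a missing partition is treated as an error.
        var current int
        if err := sess.Query(
            `SELECT count FROM new_ks.page_counts WHERE id = ? LIMIT 1`, id).
            WithContext(ctx).Scan(&current); err != nil {
            return err
        }

        // One conditional batch, both statements on the same partition: bump the
        // static count only if it is unchanged, and record the update id only if
        // it has never been recorded before.
        b := sess.NewBatch(gocql.LoggedBatch).WithContext(ctx)
        b.Query(`UPDATE new_ks.page_counts SET count = ? WHERE id = ? IF count = ?`,
            current+1, id, current)
        b.Query(`INSERT INTO new_ks.page_counts (id, update_id) VALUES (?, ?) IF NOT EXISTS USING TTL 3600`,
            id, updateID)

        applied, iter, err := sess.MapExecuteBatchCAS(b, map[string]interface{}{})
        if iter != nil {
            iter.Close()
        }
        if err != nil {
            return err
        }
        if applied {
            return nil
        }

        // Not applied: either another writer won the race on count (retry with a
        // fresh read), or our update_id row already exists, i.e. an earlier attempt
        // that looked like a failure actually succeeded (stop, don't double count).
        var seen gocql.UUID
        err = sess.Query(
            `SELECT update_id FROM new_ks.page_counts WHERE id = ? AND update_id = ?`,
            id, updateID).WithContext(ctx).Scan(&seen)
        if err == nil {
            return nil // already applied earlier
        }
        if err != gocql.ErrNotFound {
            return err
        }
        // plain contention on count; loop and try the CAS again
    }
}

The caller would generate updateID once per logical increment (e.g. with gocql.TimeUUID()) and reuse the same value on every retry of that increment; that reuse is what makes retrying after an unclean failure safe.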