Say there is a system outage, and the record of 1 page visit in every 100 did not make it to your Spanner instance for some reason. Probably QA's fault. Anyway, you have a different data store from which you can upload the missing records to Spanner. Great! Assuming you have a significant number of records, you'll batch them together, maybe sending up 1000 at a time. But about 990 of those 1000 are likely to already exist in Spanner, so your insert operation will fail. You could upload one by one, but you take a huge performance hit for that. You could even try some adaptive batch size, but that winds up getting a bit more complicated.
Is there an easy way to solve this? I want dbClient.writeAtLeastOnceIgnoreErrors(Iterable<Mutation>), but I don't think anything of that sort is available. Am I wrong?
If I understand your question correctly, you want to be able to send 1,000 mutations (inserts) to Spanner, knowing that many of those 1,000 records will already exist in Spanner. The easiest way to do that is to use an InsertOrUpdate mutation like this:
Mutation m = Mutation.newInsertOrUpdateBuilder("YOUR_TABLE").set("COL1").to("some_val").build();
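// writeAtLeastOnce takes an Iterable<Mutation>, so wrap the single mutation in a list.
dbClient.writeAtLeastOnce(java.util.Collections.singletonList(m));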
InsertOrUpdate does what the name suggests: it inserts the rows that do not exist yet and overwrites the supplied columns of rows that already exist, without returning an "already exists" error. Columns you do not set keep their existing values when the row is already there. You must supply a value for all NOT NULL columns in your table.
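Since you mentioned sending up 1000 records at a time, here is a minimal sketch of how the batching could look. MutationBatcher and partition are hypothetical names used for illustration, not part of the Spanner client library, and the helper is plain Java so it works on any list. The batch size of 1000 is taken from your question, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Spanner client): splits a list into
// consecutive batches of at most batchSize elements.
class MutationBatcher {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}

// Assumed usage with the Spanner client, where allMutations is a
// List<Mutation> of InsertOrUpdate mutations built as shown above:
//
//   for (List<Mutation> batch : MutationBatcher.partition(allMutations, 1000)) {
//       dbClient.writeAtLeastOnce(batch);
//   }
```

Because every mutation in a batch is an InsertOrUpdate, a batch in which most rows already exist should still be applied as a whole rather than failing on the duplicates.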