I'm copying Spanner data to BigQuery through a Dataflow job. The job is scheduled to run every 15 minutes. The problem is, if the data is read from a Spanner table which is also being written at the same time, some of the records get missed while copying to BigQuery.
I'm using readOnlyTransaction() while reading Spanner data. Is there any other precaution that I must take while doing this activity?
It is recommended to use Cloud Spanner commit timestamps to populate columns like `update_date`. Commit timestamps allow applications to determine the exact ordering of mutations.
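As a sketch of what that looks like (the table and column names here are hypothetical), you enable the commit-timestamp option on the column and write `PENDING_COMMIT_TIMESTAMP()` on every insert/update, so Spanner fills in the transaction's commit time atomically:

```sql
-- Allow the column to hold Spanner commit timestamps
ALTER TABLE Orders
  ALTER COLUMN update_date SET OPTIONS (allow_commit_timestamp = true);

-- On every write, let Spanner stamp the commit time
UPDATE Orders
SET update_date = PENDING_COMMIT_TIMESTAMP()
WHERE order_id = @order_id;
```

Note that a column with `allow_commit_timestamp = true` can only be written with `PENDING_COMMIT_TIMESTAMP()` or an explicit timestamp in the past, so existing application writes to that column may need adjusting.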
By using commit timestamps for `update_date` and reading at an exact timestamp, the Dataflow job can pick up every record written and committed since the previous run: the exact-timestamp read gives a consistent snapshot, and filtering on `update_date` between the previous run's read timestamp and the current one bounds the incremental window.
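A minimal sketch of the Dataflow side, assuming Beam's `SpannerIO` connector; the instance/database IDs, table name, and the mechanism for persisting `lastRunTs` between runs (shown here as a hard-coded value) are all placeholders you would replace:

```java
import com.google.cloud.Timestamp;
import com.google.cloud.spanner.TimestampBound;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SpannerIncrementalRead {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The previous run's read timestamp; in practice you would persist this
    // (e.g. in GCS or a metadata table) rather than hard-code it.
    Timestamp lastRunTs = Timestamp.parseTimestamp("2020-01-01T00:00:00Z");
    // Snapshot timestamp for this run; becomes the next run's lastRunTs.
    Timestamp snapshotTs = Timestamp.now();

    p.apply(SpannerIO.read()
        .withInstanceId("my-instance")   // assumed instance id
        .withDatabaseId("my-database")   // assumed database id
        // Hypothetical table; only rows committed since the last run.
        .withQuery("SELECT * FROM Orders WHERE update_date > '" + lastRunTs
            + "' AND update_date <= '" + snapshotTs + "'")
        // Exact-timestamp read: a consistent snapshot as of snapshotTs,
        // so concurrent writes can no longer cause missed records.
        .withTimestampBound(TimestampBound.ofReadTimestamp(snapshotTs)));

    p.run().waitUntilFinish();
  }
}
```

The key point is pairing the `update_date` filter with `TimestampBound.ofReadTimestamp(...)`: the filter alone is not enough, because a strong read taken mid-run could see writes committed after the window you recorded.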