google-compute-engine google-cloud-dataflow apache-beam google-cloud-spanner

Spanner load performance very low

I'm trying to write terabytes of data to Spanner using Dataflow.

The Spanner instance is configured with enough nodes, and Dataflow is running on n1-standard-16 machines.

The job is running very slowly. Spanner CPU utilization has stayed well within limits throughout, and write latency is also very low, in the millisecond range. Basically, everything seems to be under control. No other read or write operations are running against this instance at the same time.

The load may contain approximately one million records whose keys already exist in Spanner. I'm using the InsertBuilder so that such rows are not written to Spanner. Could this be a major cause of the low performance? I've also set the write failure mode to "Report Failures" (not "Fail Fast"), so as far as I can tell "ALREADY EXISTS" errors shouldn't hurt performance this much, but I'm not really sure.

Solution

The slowdown comes from how Beam's SpannerIO.java handles these errors. SpannerIO tries to batch several mutation groups together so that it can write to Cloud Spanner more efficiently. However, if any mutation group in a batch contains a key that already exists, the batch cannot be written in its entirety. Instead, each mutation group in that batch is retried individually, so that every group without a duplicate key is still written successfully. Since the batch size effectively becomes one in this case, insert performance will be lower.
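To make the cost of that fallback concrete, here is a minimal sketch in plain Java. It is not the SpannerIO source and uses no Beam or Spanner classes; `FakeStore`, `AlreadyExistsException`, and `writeWithFallback` are hypothetical names made up for illustration, and the "store" is just an in-memory set that counts write calls. It only assumes the behaviour described above: a batch is attempted as a whole, and on a duplicate key each element is retried on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of "batch write, then retry one by one on failure".
class BatchFallbackSketch {

    /** Raised by the fake store when an inserted key is already present. */
    static class AlreadyExistsException extends RuntimeException {
        AlreadyExistsException(String key) {
            super("ALREADY_EXISTS: " + key);
        }
    }

    /** In-memory stand-in for a table; every insertAll call counts as one write request. */
    static class FakeStore {
        final Set<String> keys = new HashSet<>();
        int writeCalls = 0;

        /** Inserts all keys or none: a single duplicate rejects the whole batch. */
        void insertAll(List<String> batch) {
            writeCalls++;
            Set<String> seen = new HashSet<>();
            for (String key : batch) {
                if (keys.contains(key) || !seen.add(key)) {
                    throw new AlreadyExistsException(key);
                }
            }
            keys.addAll(batch);
        }
    }

    /** Tries the batch as a whole; on failure retries each key alone and returns the rejected keys. */
    static List<String> writeWithFallback(FakeStore store, List<String> batch) {
        List<String> failed = new ArrayList<>();
        try {
            store.insertAll(batch);
        } catch (AlreadyExistsException batchFailure) {
            for (String key : batch) {
                try {
                    store.insertAll(Arrays.asList(key));
                } catch (AlreadyExistsException duplicate) {
                    failed.add(key); // reported, not fatal
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.keys.add("k3"); // a row that is already in the table

        writeWithFallback(store, Arrays.asList("a1", "a2", "a3", "a4"));
        System.out.println("write calls after clean batch: " + store.writeCalls);

        int before = store.writeCalls;
        List<String> failed = writeWithFallback(store, Arrays.asList("k1", "k2", "k3", "k4"));
        System.out.println("write calls for batch with one duplicate: " + (store.writeCalls - before));
        System.out.println("rejected keys: " + failed);
    }
}
```

In this toy model a clean batch of four keys costs one write call, while a batch of four containing a single duplicate costs five (one failed batch attempt plus four individual retries). If duplicates are spread across many batches, most of the load ends up being written one mutation group at a time, which would match the low throughput with low CPU and latency described in the question.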