I recently updated my Dataflow Apache Beam pipeline from version 2.27 to 2.41. The pipeline writes a large amount of data. Before the update it took about 8 minutes to finish; after the update it takes more than 30 minutes.
Before the update: [screenshot]
After the update: [screenshot]
The "Enforce ramp-up through throttling" step was not shown before I updated the pipeline version.
Update: The Apache Beam release notes for version 2.32.0 state:
DatastoreIO: Write and delete operations now follow automatic gradual ramp-up, in line with best practices (Java/Python)
I think the extra write latency is caused by this change.
I checked with the team, and generally speaking this is the expected behavior. The IO enables ramp-up by default in order to follow best practices; turning it off is possible but discouraged.
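To get a feel for why the run time can grow this much, here is a rough back-of-the-envelope sketch. It assumes the ramp-up follows the Datastore "500/50/5" guidance (start at 500 operations per second, increase by 50% every 5 minutes) and that the throttle is the only bottleneck. The function names are made up for illustration, and the actual heuristic inside Beam also depends on the number of workers, so treat the numbers as an approximation only.

```python
# Rough model of a 500/50/5 ramp-up. This is an illustration, not Beam's code.
BASE_OPS_PER_SEC = 500   # assumed starting budget (500 ops/sec)
GROWTH_FACTOR = 1.5      # assumed +50% per interval
INTERVAL_SECS = 5 * 60   # assumed 5-minute interval


def allowed_ops_per_sec(elapsed_secs: float) -> float:
    """Budget for the 5-minute interval that contains elapsed_secs."""
    completed_intervals = int(elapsed_secs // INTERVAL_SECS)
    return BASE_OPS_PER_SEC * GROWTH_FACTOR ** completed_intervals


def seconds_to_write(total_ops: int) -> float:
    """Time to write total_ops if the ramp-up limit is the only bottleneck."""
    remaining = float(total_ops)
    elapsed = 0.0
    while True:
        rate = allowed_ops_per_sec(elapsed)
        capacity = rate * INTERVAL_SECS  # ops that fit in this interval
        if remaining <= capacity:
            return elapsed + remaining / rate
        remaining -= capacity
        elapsed += INTERVAL_SECS


if __name__ == "__main__":
    for ops in (150_000, 1_000_000):
        print(ops, "writes:", round(seconds_to_write(ops) / 60, 1), "min")
```

In this model, one million writes need roughly 18 minutes no matter how fast the workers are, which is in the same range as the slowdown you are seeing.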
The DatastoreV1 docs provide further guidance:
Write and delete operations will follow a gradual ramp-up by default in order to protect Cloud Datastore from potential overload. This rate limit follows a heuristic based on the expected number of workers. To optimize throughput in this initial stage, you can provide a hint to the relevant PTransform by calling withHintNumWorkers, e.g., DatastoreIO.v1().deleteKey().withHintNumWorkers(numWorkers). While not recommended, you can also turn this off via .withRampupThrottlingDisabled().