Tags: google-cloud-platform, google-cloud-dataflow, google-cloud-dataprep

Add more workers to dataflow job on GCP


I'm creating a Dataprep flow that imports a CSV into BigQuery. It works, but it takes too long, even for very small files. Is there a way to add more workers to the job? maxNumWorkers is always 1 by default.

Br Cris


Solution

  • The first time Dataprep executes a Dataflow job, it uses the default settings. However, you can re-run these jobs with different parameters directly from Dataflow by using its templates. For instance, you can call the REST API and set the numWorkers field to specify how many workers execute the job; when it is left unspecified, the service attempts to choose a reasonable default. For more information about the REST API, you could review this document.

    Keep in mind that this approach has limitations.
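As a rough illustration of the REST approach above, here is a minimal sketch that re-launches a classic Dataflow template through the `projects.locations.templates.launch` method, setting `numWorkers` (initial worker count) and optionally `maxWorkers` (autoscaling cap) in the request's runtime environment. It assumes the `google-api-python-client` package is installed, Application Default Credentials are configured, and you know the Cloud Storage path of the template Dataprep produced. The project ID, region, bucket path, and the helper names `build_launch_body` and `launch_template` are hypothetical placeholders, not part of Dataprep itself.

```python
from typing import Optional


def build_launch_body(job_name: str,
                      num_workers: int,
                      max_workers: Optional[int] = None,
                      parameters: Optional[dict] = None) -> dict:
    """Build a templates.launch request body (hypothetical helper).

    numWorkers sets the initial number of workers; maxWorkers, if given,
    caps how far autoscaling may go.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    environment = {"numWorkers": num_workers}
    if max_workers is not None:
        if max_workers < num_workers:
            raise ValueError("max_workers must be >= num_workers")
        environment["maxWorkers"] = max_workers
    return {
        "jobName": job_name,
        "parameters": parameters or {},  # template parameters, if any
        "environment": environment,
    }


def launch_template(project_id: str, region: str, gcs_path: str, body: dict) -> dict:
    """Send the launch request (requires network access and credentials)."""
    # Imported lazily so build_launch_body works without the client library.
    from googleapiclient.discovery import build

    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().locations().templates().launch(
        projectId=project_id,
        location=region,
        gcsPath=gcs_path,
        body=body,
    )
    return request.execute()


if __name__ == "__main__":
    body = build_launch_body("csv-to-bq-rerun", num_workers=3, max_workers=5)
    print(body)
    # Hypothetical values; uncomment and replace to actually launch:
    # launch_template("my-project", "us-central1",
    #                 "gs://my-bucket/path/to/dataprep-template", body)
```

Keeping the body builder separate from the API call lets you check the request shape offline before launching anything. Whether more workers actually shortens a run on very small files is not guaranteed, since job startup time is not reduced by adding workers.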