google-bigquery, google-cloud-dataprep

Google DataPrep is extremely slow


In Google Dataflow, I have a job that basically looks like this:

Dataset: 100 rows, 1 column.
Recipe: 0 steps
Output: New Table.

But it takes between 6 and 8 minutes to run. What could be the issue?


Solution

  • Run times for Dataprep/Dataflow jobs are usually measured in minutes, not seconds. These tools are built for large data sets, and the duration stays roughly constant even if the input is 10 times larger.

    Dataprep creates a Dataflow workflow for you and provisions a few VMs. That setup takes time, usually on the order of minutes, regardless of how small the input is. Only a bit later does the job scale up to 50 or 1000 machines.
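
To make the reasoning above concrete, here is a toy cost model in Python. It is only a sketch: the startup time and per-worker throughput are made-up illustrative numbers, not measurements of Dataprep or Dataflow, and `estimated_runtime_seconds` is a hypothetical helper, not part of any Google API.

```python
# Toy model of a job with a fixed startup cost.
# All constants are illustrative assumptions, not measured values.
STARTUP_SECONDS = 360.0                 # assumed: job creation + VM provisioning
ROWS_PER_SECOND_PER_WORKER = 50_000.0   # assumed per-worker throughput


def estimated_runtime_seconds(rows: int, workers: int = 1) -> float:
    """Return fixed startup overhead plus time spent processing the rows."""
    processing = rows / (ROWS_PER_SECOND_PER_WORKER * workers)
    return STARTUP_SECONDS + processing


if __name__ == "__main__":
    # Compare tiny and larger inputs under the same assumed overhead.
    for rows in (100, 1_000, 1_000_000):
        print(f"{rows:>9} rows: {estimated_runtime_seconds(rows):.1f} s")
```

Under these assumptions, going from 100 rows to 1,000 rows barely changes the total, because the fixed setup phase dominates. That matches the 6 to 8 minutes seen for a 100-row table.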