In Google Dataflow, I have a job that basically looks like this:
Dataset: 100 rows, 1 column.
Recipe: 0 steps
Output: New Table.
But it takes between 6 and 8 minutes to run. What could be the issue?
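With a Dataprep/Dataflow setup, run times are usually measured in minutes, not seconds. These services are built for large datasets, and the total duration stays roughly the same even if your data is 10 times larger, because most of it is fixed startup overhead rather than actual processing.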
Dataprep creates a Dataflow job for you and provisions a few worker VMs. That takes time: this phase alone is usually on the order of minutes. Only later, if the workload needs it, does the job scale up to 50 or 1000 machines.
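To see why 100 rows and 1,000 rows take about the same time, here is a toy cost model: a fixed startup/teardown cost plus a small per-row cost. This is a sketch, not how Dataflow actually schedules work. The function name `estimated_runtime_s` and every number in it (300 s startup, 60 s teardown, 1 ms per row) are hypothetical placeholders chosen for illustration, not measured Dataflow figures.

```python
# Toy model of job wall-clock time: fixed overhead plus per-row work.
# All default values are HYPOTHETICAL placeholders, not real Dataflow measurements.

def estimated_runtime_s(rows: int,
                        startup_s: float = 300.0,   # hypothetical: job setup + VM provisioning
                        teardown_s: float = 60.0,   # hypothetical: worker shutdown
                        per_row_s: float = 0.001,   # hypothetical: processing cost per row
                        workers: int = 1) -> float:
    """Return the estimated wall-clock seconds for a job that processes `rows` rows."""
    processing_s = rows * per_row_s / workers  # only this term grows with the data size
    return startup_s + processing_s + teardown_s


if __name__ == "__main__":
    for rows in (100, 1_000, 10_000):
        total = estimated_runtime_s(rows)
        print(f"{rows:>6} rows -> {total / 60:.2f} min")
```

With these placeholder values, 100 rows and 10,000 rows both come out at roughly 6 minutes, because the fixed overhead dominates. The per-row term only becomes significant once the dataset is very large, which is the situation these services are designed for.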