
How do I collect all outputs from a Kubeflow ParallelFor loop?


I'm running a time-series model in Kubeflow using the Python SDK package kfp.v2. I need to run the model pipeline, then "walk" one month forward and run it again, and so on for 24 months, to get multiple snapshots in time.

That said, I don't know how to collect all of the final outputs into one place so I can concatenate them and batch-load the result into a BigQuery table.

I tried loading into the table directly from each loop iteration, but that results in an Error 403: Exceeded rate limits. Exceeded the number of uploads to this table.

Is there a way to collect all of the results so I can concatenate and load once?
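
Roughly, the pipeline has the following shape (a minimal sketch; the component body, the month values, and names like `run_model` are illustrative placeholders, not my real code):

```python
from kfp.v2 import dsl
from kfp.v2.dsl import component


@component
def run_model(snapshot_month: str) -> str:
    # Placeholder for the real model: train on data up to
    # `snapshot_month` and return that snapshot's forecast as CSV text.
    return f"{snapshot_month},123.4\n"


@dsl.pipeline(name="walk-forward-backtest")
def walk_forward_pipeline():
    # 24 monthly snapshots; the dates are illustrative.
    months = [f"2021-{m:02d}-01" for m in range(1, 13)] + \
             [f"2022-{m:02d}-01" for m in range(1, 13)]
    with dsl.ParallelFor(months) as month:
        # Today each iteration loads its own results into BigQuery,
        # which is what trips the rate limit.
        run_model(snapshot_month=month)
```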


Solution

  • Usually this error occurs when you hit the BigQuery quota of 5 table-update operations per 10 seconds per table. It can be resolved by limiting your rate, i.e. collecting the loop outputs and writing them in one batch instead of loading on every iteration (see the sketch after this list).

    As given in the documentation:

    Your project can make up to 1,500 table modifications per table per day, whether the modification appends data, updates data, or truncates the table. This limit includes the combined total of all load jobs, copy jobs, and query jobs that append to or overwrite a destination table or that use a DML DELETE, INSERT, MERGE, TRUNCATE TABLE, or UPDATE statement to write data to a table.

    Refer to this Stack Overflow link for more information.
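
One way to load only once is to fan the loop outputs back in and concatenate them in a single final step. The kfp.v2 namespace of the 1.x SDK has no fan-in primitive, but KFP SDK 2.x exposes dsl.Collected for exactly this. Below is a minimal sketch under that assumption (kfp >= 2.0 and a backend that supports ParallelFor fan-in); `run_model`, the CSV format, and the table ID are illustrative placeholders carried over from the question:

```python
from typing import List

from kfp import dsl


@dsl.component
def run_model(snapshot_month: str) -> str:
    # Placeholder: return one snapshot's results as CSV text.
    return f"{snapshot_month},123.4\n"


@dsl.component(packages_to_install=["google-cloud-bigquery", "pandas", "pyarrow"])
def concat_and_load(results: List[str], table_id: str):
    # Concatenate every snapshot and issue ONE load job, so the
    # pipeline performs a single table modification instead of 24.
    import io

    import pandas as pd
    from google.cloud import bigquery

    frames = [
        pd.read_csv(io.StringIO(r), names=["snapshot_month", "forecast"])
        for r in results
    ]
    df = pd.concat(frames, ignore_index=True)

    client = bigquery.Client()
    client.load_table_from_dataframe(df, table_id).result()  # blocks until done


@dsl.pipeline(name="walk-forward-backtest")
def walk_forward_pipeline(table_id: str = "my-project.my_dataset.forecasts"):
    months = [f"2021-{m:02d}-01" for m in range(1, 13)]
    with dsl.ParallelFor(months) as month:
        task = run_model(snapshot_month=month)
    # dsl.Collected gathers the per-iteration outputs into one list,
    # which the loader consumes after all loop branches finish.
    concat_and_load(results=dsl.Collected(task.output), table_id=table_id)
```

Because the final step issues a single batch load job rather than one upload per iteration, it stays well inside both the 5-operations-per-10-seconds rate limit and the 1,500-modifications-per-day quota. If upgrading the SDK is not an option, a similar effect can be had by having each iteration write its output to a shared GCS prefix and adding a final step that reads the whole prefix and performs the one load.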