node.js, google-cloud-platform, google-bigquery, google-cloud-run

Long-running job on GCP Cloud Run


I am reading 10 million records from BigQuery, applying some transformations, and creating a .csv file, and I am streaming that same .csv data to an SFTP server using Node.js.

This job takes approximately 5 to 6 hours to complete when run locally.

The solution has been deployed on GCP Cloud Run, but after 2 to 3 seconds Cloud Run closes the container with a 503 error.

Please find the GCP Cloud Run configuration below:

Autoscaling: up to 1 container instance
CPU allocated: default
Memory allocated: 2Gi
Concurrency: 10
Request timeout: 900 seconds

Is GCP Cloud Run a good option for long-running background processes?


Solution

  • You can try using an Apache Beam pipeline deployed via Cloud Dataflow. Using Python, you can perform the task with the following steps:

    Stage 1. Read the data from the BigQuery table.

    beam.io.Read(beam.io.BigQuerySource(query=your_query, use_standard_sql=True))
    

    Stage 2. Write the Stage 1 result to a CSV file in a GCS bucket.

    beam.io.WriteToText(file_path_prefix='gs://your-bucket/output',
                        file_name_suffix='.csv',
                        header='comma,separated,column,headers')
    

    Stage 3. Call a ParDo function that takes the CSV file created in Stage 2 and uploads it to the SFTP server (see the sketch below). You can refer to this link.
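
    For illustration, here is a minimal end-to-end sketch of how the three stages could be wired together. The query, bucket path, SFTP host and credentials are placeholders, not values from the question; paramiko is assumed to be available on the Dataflow workers (shipped via requirements.txt or setup.py); and the Stage 3 ParDo relies on the fact that, in the Python SDK, WriteToText emits the names of the files it wrote. Newer SDK versions also offer beam.io.ReadFromBigQuery, but the legacy source from Stage 1 is kept here for consistency.

    import os

    import apache_beam as beam
    from apache_beam.io.filesystems import FileSystems
    from apache_beam.options.pipeline_options import PipelineOptions


    class UploadToSftp(beam.DoFn):
        """Stage 3: stream a CSV file written to GCS up to the SFTP server."""

        def __init__(self, host, port, username, password, remote_dir):
            self.host = host
            self.port = port
            self.username = username
            self.password = password
            self.remote_dir = remote_dir

        def process(self, gcs_path):
            # paramiko is a third-party dependency; ship it to the Dataflow
            # workers via requirements.txt or setup.py.
            import paramiko

            transport = paramiko.Transport((self.host, self.port))
            transport.connect(username=self.username, password=self.password)
            sftp = paramiko.SFTPClient.from_transport(transport)
            try:
                remote_path = '%s/%s' % (self.remote_dir.rstrip('/'),
                                         os.path.basename(gcs_path))
                # Stream the GCS object straight into the SFTP connection.
                with FileSystems.open(gcs_path) as source:
                    sftp.putfo(source, remote_path)
                yield remote_path
            finally:
                sftp.close()
                transport.close()


    def run():
        # Pass --runner=DataflowRunner, --project, --region, --temp_location, etc.
        options = PipelineOptions()
        with beam.Pipeline(options=options) as p:
            csv_lines = (
                p
                # Stage 1: read the rows from BigQuery (placeholder query).
                | 'ReadFromBigQuery' >> beam.io.Read(beam.io.BigQuerySource(
                    query='SELECT * FROM `your-project.dataset.table`',
                    use_standard_sql=True))
                # Transformation: naively join each row dict into one CSV line
                # (use the csv module if values may contain commas or quotes).
                | 'RowToCsvLine' >> beam.Map(
                    lambda row: ','.join(str(v) for v in row.values())))

            # Stage 2: write the CSV to a GCS bucket; this step emits the
            # names of the files that were written.
            written_files = csv_lines | 'WriteCsv' >> beam.io.WriteToText(
                file_path_prefix='gs://your-bucket/exports/records',
                file_name_suffix='.csv',
                header='comma,separated,column,headers')

            # Stage 3: upload every written file to the SFTP server.
            written_files | 'UploadToSftp' >> beam.ParDo(UploadToSftp(
                'sftp.example.com', 22, 'sftp-user', 'sftp-password', '/upload'))


    if __name__ == '__main__':
        run()

    Launched with the DataflowRunner, this runs as a batch job with no request timeout, so the 5 to 6 hour runtime that breaks the Cloud Run request model is not a problem here.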