Tags: bash, google-cloud-platform, gcloud

gcloud Dataflow Drain Command: Wait Until Job Finishes Draining


I'm currently building a CD pipeline that replaces an existing Google Cloud Dataflow streaming pipeline with a new one using a bash command. The old and new jobs have the same name, and my bash command looks like this:

gcloud dataflow jobs drain "${JOB_ID}" --region asia-southeast2 && \
gcloud dataflow jobs run NAME --other-flags

The problem with this command is that the first command doesn't wait until the job finishes draining, so the second command throws an error because of the duplicate job name.

Is there a way to wait until a Dataflow job finishes draining? Or is there a better way? Thanks!


Solution

  • Seeing as this post hasn't garnered any attention, I will be posting my comment as a post:

    Dataflow jobs run asynchronously from the gcloud dataflow jobs commands, so when you use && the only thing you'll be waiting on is for the command itself to finish. Since that command only starts the process (be it draining a job or running one), it returns before the job/drain actually completes.

    There are a couple of ways you could wait for the job/drain to finish, both having some added cost:

    1. You could use a Pub/Sub step as part of a larger Dataflow job (think of it as a parent to the jobs you are draining and running, with the jobs you are draining or running sending a message to Pub/Sub about their status once it changes) - you may find the cost of Pub/Sub [here].
    2. You could set up a loop that repeatedly checks the status of the job you're draining/running, likely inside a bash script (see the sketch after this list). That's a bit more tedious and not as neat as a listener, and it requires your own machine/connection or a GCE instance to stay up for the duration of the drain.
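
    A minimal sketch of the second option, assuming JOB_ID and REGION are set and that polling every 30 seconds is acceptable; the state names come from Dataflow's JobState values (JOB_STATE_DRAINING while draining, JOB_STATE_DRAINED once finished):

        # Start the drain, then poll until the old job reports the drained state.
        gcloud dataflow jobs drain "${JOB_ID}" --region "${REGION}"

        while true; do
          STATE=$(gcloud dataflow jobs describe "${JOB_ID}" \
                    --region "${REGION}" \
                    --format="value(currentState)")
          if [[ "${STATE}" == "JOB_STATE_DRAINED" ]]; then
            break
          fi
          echo "Job ${JOB_ID} is ${STATE}; waiting for drain to finish..."
          sleep 30
        done

        # The job name is free again, so the replacement job can be started.
        gcloud dataflow jobs run NAME --other-flags

    A real script would probably also bail out on terminal failure states (e.g. JOB_STATE_FAILED or JOB_STATE_CANCELLED) and cap how long it waits, but the loop above is the core of the approach.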