The Dataflow FAQ lists running custom (cron) job processes on Compute Engine as one way to schedule Dataflow pipelines. I am confused about how exactly that should be done: how do I start the Dataflow job from Compute Engine, and how do I set up the cron job?
Thank you!
You can use Google Cloud Scheduler to run your Dataflow job. A Cloud Scheduler job has a target, which can be an HTTP/S endpoint, a Pub/Sub topic, or an App Engine application. To launch a Dataflow template, point an HTTP target at the Dataflow API endpoint that launches templates. See this external article for an example: Schedule Your Dataflow Batch Jobs With Cloud Scheduler. If you want to involve more services, see Scheduling Dataflow Pipeline using Cloud Run, PubSub and Cloud Scheduler.
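To make the HTTP-target idea concrete, here is a minimal sketch. It assumes a classic Dataflow template is already staged in Cloud Storage and a service account is allowed to launch Dataflow jobs. The Python below only builds the Cloud Scheduler v1 Job resource whose HTTP target POSTs to the Dataflow v1b3 `templates:launch` method. It does not call any API. The project, region, bucket, template, job and service-account names are hypothetical placeholders.

```python
import base64
import json
from urllib.parse import quote, urlencode

DATAFLOW_API = "https://dataflow.googleapis.com/v1b3"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def template_launch_url(project: str, region: str, gcs_template_path: str) -> str:
    """URL of the Dataflow v1b3 projects.locations.templates.launch method.

    The classic template's Cloud Storage path goes in the gcsPath query parameter.
    """
    query = urlencode({"gcsPath": gcs_template_path})
    return (
        f"{DATAFLOW_API}/projects/{quote(project, safe='')}"
        f"/locations/{quote(region, safe='')}/templates:launch?{query}"
    )


def template_launch_body(job_name: str, parameters: dict, temp_location: str) -> dict:
    """LaunchTemplateParameters: job name, template parameters, runtime environment."""
    return {
        "jobName": job_name,
        "parameters": dict(parameters),
        "environment": {"tempLocation": temp_location},
    }


def scheduler_http_job(name, schedule, time_zone, url, body, service_account):
    """Cloud Scheduler v1 Job resource with an HTTP target.

    The JSON request body is base64-encoded (bytes fields are base64 in the
    REST API), and the request is authenticated with an OAuth token minted
    for the given service account.
    """
    payload = json.dumps(body).encode("utf-8")
    return {
        "name": name,
        "schedule": schedule,  # unix-cron format, e.g. "0 2 * * *" = daily at 02:00
        "timeZone": time_zone,
        "httpTarget": {
            "uri": url,
            "httpMethod": "POST",
            "headers": {"Content-Type": "application/json"},
            "body": base64.b64encode(payload).decode("ascii"),
            "oauthToken": {
                "serviceAccountEmail": service_account,
                "scope": CLOUD_PLATFORM_SCOPE,
            },
        },
    }


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    url = template_launch_url("my-project", "us-central1", "gs://my-bucket/templates/my-template")
    body = template_launch_body(
        "nightly-batch", {"input": "gs://my-bucket/input/*.csv"}, "gs://my-bucket/tmp"
    )
    job = scheduler_http_job(
        "projects/my-project/locations/us-central1/jobs/nightly-dataflow",
        "0 2 * * *",
        "Etc/UTC",
        url,
        body,
        "scheduler-sa@my-project.iam.gserviceaccount.com",
    )
    print(json.dumps(job, indent=2))
```

The sketch leaves out creating the job itself, whether through the Cloud Scheduler API, `gcloud`, or the Console. It also covers only classic templates. Flex Templates use the separate `flexTemplates:launch` method, which takes a different request body.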