I have a one-off pipeline that runs locally, and I want to scale it in the cloud.
I can only test it locally on a small fraction of the dataset. The whole pipeline would take a couple of days to run on a MacBook Pro.
I've been trying to replicate this on GCP, which I am still discovering.
What is the best way to run such a Python data processing pipeline in a container on GCP?
Thanks to the useful comments on the original post, I explored other options on GCP.
Using a VM on Compute Engine worked perfectly. The overhead was much lower than I expected, and the setup went smoothly.