Tags: python, google-cloud-platform, google-cloud-dataproc, dataproc

Can we create a Dataproc Workflow Template by passing the path of a Jupyter notebook in step_id?


I have been trying to create a Dataproc Workflow Template that executes Jupyter notebooks present on my Dataproc cluster, but when I instantiate the template, the jobs fail. If I instead download my notebooks as .py files and add those to a Workflow Template, it works.

I am just curious whether there is any way to create a Workflow Template that can directly take existing Jupyter notebooks as its steps.


Solution

  • Direct execution of Jupyter notebooks via Jobs and Workflow Template APIs is not yet supported on Dataproc.

    You can work around this by writing and submitting a PySpark job (or Workflow Template step) that uses nbconvert to execute the notebook, as shown in the sketch below.
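
Here is a minimal sketch of such a driver script, assuming the notebook has already been copied to the cluster's local filesystem (the path and file names are placeholders) and that nbformat and nbconvert are installed on the cluster:

```python
# run_notebook.py -- sketch of a PySpark driver that executes a notebook with nbconvert.
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

# Placeholder paths; adjust to wherever your notebook lives on the cluster.
input_path = "/path/to/my_notebook.ipynb"
output_path = "/path/to/my_notebook_executed.ipynb"

# Load the notebook, execute every cell, and save the executed copy.
with open(input_path) as f:
    nb = nbformat.read(f, as_version=4)

ep = ExecutePreprocessor(timeout=3600, kernel_name="python3")
ep.preprocess(nb, {"metadata": {"path": "./"}})

with open(output_path, "w") as f:
    nbformat.write(nb, f)
```

You can then add this script as a PySpark step of your Workflow Template (for example with `gcloud dataproc workflow-templates add-job pyspark`), so that instantiating the template runs the notebook end to end.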