python amazon-web-services jupyter-notebook amazon-emr papermill

Can I use Papermill and Scrapbook with AWS EMR Notebooks?

I have several notebooks which are run by a "driver" notebook using papermill. These notebooks use the scrapbook library to communicate information to the driver. The driver then passes this information as parameters to other notebooks. I want to use EMR Notebooks to make this "notebook pipeline" run more efficiently. Do AWS EMR Notebooks support scrapbook and papermill, or will I need to refactor my notebooks?

Solution

As of now, no, you can't do that directly in EMR Notebooks. What you can do instead (and what we are doing) is run the notebooks with papermill on the cluster itself, as follows:

1. Create a Python environment on your EMR master node as the hadoop user.
2. Install sparkmagic in that environment and configure all of its kernels as described in the sparkmagic README.md.
3. Copy your notebook to the master node, or use it directly from its S3 location.
4. Install papermill and run the notebook with it:

papermill s3://path/to/notebook/input.ipynb s3://path/to/notebook/output.ipynb -p param=1
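If the driver itself is Python rather than a shell command, the same run can be sketched with papermill's Python API, with scrapbook reading back the values each notebook glued. This is only a minimal sketch under some assumptions: the helper names `run_notebook` and `run_pipeline` are made up for illustration, every glued value is assumed to be simple JSON-friendly data that can be passed on as a parameter, and S3 paths are assumed to work because papermill was installed with its S3 extra (`pip install "papermill[s3]"`).

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def run_notebook(input_path: str,
                 output_path: str,
                 parameters: Dict[str, Any],
                 kernel_name: Optional[str] = None) -> Dict[str, Any]:
    """Execute one notebook with papermill and return the data it glued with scrapbook."""
    # Imported lazily so the pipeline logic below can be used (and tested)
    # on a machine where papermill and scrapbook are not installed.
    import papermill as pm
    import scrapbook as sb

    # Python equivalent of: papermill <input> <output> -p name value
    pm.execute_notebook(input_path, output_path,
                        parameters=parameters, kernel_name=kernel_name)

    # Collect whatever the executed notebook recorded via sb.glue(name, value).
    executed = sb.read_notebook(output_path)
    return {name: scrap.data for name, scrap in executed.scraps.items()}


def run_pipeline(steps: Iterable[Tuple[str, str]],
                 initial_parameters: Dict[str, Any],
                 runner: Callable[..., Dict[str, Any]] = run_notebook) -> Dict[str, Any]:
    """Run notebooks in order, feeding each one's scraps to the next as parameters."""
    params = dict(initial_parameters)
    for input_path, output_path in steps:
        params.update(runner(input_path, output_path, params))
    return params


# Hypothetical usage on the master node, reusing the paths from the command above:
# run_pipeline(
#     [("s3://path/to/notebook/input.ipynb", "s3://path/to/notebook/output.ipynb")],
#     {"param": 1},
# )
```

The `runner` argument is there so the chaining logic can be exercised without a cluster. For a PySpark notebook you would probably also pass the `kernel_name` of one of the sparkmagic kernels configured in step 2.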