Tags: python, amazon-web-services, jupyter-notebook, amazon-emr, papermill

Can I use Papermill and Scrapbook with AWS EMR Notebooks?


I have several notebooks that are run by a "driver" notebook using papermill. These notebooks use the scrapbook library to pass information back to the driver, which then passes it as parameters to other notebooks. I want to use EMR Notebooks to make this "notebook pipeline" run more efficiently. Do AWS EMR Notebooks support scrapbook and papermill, or will I need to refactor my notebooks?
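For concreteness, the pattern described above might look roughly like the following sketch. The function names, the stage list, and the idea of merging every scrap into the next notebook's parameters are assumptions made for illustration, not the actual pipeline; only `pm.execute_notebook`, `sb.read_notebook`, and `sb.glue` are the libraries' own calls.

```python
def run_stage(input_path, output_path, parameters):
    """Run one notebook with papermill and return the values it glued."""
    # Lazy imports: papermill and scrapbook are only needed when a notebook
    # is actually executed.
    import papermill as pm
    import scrapbook as sb

    pm.execute_notebook(input_path, output_path, parameters=parameters)
    executed = sb.read_notebook(output_path)
    # Inside the child notebook, values are recorded with sb.glue("name", value).
    return {name: scrap.data for name, scrap in executed.scraps.items()}


def run_pipeline(stages, initial_parameters=None, stage_runner=run_stage):
    """Run (input_path, output_path) stages in order, feeding scraps forward."""
    parameters = dict(initial_parameters or {})
    for input_path, output_path in stages:
        # Each stage receives a copy of everything collected so far.
        scraps = stage_runner(input_path, output_path, dict(parameters))
        parameters.update(scraps)
    return parameters
```

The `stage_runner` argument is a hypothetical hook that lets the driver logic be exercised without executing real notebooks.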


Solution

  • Not directly, as of now. What you can do instead (and what we are doing) is the following:

    1. Create a Python environment on the EMR master node as the hadoop user.
    2. Install sparkmagic in that environment and configure the kernels as described in sparkmagic's README.md.
    3. Copy your notebook to the master node, or use it directly from its S3 location.
    4. Install papermill (with the s3 extra, i.e. pip install "papermill[s3]", if the notebooks are read from or written to S3) and run the notebook with it:

      papermill s3://path/to/notebook/input.ipynb s3://path/to/notebook/output.ipynb -p param 1
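
If the pipeline is driven from a script on the master node, the same command can be assembled programmatically. This is a minimal sketch, assuming papermill is on the PATH of the environment created in step 1; `papermill_command` and `run_notebook` are hypothetical helper names, while `-p` and `-k` are papermill's own CLI options.

```python
import subprocess


def papermill_command(input_path, output_path, parameters=None, kernel=None):
    """Build the argument list for one papermill CLI invocation."""
    cmd = ["papermill", input_path, output_path]
    if kernel is not None:
        # -k selects the Jupyter kernel used to execute the notebook.
        cmd += ["-k", kernel]
    for name, value in (parameters or {}).items():
        # -p takes the name and the value as two separate arguments.
        cmd += ["-p", name, str(value)]
    return cmd


def run_notebook(input_path, output_path, parameters=None, kernel=None):
    """Run the notebook; raises CalledProcessError if papermill fails."""
    cmd = papermill_command(input_path, output_path, parameters, kernel)
    subprocess.run(cmd, check=True)
```

For example, `run_notebook("s3://path/to/notebook/input.ipynb", "s3://path/to/notebook/output.ipynb", {"param": 1}, kernel="pysparkkernel")` would pass `-k pysparkkernel`; that kernel name assumes the sparkmagic PySpark kernel was the one installed in step 2.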