
AWS: CloudFormation to create a Glue Jupyter notebook and dev endpoint


Looking through the CloudFormation documentation, I can't see a way to spin up a Glue DevEndpoint and a Jupyter notebook, and then have the notebook use the newly created DevEndpoint.

Can someone help?


Solution

  • I've found the solution. The key is the CloudFormation resource AWS::SageMaker::NotebookInstanceLifecycleConfig, which lets you hook into the notebook's OnCreate and OnStart events.

    Notebooks created from the Glue section of the console also appear in SageMaker. There you can see the lifecycle configuration resource associated with each notebook, along with its code.

    For completeness, below is the script that the Glue console currently uses for both OnStart and OnCreate when you create a Jupyter notebook from Glue.

    Note that a notebook created this way has exactly the same functionality as one created through the Glue console, but it is only visible in the SageMaker section of the console.

    #!/bin/bash
    set -ex
    [ -e /home/ec2-user/glue_ready ] && exit 0
    
    mkdir -p /home/ec2-user/glue
    cd /home/ec2-user/glue
    
    # Write dev endpoint in a file which will be used by daemon scripts
    glue_endpoint_file="/home/ec2-user/glue/glue_endpoint.txt"
    
    if [ -f "$glue_endpoint_file" ] ; then
        rm "$glue_endpoint_file"
    fi
    echo "https://glue.eu-west-2.amazonaws.com" >> "$glue_endpoint_file"
    
    ASSETS=s3://aws-glue-jes-prod-eu-west-2-assets/sagemaker/assets/
    
    aws s3 cp ${ASSETS} . --recursive
    
    bash "/home/ec2-user/glue/Miniconda2-4.5.12-Linux-x86_64.sh" -b -u -p "/home/ec2-user/glue/miniconda"
    
    source "/home/ec2-user/glue/miniconda/bin/activate"
    
    tar -xf autossh-1.4e.tgz
    cd autossh-1.4e
    ./configure
    make
    sudo make install
    sudo cp /home/ec2-user/glue/autossh.conf /etc/init/
    
    mkdir -p /home/ec2-user/.sparkmagic
    cp /home/ec2-user/glue/config.json /home/ec2-user/.sparkmagic/config.json
    
    mkdir -p /home/ec2-user/SageMaker/Glue\ Examples
    mv /home/ec2-user/glue/notebook-samples/* /home/ec2-user/SageMaker/Glue\ Examples/
    
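    # Ensure the SageMaker notebook has permission for the dev endpoint
    # (somiron-dfe-poc-GlueDevEndpoint is the dev endpoint from this stack; use your own endpoint name)
    aws glue get-dev-endpoint --endpoint-name somiron-dfe-poc-GlueDevEndpoint --endpoint https://glue.eu-west-2.amazonaws.com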
    
    # Run daemons as cron jobs and use flock to make sure that each daemon is started only if it is not already running
    (crontab -l; echo "* * * * * /usr/bin/flock -n /tmp/lifecycle-config-v2-dev-endpoint-daemon.lock /usr/bin/sudo /bin/sh /home/ec2-user/glue/lifecycle-config-v2-dev-endpoint-daemon.sh") | crontab -
    
    (crontab -l; echo "* * * * * /usr/bin/flock -n /tmp/lifecycle-config-reconnect-dev-endpoint-daemon.lock /usr/bin/sudo /bin/sh /home/ec2-user/glue/lifecycle-config-reconnect-dev-endpoint-daemon.sh") | crontab -
    
    source "/home/ec2-user/glue/miniconda/bin/deactivate"
    
    rm -rf "/home/ec2-user/glue/Miniconda2-4.5.12-Linux-x86_64.sh"
    
    sudo touch /home/ec2-user/glue_ready
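As a rough sketch of how the lifecycle-config approach ties everything together in one stack, the script below writes a minimal CloudFormation template containing a Glue dev endpoint, a lifecycle configuration with OnCreate and OnStart hooks, and a SageMaker notebook instance that uses that configuration. The parameter names (`DevEndpointName`, `GlueRoleArn`, `NotebookRoleArn`), the file name `glue-notebook.yaml`, the instance type and the one-line placeholder hook scripts are all assumptions for illustration. In practice you would paste the script above into both `Content` blocks and replace the hard-coded endpoint name with `${DevEndpointName}`. Because the hooks go through `!Sub`, the script's own shell variables such as `${ASSETS}` would have to be written as `${!ASSETS}` so that CloudFormation leaves them alone. CloudFormation expects the hook `Content` to be base64-encoded, which is what `Fn::Base64` does here.

```shell
#!/bin/bash
# Sketch: write a minimal CloudFormation template that links a Glue dev endpoint
# to a SageMaker notebook through a lifecycle configuration.
# Parameter names, file name, instance type and hook bodies are illustrative assumptions.
set -euo pipefail

template_file="glue-notebook.yaml"

# Quoted heredoc delimiter: the shell does not expand ${...}, so !Sub sees it intact.
cat > "$template_file" <<'EOF'
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  DevEndpointName:
    Type: String
  GlueRoleArn:
    Type: String      # IAM role assumed by the Glue dev endpoint
  NotebookRoleArn:
    Type: String      # IAM role assumed by the SageMaker notebook instance
Resources:
  GlueDevEndpoint:
    Type: AWS::Glue::DevEndpoint
    Properties:
      EndpointName: !Ref DevEndpointName
      RoleArn: !Ref GlueRoleArn
  GlueNotebookLifecycleConfig:
    Type: AWS::SageMaker::NotebookInstanceLifecycleConfig
    Properties:
      OnCreate:
        - Content:
            Fn::Base64: !Sub |
              #!/bin/bash
              # Placeholder: paste the lifecycle script here, writing its own ${VAR} as ${!VAR}
              echo "dev endpoint: ${DevEndpointName}"
      OnStart:
        - Content:
            Fn::Base64: !Sub |
              #!/bin/bash
              echo "dev endpoint: ${DevEndpointName}"
  GlueNotebook:
    Type: AWS::SageMaker::NotebookInstance
    DependsOn: GlueDevEndpoint   # create the endpoint before the notebook tries to use it
    Properties:
      InstanceType: ml.t2.medium
      RoleArn: !Ref NotebookRoleArn
      LifecycleConfigName: !GetAtt GlueNotebookLifecycleConfig.NotebookInstanceLifecycleConfigName
EOF

echo "Wrote $template_file"
```

The generated template could then be deployed with something like `aws cloudformation deploy --template-file glue-notebook.yaml --stack-name glue-notebook --parameter-overrides DevEndpointName=... GlueRoleArn=... NotebookRoleArn=...`. The notebook role presumably needs at least permission to call `glue:GetDevEndpoint` and to read the Glue assets bucket, since the lifecycle script does both.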
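To pull the current script out of a notebook that was created from the Glue console, rather than copying it by hand from the SageMaker page, something like the following sketch could be used. It looks up the lifecycle configuration attached to a notebook instance and decodes its OnCreate and OnStart hooks, which the API returns base64-encoded. The script name and the convention of passing the notebook instance name as the first argument are assumptions. The AWS CLI commands `describe-notebook-instance` and `describe-notebook-instance-lifecycle-config` are the standard SageMaker ones.

```shell
#!/bin/bash
# Sketch: print the OnCreate/OnStart scripts of the lifecycle configuration
# attached to an existing SageMaker notebook (e.g. one created from the Glue console).
# Usage (hypothetical script name): ./dump-lifecycle.sh <notebook-instance-name>
set -euo pipefail

# Decode a base64-encoded lifecycle hook body read from stdin.
decode_hook() {
  base64 --decode
}

dump_lifecycle() {
  local notebook_name="$1"
  local config_name content hook
  config_name=$(aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$notebook_name" \
    --query NotebookInstanceLifecycleConfigName --output text)

  for hook in OnCreate OnStart; do
    echo "===== $hook ($config_name) ====="
    content=$(aws sagemaker describe-notebook-instance-lifecycle-config \
      --notebook-instance-lifecycle-config-name "$config_name" \
      --query "${hook}[0].Content" --output text)
    # With --output text, a missing hook comes back as the string "None".
    if [ "$content" = "None" ]; then
      echo "(no $hook hook)"
    else
      printf '%s' "$content" | decode_hook
      echo
    fi
  done
}

# Only call AWS when a notebook name is given, so the helpers can be sourced safely.
if [ "$#" -ge 1 ]; then
  dump_lifecycle "$1"
fi
```

The decoded output can be pasted into the `Content` of an `AWS::SageMaker::NotebookInstanceLifecycleConfig` resource, with the region and dev endpoint name adjusted for your own stack.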