amazon-web-services github pip amazon-emr

How to *safely* install a private Python package from GitHub in an AWS EMR bootstrap script


I want to install a Python 3 package from a private GitHub repository onto an AWS EMR Spark cluster.

I know how to do this the dirty way by hardcoding credentials, but what is the recommended best practice for doing it safely? I don't want to store credentials in a bootstrap script.

Thanks in advance.


Solution

  • Thanks to Maurice, I've successfully implemented a safe process, following his option #2.

    1. Create an access token with read permissions on GitHub.

    2. Store the token in AWS Secrets Manager. In my case, I named the secret "github-read-access".

    3. Grant access to this secret to the user who will query it or, for an EMR bootstrap script, to the EMR roles.

    4. Using the AWS CLI, I store the token in an environment variable and install the package with the following commands:

      # Pull the token out of the SecretString line; this relies on the secret holding a single key/value pair.
      export GITHUB_TOKEN=$(aws secretsmanager get-secret-value --secret-id github-read-access | grep SecretString | cut -d ":" -f 3 | cut -d '"' -f 2 | cut -d '\' -f 1)
      sudo pip3 install "git+https://${GITHUB_TOKEN}@github.com/<USER_NAME>/<REPO_NAME>.git"
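
    The grep/cut pipeline above depends on the exact layout of the CLI's JSON output, so it can break if the formatting changes. Below is a minimal sketch of a less layout-dependent variant. It assumes the secret was stored as a single key/value pair (which the cut pipeline implies) and that python3 is available on the node. The helper names extract_first_secret_value and fetch_github_token are hypothetical; --query and --output are standard AWS CLI options.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a JSON object on stdin and prints the value
# of its first key (assumes the secret holds a single key/value pair).
extract_first_secret_value() {
  python3 -c 'import json, sys; print(next(iter(json.load(sys.stdin).values())))'
}

# Hypothetical helper: fetches the secret named in $1 and prints the token.
# --query SecretString --output text returns the raw secret string,
# so no grep/cut on the surrounding JSON is needed.
fetch_github_token() {
  aws secretsmanager get-secret-value \
    --secret-id "$1" \
    --query SecretString \
    --output text | extract_first_secret_value
}

# Usage in a bootstrap script (shown as comments, not executed here):
#   GITHUB_TOKEN=$(fetch_github_token github-read-access)
#   sudo pip3 install "git+https://${GITHUB_TOKEN}@github.com/<USER_NAME>/<REPO_NAME>.git"
```

    If the secret was stored as plain text rather than as a key/value pair, the JSON-parsing step is unnecessary and the output of the aws command can be used directly.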