I want to install a Python 3 package from a private GitHub repository onto an AWS EMR Spark cluster.
I know how to do this the dirty way, by hardcoding credentials, but what is the recommended practice for doing it safely? I don't want to store credentials in a bootstrap script.
Thanks in advance.
Thanks to Maurice, I've successfully implemented a safe process by following his option #2:
Create a GitHub personal access token with read access to the repository.
Store the token in AWS Secrets Manager. In my case I named this secret "github-read-access".
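For reference, the secret can also be created from the AWS CLI. The sketch below stores the token as a plain-text secret rather than a key/value pair, which makes it simpler to read back later. The function name `store_github_token` is hypothetical. Passing the token as an argument can leave it in your shell history, so the Secrets Manager console may be preferable for this one-off step.

```shell
#!/usr/bin/env bash
# Sketch: store a GitHub token in Secrets Manager as a plain-text secret.
# The function name is hypothetical; the secret name matches this post.
store_github_token() {
  local token="$1"
  aws secretsmanager create-secret \
    --name github-read-access \
    --description "Read-only GitHub token for EMR bootstrap actions" \
    --secret-string "$token"
}
```

For example: `store_github_token "<TOKEN>"`.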
Give read access to this secret to the user that is going to query it or, in the case of an EMR bootstrap script, to the cluster's EC2 instance profile role, since bootstrap actions run on the cluster's EC2 instances.
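A minimal sketch of that permission as an inline role policy, assuming the cluster nodes use EMR's default EC2 instance profile role, `EMR_EC2_DefaultRole`. The function name `grant_secret_read` and the policy name are hypothetical, and the secret ARN is a placeholder to copy from the Secrets Manager console. If the secret is encrypted with a customer-managed KMS key, the role would also need `kms:Decrypt` on that key.

```shell
#!/usr/bin/env bash
# Sketch: attach an inline policy that lets an IAM role read one secret.
# Function and policy names are hypothetical; pass the role name and secret ARN.
grant_secret_read() {
  local role="$1" secret_arn="$2" policy
  # Least privilege: only GetSecretValue, only on this one secret.
  policy=$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "${secret_arn}"
    }
  ]
}
EOF
  )
  aws iam put-role-policy \
    --role-name "$role" \
    --policy-name read-github-access-secret \
    --policy-document "$policy"
}
```

For example: `grant_secret_read EMR_EC2_DefaultRole <SECRET_ARN>`.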
Using the AWS CLI, I store the token in an environment variable and install the package with the following commands:
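```bash
# Read the token; the grep/cut chain assumes the secret was saved as a key/value pair
export GITHUB_TOKEN=`aws secretsmanager get-secret-value --secret-id github-read-access |grep SecretString|cut -d ":" -f 3|cut -d '"' -f 2 |cut -d '\' -f1`
# Install the package over HTTPS, authenticating with the token
sudo pip3 install git+https://${GITHUB_TOKEN}@github.com/<USER_NAME>/<REPO_NAME>.git
```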
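The `grep`/`cut` chain above depends on the exact layout of the CLI's JSON output, including the CLI's default output format being JSON. A possibly sturdier sketch for a bootstrap script asks the CLI for the field directly with `--query SecretString --output text`, then uses Python 3 (already needed for `pip3`) to unwrap a key/value secret if that is how it was stored. The function names and the owner/repo arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of an EMR bootstrap script; names are hypothetical, adapt to your setup.
# Assumes the node's role may call secretsmanager:GetSecretValue on the secret.
set -euo pipefail

# Print the token stored in the secret. Handles a plain-text secret and a
# key/value secret (a JSON object), in which case the first value is printed.
get_github_token() {
  aws secretsmanager get-secret-value \
    --secret-id "${1:-github-read-access}" \
    --query SecretString \
    --output text |
    python3 -c '
import json, sys
raw = sys.stdin.read().strip()
try:
    data = json.loads(raw)
except ValueError:
    data = raw
# A key/value secret arrives as a JSON object; take its first value.
print(next(iter(data.values())) if isinstance(data, dict) else raw)
'
}

# Install a package from a private repository using the token.
install_private_package() {
  local owner="$1" repo="$2" token
  # Assign separately from "local" so set -e catches a failed lookup.
  token="$(get_github_token)"
  if [ -z "$token" ]; then
    echo "empty GitHub token" >&2
    return 1
  fi
  sudo pip3 install "git+https://${token}@github.com/${owner}/${repo}.git"
}
```

A bootstrap action would then end with a call such as `install_private_package <USER_NAME> <REPO_NAME>`. The token still appears on the `pip` command line, so it may be visible in process listings on the node while the install runs.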