
How to install a Python package from a private Git repo using an init bash script on Databricks?


I'm trying to pip install a Python package from a private GitHub repo using an init.sh script that I uploaded to my S3 bucket.

This is my init.sh file:

#!/bin/bash
TOKEN={{secrets/private-repo/github}}
pip install git+https://${TOKEN}@github.com/<path-to-repo>

When I try to create my cluster, I get the following error message:

Init script failure: Cluster scoped init script s3://<s3_bucket>/init.sh failed: Script exit status is non-zero

I created a secret through the API with scope private-repo and key github. I tested it in a notebook and it worked fine.

Documentation Used: https://docs.databricks.com/security/secrets/secrets.html#reference-a-secret-in-an-environment-variable


Solution

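  • The problem is that you're referring to the secret with the {{secrets/private-repo/github}} syntax inside the init script, where it doesn't work. That reference is only resolved in the cluster configuration, so inside the script bash sees it as literal text and pip receives an invalid token.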

    Instead, define an environment variable at the cluster level and use the secret syntax there. The resolved value will then be available inside your init script. See the documentation on that topic.

    Move this line out of your init script and into the cluster's Advanced options > Spark > Environment variables section. If it stays in the script, it overwrites TOKEN with the unresolved text.

    TOKEN={{secrets/private-repo/github}}
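
With that change, init.sh no longer assigns TOKEN itself. The following is only a minimal sketch of how the script might then look, assuming TOKEN arrives through the cluster's environment variables. The helper names build_pip_url and install_private_package are hypothetical, and <path-to-repo> is still the placeholder from the question.

```bash
#!/bin/bash
# Sketch of init.sh after the fix. TOKEN is assumed to be injected through the
# cluster's environment variables; it is no longer assigned in this file.

# Hypothetical helper: build the pip URL for a private GitHub repo.
# $1 = token, $2 = repo path (e.g. the <path-to-repo> placeholder).
build_pip_url() {
  printf 'git+https://%s@github.com/%s' "$1" "$2"
}

# Hypothetical helper: fail with a readable message when TOKEN is missing,
# otherwise install the package with pip.
install_private_package() {
  if [ -z "${TOKEN:-}" ]; then
    echo "init.sh: TOKEN is empty; set it in the cluster's environment variables" >&2
    return 1
  fi
  pip install "$(build_pip_url "$TOKEN" "$1")"
}

# Last line of the real script, left commented so the sketch has no side effects:
# install_private_package "<path-to-repo>"
```

The explicit check is a design choice rather than a requirement: a missing secret should then show up as a readable message in the init script's output instead of only as a non-zero exit status.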