dbx execute: install from Azure Artifacts / private PyPI


I would like to use dbx execute to run a task/job on an Azure Databricks cluster. However, I cannot get it to install my code.

More details on the situation:

  • Project A has a setup.py and depends on Project B
  • Project B is also Python-based and is released as an Azure DevOps artifact
  • I can successfully install A via an init script on an Azure Databricks cluster: the script git-clones both projects and then runs pip install -e for Project B and Project A
  • It also works when I create a pip.conf file in the init script that configures a token for my artifacts feed
  • So dbx deploy/launch works fine, as my clusters use the init script
  • However, dbx execute always fails, telling me that it cannot find and install Project B

Does anyone know how to configure the pip that is used during the dbx execute installation process? It seems to ignore any configuration set by init scripts.

I searched through a lot of documentation, such as https://docs.databricks.com/libraries/index.html and https://dbx.readthedocs.io/en/latest/reference/deployment/#advanced-package-dependency-management, but with no luck.

Looking at the dbx source, there does not seem to be any option to set a pip.conf :( https://github.com/databrickslabs/dbx/blob/main/dbx/commands/execute.py


Solution

  • I also raised an issue in the dbx GitHub repo: https://github.com/databrickslabs/dbx/issues/669. The maintainers pointed me to this link

    https://dbx.readthedocs.io/en/latest/guides/general/dependency_management/?h=custom+rep#installing-python-packages-from-custom-pypi-repos

    which explains how to do it.

    In short: overwrite the global pip.conf at /etc/pip.conf in your init.sh

    #!/bin/bash

    # Overwrite the global pip configuration so that every pip call on the cluster reads it
    cat > /etc/pip.conf <<'EOF'
    [global]
    index-url=https://pypi.org/simple
    extra-index-url=https://my.custom.pypi.example.com/simple/
    EOF
    

    To make it work with Azure DevOps, I created an Azure DevOps personal access token and changed extra-index-url to look like this:

    https://<anyname>:<token_with_read_package_permissions>@pkgs.dev.azure.com/<organisation>/<project>/_packaging/<feedname>/pypi/simple/
    

    Replace all values in <...> with your own. <anyname> can have any value, as the token is enough for authentication.
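
As a rough sketch, the two steps above (overwriting /etc/pip.conf and pointing extra-index-url at the Azure DevOps feed) could be combined into one init script like the one below. The variable names (AZDO_ORG, AZDO_PROJECT, AZDO_FEED, AZDO_PAT, PIP_CONF_PATH), their default values, and the user name "build" are all made up for illustration. The script also assumes the token reaches the cluster through an environment variable instead of being hard-coded, which is a choice of this sketch and not something dbx requires.

```shell
#!/bin/bash
# Sketch of an init script that points pip at an Azure DevOps feed.
# All variable names and defaults below are hypothetical placeholders.
set -eu

# These mirror the <organisation>, <project> and <feedname> parts of the URL.
AZDO_ORG="${AZDO_ORG:-my-organisation}"
AZDO_PROJECT="${AZDO_PROJECT:-my-project}"
AZDO_FEED="${AZDO_FEED:-my-feed}"
# Personal access token with read permission on packages (assumed to be
# supplied through the environment).
AZDO_PAT="${AZDO_PAT:-}"
# Global pip config location; overridable so the script can be tried locally.
PIP_CONF_PATH="${PIP_CONF_PATH:-/etc/pip.conf}"

write_pip_conf() {
  # The user name ("build" here) can be anything; the token authenticates.
  feed_url="https://build:${AZDO_PAT}@pkgs.dev.azure.com/${AZDO_ORG}/${AZDO_PROJECT}/_packaging/${AZDO_FEED}/pypi/simple/"
  cat > "${PIP_CONF_PATH}" <<EOF
[global]
index-url=https://pypi.org/simple
extra-index-url=${feed_url}
EOF
}

# Only touch the config when a token is actually available.
if [ -n "${AZDO_PAT}" ]; then
  write_pip_conf
else
  echo "AZDO_PAT is not set, leaving ${PIP_CONF_PATH} untouched" >&2
fi
```

To try it without touching /etc, set PIP_CONF_PATH to a scratch file and inspect the result. On the cluster, running pip config list afterwards should show the index-url and extra-index-url entries if the file was picked up.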