I have some helper utilities defined in a separate Python script. I would like to make the script available to a DSX notebook so that I can call the utilities from a cell, but I don't want to paste the script into a cell directly.
What are some of the ways to achieve this?
If you are OK with making your code publicly available, you could turn it into a Python package and host it in a public GitHub repository. See here for an example package: A simple Hello World setuptools package and installing it with pip.
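For reference, a package like this needs little more than a setup.py and a package directory. The following is a minimal sketch that writes such a skeleton to disk. The names my_helpers and say_hello are hypothetical placeholders, not taken from the linked example.

```python
import os
import tempfile

# Contents of a minimal setup.py; {name} is filled in below.
SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="{name}",
    version="0.1.0",
    packages=find_packages(),
)
'''

# Contents of the package's __init__.py with one placeholder helper.
INIT_PY = '''\
def say_hello():
    """Hypothetical helper; replace with your own utilities."""
    return "hello from {name}"
'''


def write_package_skeleton(root, name="my_helpers"):
    """Create root/setup.py and root/<name>/__init__.py; return both paths."""
    package_dir = os.path.join(root, name)
    os.makedirs(package_dir, exist_ok=True)

    setup_path = os.path.join(root, "setup.py")
    with open(setup_path, "w") as f:
        f.write(SETUP_PY.format(name=name))

    init_path = os.path.join(package_dir, "__init__.py")
    with open(init_path, "w") as f:
        f.write(INIT_PY.format(name=name))

    return setup_path, init_path
```

Calling write_package_skeleton(tempfile.mkdtemp()) produces a directory that could be committed to a repository as-is and then extended with your own modules.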
You can install it directly from github using:
!pip install --user git+https://github.com/public_account/public_repo
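Depending on the environment, a package installed with --user may not be importable in the already running kernel, because the per-user site-packages directory is only added to sys.path at startup if it exists. Assuming that is what happens, a sketch like the following adds it by hand. The package name my_helpers in the comment is hypothetical.

```python
import site
import sys


def ensure_user_site_on_path():
    """Add the per-user site-packages directory to sys.path if it is missing."""
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        sys.path.append(user_site)
    return user_site


ensure_user_site_on_path()
# import my_helpers  # hypothetical package name; use your own
```

Restarting the kernel after the install is an alternative that avoids touching sys.path.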
A similar approach should also work with a private GitHub repository, with a few extra setup steps and a different URL format for pip. For example:
Generate an SSH key on DSX:
! ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N ""
Add the output of the following command to your GitHub account under Settings > SSH and GPG keys:
! cat ~/.ssh/id_rsa.pub
Next, add GitHub's SSH host key to DSX:
! ssh-keyscan github.com >> ~/.ssh/known_hosts
IMPORTANT: You should manually verify that the imported GitHub host key is authentic. You can view the imported key with:
! cat ~/.ssh/known_hosts
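The raw key is hard to compare by eye. One way to do the check is to compute the SHA256 fingerprint of each entry and compare it with the fingerprints GitHub publishes in its documentation. The following is a minimal sketch, assuming plain (unhashed) known_hosts lines of the form "host key-type base64-key", which is what ssh-keyscan writes by default. The function name is hypothetical.

```python
import base64
import hashlib


def known_hosts_fingerprints(text):
    """Return (host, key_type, fingerprint) for each key line in known_hosts text."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        host, key_type, key_b64 = fields[0], fields[1], fields[2]
        # OpenSSH-style fingerprint: unpadded base64 of the SHA-256 of the key blob.
        digest = hashlib.sha256(base64.b64decode(key_b64)).digest()
        fingerprint = "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
        results.append((host, key_type, fingerprint))
    return results
```

To use it, read ~/.ssh/known_hosts into a string, pass it to known_hosts_fingerprints, and check each github.com fingerprint against the published value before installing anything.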
You can now install with pip:
! pip install --user git+ssh://git@github.com/private_account/private_repo
CAUTION: There are security considerations with the above approach. Anyone with access to the Spark service where you ran these commands will be able to access the private git repository.
NOTE:
Ideally, in the future, I would like to see DSX provide support for editing all files in a project and committing all the project files to GitHub, e.g.