
Github binder requirements.txt


Say I have a GitHub repository named notebooks_examples. It contains several folders, each holding a different Jupyter notebook that I want to run through MyBinder. The notebooks are independent of each other and may need different packages to work. How can I give each notebook its own 'requirements.txt' file?

I know I could have a single one at the root of the repository, but that rules out using different versions of the same package. It also means that running any given notebook installs every package, including those that notebook does not need.

I also saw that I could place the configuration file in a folder named "binder", but I failed to do so. The structure I tried was as follows. I have a readme file at notebooks_examples/notebook_1/README.md, and I have 2 files here: notebooks_examples/notebook_1/binder/{notebook_1.ipynb,requirements.txt}. However, when I then launch the notebook through Binder, none of my imports work, as if the configuration file had not been seen.

Is there a way to do this without making a new repository for every new notebook? Or is this simply impossible because of how binder works?


Solution

  • Currently, each repository (or repo) corresponds to one environment built by repo2docker, the tool that builds the environment for a BinderHub. (MyBinder.org is a public-facing federation of public BinderHubs.)
    The configuration files must be placed either in the root directory or in a binder/ directory at the root of the repository; they cannot be anywhere else, see here. That is why the requirements.txt in notebook_1/binder/ was not picked up.
    Hence, at present you need at least one repository for each environment. However, unless every notebook requires an entirely different environment, you do not need a separate repository for each notebook.
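To make the lookup rule concrete, here is a minimal Python sketch that imitates it. It is not repo2docker's actual code. The function names `binder_config_dir` and `find_requirements` are made up for illustration, and the `.binder/` alternative comes from my understanding of repo2docker rather than from the answer above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def binder_config_dir(repo_root: Path) -> Path:
    """Return the directory whose configuration files would be used.

    Simplified imitation of the rule: a binder/ (or, as an assumption,
    .binder/) directory at the repository root wins; otherwise the root
    itself is used. Nested folders are never searched.
    """
    has_binder = (repo_root / "binder").is_dir()
    has_dotbinder = (repo_root / ".binder").is_dir()
    if has_binder and has_dotbinder:
        raise RuntimeError("binder/ and .binder/ are mutually exclusive")
    if has_binder:
        return repo_root / "binder"
    if has_dotbinder:
        return repo_root / ".binder"
    return repo_root


def find_requirements(repo_root: Path) -> Optional[Path]:
    """Return the requirements.txt that would be picked up, if any."""
    candidate = binder_config_dir(repo_root) / "requirements.txt"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Recreate the layout from the question in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        nested = root / "notebook_1" / "binder"
        nested.mkdir(parents=True)
        (nested / "requirements.txt").write_text("numpy\n")
        # The nested file is not at the root or in root-level binder/.
        print(find_requirements(root))
```

With the layout from the question, the sketch finds no requirements file, which matches the symptom of imports failing after launch.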

    Additionally, your notebooks can stay where they are on GitHub, and you can place a launch-binder link next to each one that starts a session with a specific notebook (or collection of notebooks) and the correct environment.
    One way to do that is outlined here. However, in your case, nbgitpuller would copy in all the notebooks you currently have and then open a specific one. That might work fine for what you describe if you set up a repository for each needed environment and then create the URLs you need.
    As an alternative to nbgitpuller, you could specify in each 'environment' repo exactly which notebooks to get and where to place them, using a start configuration file in which curl fetches the specific notebooks from your all-inclusive repo. An example of using start in a similar way is here. So for your specific case, you might determine which groups of notebooks can share an environment and make a repository for each environment. The start file then retrieves those notebooks when the session launches. The launch button URL configured from here can then specify which notebook to open at start-up of the session.
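As a rough illustration of the nbgitpuller route, the sketch below builds a MyBinder launch URL that takes the environment from one repository and pulls the notebooks from another. The helper name `nbgitpuller_binder_url` and the repository names in the usage example are hypothetical. It also assumes that the environment repository installs nbgitpuller (for example via its requirements.txt), that the session uses JupyterLab, and that the URL layout matches what the nbgitpuller link generator produces.

```python
from urllib.parse import quote, urlencode


def nbgitpuller_binder_url(
    env_repo: str,
    content_repo_url: str,
    notebook_path: str,
    env_ref: str = "HEAD",
    content_branch: str = "main",
    hub: str = "https://mybinder.org",
) -> str:
    """Build a launch URL: environment from env_repo, notebooks from content_repo_url.

    env_repo is "owner/name" of the GitHub repository holding the config files.
    notebook_path is the notebook's path inside the content repository.
    """
    # nbgitpuller clones into a folder named after the content repository.
    content_name = content_repo_url.rstrip("/").split("/")[-1]

    # Inner link, handled by nbgitpuller inside the running session.
    inner = "git-pull?" + urlencode(
        {
            "repo": content_repo_url,
            "urlpath": f"lab/tree/{content_name}/{notebook_path}",
            "branch": content_branch,
        },
        quote_via=quote,
    )

    # Outer link, handled by BinderHub; the inner link is encoded a second time.
    outer_query = urlencode({"urlpath": inner}, quote_via=quote)
    return f"{hub}/v2/gh/{env_repo}/{env_ref}?{outer_query}"


if __name__ == "__main__":
    # Hypothetical repositories: one environment repo, one all-inclusive notebook repo.
    print(
        nbgitpuller_binder_url(
            env_repo="example-user/env-plotting",
            content_repo_url="https://github.com/example-user/notebooks_examples",
            notebook_path="notebook_1/notebook_1.ipynb",
        )
    )
```

One such URL per notebook can then sit next to that notebook in the all-inclusive repository, each pointing at whichever environment repository the notebook needs.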

    (Full disclosure about the one-repo-to-one-environment issue:
    If you go here and look for the section that begins 'I can’t think of a reason why you couldn’t store a requirements.txt...', you'll see that there have been discussions about allowing alternatives to the current model, in which each repository corresponds to one environment.)