I want to set up an offline PyPI repository for my homelab (an offline-only lab, disconnected from the Internet).
I want to download only the *.tar.gz/*.whl files that work with three specific versions of Python, not everything on PyPI.
The official Python documentation suggested using Bandersnatch, but I have not had luck with it.
It downloaded about 1.1 TB of packages with the configuration below, but it never downloaded the "requests" package (which should be requests-2.31.0.tar.gz or requests-2.31.0-py3-none-any.whl).
Is the following Bandersnatch configuration file correct?
grep -v '^;' /etc/bandersnatch.conf | sed '/^$/d'
[mirror]
directory = /mnt/mylabnas01/repos/pypi
json = false
release-files = true
cleanup = false
master = https://pypi.org
timeout = 10
global-timeout = 1800
workers = 5
hash-index = false
simple-format = ALL
stop-on-error = false
storage-backend = filesystem
verifiers = 3
compare-method = hash
[allowlist]
platforms =
    py3.6.8
    py3.8.6
    py3.8.7
    py3.10.6
When I used the above configuration file as "/etc/bandersnatch.conf" and ran bandersnatch mirror (as root), it downloaded 1.1 TB of packages over about 5 days:
du -sh ./pypi/web/*
Output:
0 local-stats
1.1T packages
834M simple
It created a folder which contained directories like this:
./pypi/web/simple/
./pypi/web/packages/a9/09/78fd02c25977348689dbec2040e92c93ce743073842132c0e9f9910a223e/flask_dance-6.2.0-py3-none-any.whl
./pypi/web/packages/91/c8/cfbf90d7d1d148c5e0be4744d98acf900ce14486257407dd0565c667b892/flaskconstructicon-1.0.3-py3-none-any.whl
./pypi/web/packages/91/53/b0a9fcc1b1297f51e68b69ed3b7c3c40d8c45be1391d77ae198712914392/flask_sqlalchemy-3.1.1.tar.gz
./pypi/web/packages/91/6a/161c7730b8d55c88ec826d7b389098787939427c3202270cb9d07df73746/flask_image_search-0.5.0-py2.py3-none-any.whl
./pypi/web/packages/91/cc/7b14c479b4631cfaf6d582cf9c3511717f6dc5df8fda7037b53da6f5cf43/aa_standingsrequests-1.3.0-py3-none-any.whl
./pypi/web/packages/da/57/33191298260e491d1d4fea7e6a10fac124b6fc90b5fdbe368d32d31ee6a7/flask_sqlalchemy_whoosh-0.1.2-py2.py3-none-any.whl
... and so forth.
There are sdist archives and wheel files, so Bandersnatch did download packages, but when I search the mirror for "requests", it is not there.
How can I resolve this?
The problem is that some packages will only be published for older versions of Python, but they are really forward-compatible with newer versions. This means you can't really restrict downloads to only some fixed Python version.
What you can do, however, is blacklist a bunch of the versions / platforms you don't need, to cut down the overall amount downloaded. For example:
[blocklist]
platforms =
    macos
    freebsd
    py2.4
    py2.5
    py2.6
    py2.7
Note that you can only specify Python versions up to the minor version. You can provide a patch version, but it's seemingly ignored. For a complete list of supported platforms, see the documentation here.
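To see why a patch version such as py3.8.6 has nothing to match against, it helps to look at what a wheel filename actually encodes. The following is a minimal sketch, not Bandersnatch's own matching code; parse_wheel_tags is a hypothetical helper that splits a filename according to the wheel naming convention (name-version[-build]-python-abi-platform).

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into its naming-convention components.

    Hypothetical helper for illustration only; it is not part of Bandersnatch.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag after version
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


# Filenames taken from the question.
for wheel in (
    "requests-2.31.0-py3-none-any.whl",
    "flask_image_search-0.5.0-py2.py3-none-any.whl",
):
    print(wheel, "->", parse_wheel_tags(wheel)["python"])
```

The Python tag is something like py3 or py2.py3, so it never carries a patch version, and for pure-Python wheels it often carries no minor version either.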
In addition, you can download only the latest few versions of each package. I wouldn't recommend keep = 1, because you'll run into buggy versions of packages that you want to skip, or you'll find that some package explicitly disallows a specific version of a dependency that has known issues. Setting this to something like 3 or 5 is generally a good compromise between space and flexibility. You may occasionally run into a package that wants only a very specific version of some dependency (which you may not have), but that's very uncommon. This configuration would look like:
[latest_release]
keep = 3
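As a rough illustration of what keeping the latest three releases means, here is a simplified sketch. It is not how Bandersnatch implements the filter: newest_releases is a hypothetical function, the version list is illustrative, and the sort key only handles plain dotted numeric versions (a real implementation needs proper version parsing for pre-releases and similar).

```python
def newest_releases(versions, keep):
    """Return the `keep` highest versions, newest first.

    Simplified: assumes purely numeric dotted versions such as "2.31.0".
    """
    def sort_key(version):
        return tuple(int(part) for part in version.split("."))

    return sorted(versions, key=sort_key, reverse=True)[:keep]


# Illustrative release list; note that "2.9.1" sorts below "2.28.2".
releases = ["2.28.2", "2.29.0", "2.30.0", "2.31.0", "2.9.1"]
print(newest_releases(releases, keep=3))  # ['2.31.0', '2.30.0', '2.29.0']
```

With keep = 1 only 2.31.0 would survive, which is the situation described above where a dependency pin on an older release cannot be satisfied.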
Finally, you can also restrict by package size, to exclude some of the excessively large packages that just bundle large amounts of assets with the core package. I personally would not recommend this: you'll find that you accidentally drop some common packages like NumPy and SciPy. You can use it as long as you set the threshold high enough, but if you do, you need to verify that you haven't dropped anything crucial. For completeness, here's what that option would look like:
[size_project_metadata]
max_package_size = 1G
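One way to check that nothing crucial was dropped is to look for the projects you care about in the mirror's simple index. This is a minimal sketch under assumptions: missing_projects is a hypothetical helper, and it assumes the layout shown in the question (web/simple/ under the mirror directory, with one directory per normalized project name, as with hash-index = false).

```python
import re
from pathlib import Path


def missing_projects(mirror_root, projects):
    """Return the project names that have no directory under web/simple/."""
    simple = Path(mirror_root) / "web" / "simple"
    missing = []
    for name in projects:
        # Normalize the name the way the simple index does:
        # lowercase, with runs of "-", "_" and "." collapsed to "-".
        normalized = re.sub(r"[-_.]+", "-", name).lower()
        if not (simple / normalized).is_dir():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Mirror path taken from the question's configuration.
    print(missing_projects("/mnt/mylabnas01/repos/pypi", ["requests", "NumPy", "SciPy"]))
```

An empty list means every named project has an index directory; it does not prove that every release file you need is present.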
The corresponding plugin names for these filters are as follows:
[plugins]
enabled =
    exclude_platform
    latest_release
    size_project_metadata
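Because each filter section only makes sense together with its plugin name, it can be worth checking a combined configuration for sections whose plugin is not listed. The sketch below is an assumption-laden illustration rather than anything Bandersnatch provides: unenabled_filters and the FILTER_PLUGINS mapping are hypothetical and simply encode the section/plugin pairing described in this answer.

```python
import configparser

# (section, option) -> plugin name, following the pairing in this answer.
FILTER_PLUGINS = {
    ("blocklist", "platforms"): "exclude_platform",
    ("latest_release", "keep"): "latest_release",
    ("size_project_metadata", "max_package_size"): "size_project_metadata",
}

SAMPLE = """\
[plugins]
enabled =
    exclude_platform
    latest_release

[blocklist]
platforms =
    macos
    py2.7

[latest_release]
keep = 3

[size_project_metadata]
max_package_size = 1G
"""


def unenabled_filters(config_text):
    """Return plugins that are configured but missing from [plugins] enabled."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    enabled = set(parser.get("plugins", "enabled", fallback="").split())
    return [
        plugin
        for (section, option), plugin in FILTER_PLUGINS.items()
        if parser.has_option(section, option) and plugin not in enabled
    ]


print(unenabled_filters(SAMPLE))  # ['size_project_metadata']
```

In the sample, the size filter is configured but its plugin is not listed, so it is reported.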
Additionally, I would recommend that you enable json = true under the [mirror] section. It doesn't use much extra space, it makes your life simpler, and it also lets you set up pip search functionality if you want that.
My recommendations above come from personal experience using Bandersnatch to manage my own offline mirror. I keep everything for Python 3 on every OS platform I use, the latest five versions of each package, and no package size limits. It works very well, and I can only remember one package that I had issues with due to over-restrictive dependency declarations, which I resolved by forcibly installing both the newer version of the dependency and the package itself with pip install --no-dependencies PACKAGE.