For a couple of reasons, we decided to host all our private and public Python dependencies (and their dependencies) on Amazon S3. We intend to download and install packages only from S3 and nowhere else.
I followed the steps mentioned at https://stackoverflow.com/a/57552988/3007402 (I wrote the answer) to set up a PyPI server on S3.
To upload public packages to S3, I would first download them using
pip download numpy==1.14.2
pip download statsmodels==0.6.1
To install any package I would use
pip install pandas --index-url=http://<s3_endpoint> --trusted-host=<s3_endpoint> --no-cache-dir
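For scripted installs, the command above could be assembled as in the sketch below. It assumes the index URL includes a scheme and derives the --trusted-host value from it, since that option takes a host (or host:port) rather than a full URL. The function name pip_install_command and the example endpoint are hypothetical, not part of the original setup.

```python
import sys
from urllib.parse import urlparse


def pip_install_command(package, index_url):
    """Build a pip command that resolves packages only from index_url."""
    # --trusted-host expects host[:port], not a full URL.
    host = urlparse(index_url).netloc
    if not host:
        raise ValueError("index_url must include a scheme, e.g. http://host/")
    return [
        sys.executable, "-m", "pip", "install", package,
        "--index-url", index_url,
        "--trusted-host", host,
        "--no-cache-dir",
    ]


# Hypothetical endpoint, for illustration only.
cmd = pip_install_command("pandas", "http://my-bucket.example.com/")
print(" ".join(cmd[1:]))
```

The resulting list can be passed to subprocess.check_call as-is.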
Everything works fine with packages that are downloaded as .whl files. Such packages (e.g. pandas) are able to install themselves and their dependencies (numpy in the case of pandas) without any problems.
The issue is with non-whl packages such as statsmodels-0.6.1.tar.gz. While pip is used to install statsmodels, statsmodels uses easy_install to install its dependencies.
The pip argument --index-url is not used by easy_install, so it would download the dependency, numpy, from pypi.org.
To fix this (download only from S3), I extracted statsmodels-0.6.1.tar.gz, edited setup.cfg, repackaged it and uploaded it to S3. Below is the content of setup.cfg:
[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0
# lines below are added by me
[easy_install]
index_url = http://<s3_link>
find_links = http://<s3_link>
With that change, statsmodels fetches the dependency numpy from S3 and installs it successfully.
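If several source packages need the same treatment, the setup.cfg edit could be scripted. The sketch below is one possible approach using the standard-library configparser module. The function name add_easy_install_section and the example URL are hypothetical, and extracting and repackaging the tarball is left out.

```python
import configparser
import io


def add_easy_install_section(setup_cfg_text, index_url):
    """Return setup.cfg text with an [easy_install] section pointing at index_url."""
    # RawConfigParser avoids interpolating any '%' characters in values.
    parser = configparser.RawConfigParser()
    parser.read_string(setup_cfg_text)
    if not parser.has_section("easy_install"):
        parser.add_section("easy_install")
    parser.set("easy_install", "index_url", index_url)
    parser.set("easy_install", "find_links", index_url)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


original = """[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0
"""
# Hypothetical S3 URL, for illustration only.
patched = add_easy_install_section(original, "http://my-bucket.example.com/")
print(patched)
```

Note that configparser drops comments when it rewrites the file, so any comments in the original setup.cfg would be lost.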
For some odd reason, this only works on Ubuntu (local and an EC2 instance running Ubuntu) but fails on an EC2 instance running Amazon Linux. Below is the log I saved using the --log <file> argument to pip. I removed the timestamps for brevity.
Created temporary directory: /tmp/pip-ephem-wheel-cache-7SD5Bu
Created temporary directory: /tmp/pip-req-tracker-du4AEi
Created requirements tracker '/tmp/pip-req-tracker-du4AEi'
Created temporary directory: /tmp/pip-install-G2qw36
Looking in indexes: http://<s3_link>
Collecting statsmodels
1 location(s) to search for versions of statsmodels:
* http://<s3_link>/statsmodels/
Getting page http://<s3_link>/statsmodels/
Found index url http://<s3_link>
Analyzing links from page http://<s3_link>/statsmodels/
Found link http://<s3_link>/statsmodels/statsmodels-0.6.1.tar.gz (from http://<s3_link>/statsmodels/), version: 0.6.1
Given no hashes to check 1 links for project 'statsmodels': discarding no candidates
Using version 0.6.1 (newest of versions: 0.6.1)
Created temporary directory: /tmp/pip-unpack-r8lKU4
Found index url http://<s3_link>
Downloading http://<s3_link>/statsmodels/statsmodels-0.6.1.tar.gz (7.1MB)
Downloading from URL http://<s3_link>/statsmodels/statsmodels-0.6.1.tar.gz (from http://<s3_link>/statsmodels/)
Added statsmodels from http://<s3_link>/statsmodels/statsmodels-0.6.1.tar.gz to build tracker '/tmp/pip-req-tracker-du4AEi'
Running setup.py (path:/tmp/pip-install-G2qw36/statsmodels/setup.py) egg_info for package statsmodels
Running command python setup.py egg_info
No local packages or download links found for numpy
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-G2qw36/statsmodels/setup.py", line 449, in <module>
**setuptools_kwargs)
File "/usr/lib64/python2.7/distutils/core.py", line 111, in setup
_setup_distribution = dist = klass(attrs)
File "/home/ec2-user/tempenv/local/lib/python2.7/site-packages/setuptools/dist.py", line 265, in __init__
self.fetch_build_eggs(attrs['setup_requires'])
File "/home/ec2-user/tempenv/local/lib/python2.7/site-packages/setuptools/dist.py", line 311, in fetch_build_eggs
replace_conflicting=True,
File "/home/ec2-user/tempenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 797, in resolve
dist = best[req.key] = env.best_match(req, ws, installer)
File "/home/ec2-user/tempenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1047, in best_match
return self.obtain(req, installer)
File "/home/ec2-user/tempenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1059, in obtain
return installer(requirement)
File "/home/ec2-user/tempenv/local/lib/python2.7/site-packages/setuptools/dist.py", line 378, in fetch_build_egg
return cmd.easy_install(req)
File "/home/ec2-user/tempenv/local/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 617, in easy_install
raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('numpy')
Output of cat /etc/os-release (Amazon Linux details):
NAME="Amazon Linux AMI"
VERSION="2017.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2017.03"
PRETTY_NAME="Amazon Linux AMI 2017.03"
Apparently, the EC2 instance running Amazon Linux had an older version of setuptools.
I upgraded to the latest version and my installation went fine. 😅
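When an install behaves differently on two machines, comparing the packaging tool versions in each environment is a quick first check. The sketch below is a minimal way to do that. The helper installed_version is hypothetical, and it relies only on each module's __version__ attribute, so it should run on both Python 2.7 and Python 3.

```python
def installed_version(module_name):
    """Return module.__version__ as a string, or None if it is unavailable."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Run this inside the same virtualenv that pip uses on each machine.
for name in ("setuptools", "pip"):
    print(name, installed_version(name))
```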