
How to force pip to get a wheel package (even for package dependencies)?


I'm trying to build a multi-stage Docker image with some Python packages. For some reason, the pip wheel command still downloads source archives (.tar.gz) for a few packages even though .whl files exist on PyPI. For example, it does this for pandas and numpy.

Here is my requirements.txt:

# REST client
requests

# ETL
pandas

# SFTP
pysftp
paramiko

# LDAP
ldap3

# SMB
pysmb

First stage of the Dockerfile:

ARG IMAGE_TAG=3.7-alpine
FROM python:${IMAGE_TAG} as python-base
COPY ./requirements.txt /requirements.txt
RUN mkdir /wheels && \
    apk add build-base openssl-dev pkgconfig libffi-dev
RUN pip wheel --wheel-dir=/wheels --requirement /requirements.txt
ENTRYPOINT tail -f /dev/null

The output below shows that pip downloads the source package for pandas, whereas it gets a wheel for requests. Also, surprisingly, downloading and building these packages takes a very long time.

Step 5/11 : RUN pip wheel --wheel-dir=/wheels --requirement /requirements.txt
 ---> Running in d7bd8b3bd471
Collecting requests (from -r /requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
  Saved /wheels/requests-2.22.0-py2.py3-none-any.whl
Collecting pandas (from -r /requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/0b/1f/8fca0e1b66a632b62cc1ae38e197befe48c5cee78f895edf4bf8d340454d/pandas-0.25.0.tar.gz (12.6MB)

I would like to know how I can force pip to get a wheel file for all the required packages and also for their dependencies. I observed that some dependencies get a wheel file but others get the source package.

NOTE: the code above is a combination of multiple online sources.

Any help to make this build process easier is greatly appreciated.

Thanks in Advance.


Solution

    1. You are using Alpine Linux. It is somewhat unusual in that it uses musl as its libc implementation, whereas most other Linux distros use glibc.

    2. If a Python project implements C extensions (this is what e.g. numpy or pandas do), it has two options: either

      • offer a source dist (.tar.gz, .tar.bz2 or .zip) so that the C extensions are compiled using the C compiler/library found on the target system, or
      • offer a wheel that contains compiled C extensions. If the extensions are compiled against glibc, they will be unusable on systems using musl, and AFAIK vice versa too.
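
To check which C library a given Python interpreter is linked against, a minimal sketch using the standard library's `platform.libc_ver()` could look like the following. The helper name `describe_libc` is made up for illustration. The assumption here is that the lookup returns empty strings when it does not find glibc, which is what commonly happens on musl-based systems such as Alpine; an empty result is therefore only a hint, not proof of musl.

```python
import platform


def describe_libc(libc_ver=None):
    """Return a short description of the C library Python is linked against.

    `libc_ver` is an optional (lib, version) tuple; when omitted, the value
    reported by platform.libc_ver() for the running interpreter is used.
    """
    # platform.libc_ver() returns a (lib, version) tuple of strings; both
    # default to empty strings when the lookup fails (assumed to be the
    # usual outcome on musl-based systems).
    lib, version = libc_ver if libc_ver is not None else platform.libc_ver()
    if lib:
        return f"{lib} {version}".strip()
    return "unknown (not detected as glibc; possibly musl, as on Alpine)"


if __name__ == "__main__":
    print(describe_libc())
```

Running this inside the `python:3.7-alpine` container and inside a Debian-based Python image should make the difference between the two base images visible.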

    Now, Python defines the manylinux1 platform tag, which is specified in PEP 513 and succeeded by manylinux2010 in PEP 571. Basically, the name says it all: wheels with compiled C extensions should be built against glibc and thus will work on many distros (those that use glibc), but not on all of them (Alpine being one of the exceptions).
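
The platform tag is the last dash-separated field of a wheel's filename, so the difference between the requests and pandas downloads in the question can be illustrated with a few lines of Python. This is a rough sketch, not what pip does internally: the helper names `platform_tags` and `may_work_on_musl` are invented for this example, and the pandas wheel filename is used as an illustrative manylinux1 filename. The heuristic treats only pure-Python (`any`) and `musllinux` wheels as candidates for a musl-based system.

```python
def platform_tags(wheel_filename):
    """Return the platform tags encoded in a wheel filename.

    Wheel filenames follow the pattern
    name-version[-build]-python_tag-abi_tag-platform_tag.whl,
    and the platform field may hold several tags joined by dots.
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    stem = wheel_filename[: -len(".whl")]
    # The platform field is everything after the last dash.
    platform_field = stem.rsplit("-", 1)[1]
    return platform_field.split(".")


def may_work_on_musl(wheel_filename):
    """Heuristic: pure-Python or musllinux wheels are candidates on musl."""
    return any(
        tag == "any" or tag.startswith("musllinux")
        for tag in platform_tags(wheel_filename)
    )


if __name__ == "__main__":
    for name in (
        "requests-2.22.0-py2.py3-none-any.whl",
        "pandas-0.25.0-cp37-cp37m-manylinux1_x86_64.whl",
    ):
        print(name, "->", platform_tags(name), may_work_on_musl(name))
```

The requests wheel carries the `any` platform tag because it contains no compiled code, which is why pip can use it on Alpine, whereas a manylinux1 wheel is skipped in favour of the source distribution.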

    For you, it means that you have two possibilities: either build packages from source dists (this is what pip already does), or install the prebuilt packages via Alpine's package manager. E.g. for py3-pandas it would mean doing:

    # echo "@edge http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
    # apk update
    # apk add py3-pandas@edge
    

    However, I don't see a big issue with building packages from source. When done right, you capture it in a separate layer placed as high as possible in the image, so it is cached and not rebuilt each time.


    You might ask why there's no platform tag analogous to manylinux1 for musl-based distros. The reason is that no one has yet written a PEP similar to PEP 513 that defines a musllinux platform tag. If you are interested in the current state of it, take a look at issue #37.


    Update

    PEP 656, which defines a musllinux platform tag, has now been accepted, so it (hopefully) won't be long until prebuilt wheels for Alpine start to ship. You can track the current implementation state in auditwheel#305.
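
For reference, PEP 656 gives musllinux tags the form `musllinux_{major}_{minor}_{arch}`, where the two numbers are the musl version the wheel was built against. A small sketch of how such a tag could be taken apart is shown below; the function name `parse_musllinux_tag` is hypothetical and the example tag is only an illustration of the format.

```python
import re

# musllinux_<musl major>_<musl minor>_<architecture>; the architecture
# itself may contain underscores (for example x86_64).
_MUSLLINUX_TAG = re.compile(r"^musllinux_(\d+)_(\d+)_(.+)$")


def parse_musllinux_tag(tag):
    """Return (musl_major, musl_minor, arch) or None for non-musllinux tags."""
    match = _MUSLLINUX_TAG.match(tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    print(parse_musllinux_tag("musllinux_1_1_x86_64"))
    print(parse_musllinux_tag("manylinux1_x86_64"))
```

A wheel tagged this way is meant for musl-based systems whose musl version is at least the one named in the tag, so checking the tags on a project's PyPI download page is a quick way to see whether a source build on Alpine can be avoided.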