Tags: python-3.x, google-bigquery, dockerfile, pyarrow, armv7

"ERROR: Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly" on armv7 architecture with Linux Debian Buster


I am building a Docker image for the armv7 architecture that contains the Python packages numpy, scipy, pandas and google-cloud-bigquery, installed from piwheels. The base image is python:3.7-buster.

When I run a container from this image, it keeps restarting and the log shows the error "ValueError: This method requires pyarrow to be installed":

Traceback (most recent call last):
  File "main_prog.py", line 3, in <module>
    upload_data()
  File "/usr/src/app/bigquery.py", line 39, in upload_data
    job = client.load_table_from_dataframe(dataframe, table_id, job_config=job_config)  # Make an API request.
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2574, in load_table_from_dataframe
    raise ValueError("This method requires pyarrow to be installed")
ValueError: This method requires pyarrow to be installed
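The traceback shows that load_table_from_dataframe raises once it finds that pyarrow is not available. A minimal sketch of a startup check that names the missing package up front, rather than failing later inside upload_data(); the helper name missing_modules is hypothetical and not part of any library:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    # find_spec() returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Prints an empty list when pyarrow is installed, otherwise ['pyarrow'].
print(missing_modules(["pyarrow"]))
```

Calling this at the top of main_prog.py and exiting with a clear message when the list is non-empty would make the cause visible in the first log line.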

So I tried to install pyarrow directly in my Dockerfile with:

RUN pip3 install pyarrow

This gives me the error "ERROR: Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly" during the image build:

> [10/11] RUN pip3 install pyarrow:
#14 164.9   copying pyarrow/tests/parquet/test_parquet_writer.py -> build/lib.linux-armv7l-3.7/pyarrow/tests/parquet
#14 164.9   running build_ext
#14 164.9   creating /tmp/pip-install-jiim0m92/pyarrow_07d2ad5142d7405fa1b4bb2fe83e0428/build/temp.linux-armv7l-3.7
#14 164.9   -- Running cmake for pyarrow
#14 164.9   cmake -DPYTHON_EXECUTABLE=/usr/local/bin/python -DPython3_EXECUTABLE=/usr/local/bin/python  -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_FLIGHT=off -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_DATASET=off -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=off -DPYARROW_BUILD_PLASMA=off -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off -DPYARROW_USE_TENSORFLOW=off -DPYARROW_BUNDLE_ARROW_CPP=off -DPYARROW_BUNDLE_BOOST=off -DPYARROW_GENERATE_COVERAGE=off -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_PARQUET_USE_SHARED=on -DCMAKE_BUILD_TYPE=release /tmp/pip-install-jiim0m92/pyarrow_07d2ad5142d7405fa1b4bb2fe83e0428
#14 164.9   error: command 'cmake' failed with exit status 1
#14 164.9   ----------------------------------------
#14 164.9   ERROR: Failed building wheel for pyarrow
#14 164.9 Failed to build pyarrow
#14 164.9 ERROR: Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly
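The log shows pip compiling pyarrow from source (it runs cmake), which typically happens when pip finds no prebuilt wheel compatible with the interpreter's platform. The build directory name lib.linux-armv7l-3.7 reflects that platform. A small sketch to print the same values inside the container; the function name describe_platform is hypothetical:

```python
import platform
import sysconfig


def describe_platform():
    """Return the machine type, platform string and Python version."""
    return {
        "machine": platform.machine(),         # CPU architecture reported by the OS
        "platform": sysconfig.get_platform(),  # platform string, as seen in build directory names
        "python": platform.python_version(),
    }


print(describe_platform())
```

Comparing these values with the file names of the wheels offered on piwheels or PyPI shows whether any of them could match.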

Then, as recommended here, I tried:

RUN pip3 install pandas-gbq==0.14.0 

and

RUN pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'

but neither worked; every time I get the same error as above. I could not find a pyarrow wheel for armv7 on either piwheels or PyPI.

Does anyone know a solution? Thank you for your help!


Solution

  • I solved this problem by using a separate container image with Node-RED

    FROM nodered/node-red:latest
    
    RUN npm install node-red-contrib-google-cloud
    

    on which I could use the google-cloud packages. This container now handles the upload to Google Cloud. To use Node-RED with Docker I followed this site, and this was the google-cloud package I installed.
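For readers who would rather stay in Python on armv7, one possible alternative is to avoid the pyarrow code path: serialise the DataFrame to CSV and submit it with Client.load_table_from_file, which accepts a file object. This is a sketch under assumptions, not what the author did. The helper names dataframe_to_csv_buffer and upload_dataframe_as_csv are hypothetical, schema autodetection is assumed to be acceptable for the table, and the code has not been tested on armv7.

```python
import io


def dataframe_to_csv_buffer(dataframe):
    """Serialise a pandas DataFrame to an in-memory CSV file (bytes)."""
    # to_csv() without a path returns the CSV text; the header row is kept.
    return io.BytesIO(dataframe.to_csv(index=False).encode("utf-8"))


def upload_dataframe_as_csv(client, dataframe, table_id):
    """Load a DataFrame into BigQuery via CSV, so pyarrow is not needed."""
    # Imported here so the CSV helper above works without the BigQuery client.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row written by to_csv()
        autodetect=True,      # assumption: let BigQuery infer the schema
    )
    job = client.load_table_from_file(
        dataframe_to_csv_buffer(dataframe), table_id, job_config=job_config
    )
    return job.result()  # wait for the load job to finish
```

In bigquery.py, the call to client.load_table_from_dataframe(dataframe, table_id, job_config=job_config) could then be replaced with upload_dataframe_as_csv(client, dataframe, table_id). CSV loses some type information compared with Parquet, so an explicit schema may be preferable to autodetection.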