I should preface this by saying that I followed the spaCy documentation to install the spaCy library and the models of interest:
pip install -U pip setuptools wheel
pip install -U 'spacy[apple]'
python -m spacy download zh_core_web_sm
python -m spacy download en_core_web_sm
python -m spacy download fr_core_news_sm
python -m spacy download de_core_news_sm
python -m spacy download ja_core_news_sm
python -m spacy download es_core_news_sm
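To see which of these pipelines actually ended up installed, you can check whether each model package can be found on the import path. The snippet below is a minimal sketch. It assumes the models are installed as ordinary Python packages named exactly as in the `spacy download` commands above. `missing_models` is a hypothetical helper, not part of spaCy. It uses `importlib.util.find_spec`, which locates a package without importing it, so a broken `ja_core_news_sm` dependency will not crash the check.

```python
import importlib.util

# Model package names taken from the `spacy download` commands above.
MODELS = [
    "zh_core_web_sm",
    "en_core_web_sm",
    "fr_core_news_sm",
    "de_core_news_sm",
    "ja_core_news_sm",
    "es_core_news_sm",
]


def missing_models(names):
    """Return the packages that cannot be found on the import path (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models(MODELS)
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All spaCy models are installed.")
```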
I'm currently stuck installing ja_core_news_sm in my Docker environment, which is based on the python:3.11-slim image. I'm installing several other spaCy pipelines alongside it on my arm64 Docker image, and this is the source of the conflict. The other pipelines are the ones listed above.
I get the following error:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.7/71.7 MB 4.4 MB/s eta 0:00:00
Building wheels for collected packages: sudachipy
Building wheel for sudachipy (pyproject.toml): started
Building wheel for sudachipy (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Building wheel for sudachipy (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [39 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-aarch64-cpython-311
creating build/lib.linux-aarch64-cpython-311/sudachipy
copying py_src/sudachipy/config.py -> build/lib.linux-aarch64-cpython-311/sudachipy
copying py_src/sudachipy/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy
copying py_src/sudachipy/command_line.py -> build/lib.linux-aarch64-cpython-311/sudachipy
copying py_src/sudachipy/errors.py -> build/lib.linux-aarch64-cpython-311/sudachipy
creating build/lib.linux-aarch64-cpython-311/sudachipy/dictionary
copying py_src/sudachipy/dictionary/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy/dictionary
creating build/lib.linux-aarch64-cpython-311/sudachipy/tokenizer
copying py_src/sudachipy/tokenizer/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy/tokenizer
creating build/lib.linux-aarch64-cpython-311/sudachipy/morphemelist
copying py_src/sudachipy/morphemelist/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy/morphemelist
creating build/lib.linux-aarch64-cpython-311/sudachipy/morpheme
copying py_src/sudachipy/morpheme/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy/morpheme
copying py_src/sudachipy/sudachipy.pyi -> build/lib.linux-aarch64-cpython-311/sudachipy
creating build/lib.linux-aarch64-cpython-311/sudachipy/resources
copying py_src/sudachipy/resources/sudachi.json -> build/lib.linux-aarch64-cpython-311/sudachipy/resources
copying py_src/sudachipy/resources/rewrite.def -> build/lib.linux-aarch64-cpython-311/sudachipy/resources
copying py_src/sudachipy/resources/unk.def -> build/lib.linux-aarch64-cpython-311/sudachipy/resources
copying py_src/sudachipy/resources/char.def -> build/lib.linux-aarch64-cpython-311/sudachipy/resources
warning: build_py: byte-compiling is disabled, skipping.
running build_ext
running build_rust
error: can't find Rust compiler
If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
To update pip, run:
pip install --upgrade pip
and then retry package installation.
If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for sudachipy
Failed to build sudachipy
ERROR: Could not build wheels for sudachipy, which is required to install pyproject.toml-based projects
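The key line in that log is `error: can't find Rust compiler`. A minimal sketch for checking, before running pip, whether the Rust tools are visible on PATH is shown below. It assumes the source build needs both `rustc` and `cargo`, as the PyPI note on SudachiPy's binary wheels suggests. `missing_rust_tools` is a hypothetical helper.

```python
import shutil

# Tools assumed to be needed for the sudachipy source build.
# rustup installs both into ~/.cargo/bin by default, so that directory
# has to be on PATH while pip runs.
RUST_TOOLS = ("rustc", "cargo")


def missing_rust_tools(tools=RUST_TOOLS):
    """Return the tools that shutil.which cannot find on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_rust_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("Rust toolchain found; a sudachipy source build can proceed.")
```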
Here's my Dockerfile:
# Use the official Python image, with Python 3.11
FROM python:3.11-slim
# Set environment variables to reduce Python bytecode generation and buffering
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Set working directory
WORKDIR /app
# Install essential dependencies including Python development headers and GCC
RUN apt-get update && \
apt-get install -y --no-install-recommends \
libpq-dev \
gcc \
ffmpeg \
python3-dev \
libc-dev \
&& apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Update pip and install Python packages
COPY ./docker-requirements.txt /app/
RUN pip install --upgrade pip && \
pip install --no-cache-dir -r docker-requirements.txt
# Copy application code to container
COPY . /app
# Expose the port the app runs on
EXPOSE 5000
# Make the entrypoint script executable
RUN chmod +x /app/shell_scripts/entrypoint.sh /app/shell_scripts/wait-for-it.sh /app/shell_scripts/docker-ngrok-tunnel.sh
# Define entrypoint
ENTRYPOINT ["/app/shell_scripts/entrypoint.sh"]
Reading further, it appears to be an archived module.
Here's the note from PyPI, which explains the issue given that my Mac has an M2 chip:
Binary wheels
We provide binary builds for macOS (10.14+), Windows and Linux only for x86_64 architecture. x86 32-bit architecture is not supported and is not tested. MacOS source builds seem to work on ARM-based (Aarch64) Macs, but this architecture also is not tested and require installing Rust toolchain and Cargo.
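Because an arm64 Docker image on an M2 Mac reports its machine as `aarch64` (as in the `build/lib.linux-aarch64-cpython-311` paths in the log), pip falls back to a source build. The sketch below guesses from `platform.machine()` whether that will happen. The `WHEEL_ARCHES` set reflects only the note quoted above and is an assumption: newer SudachiPy releases may publish wheels for more architectures, so check the current PyPI page. `needs_rust_toolchain` is a hypothetical helper.

```python
import platform

# Architectures with prebuilt SudachiPy wheels according to the quoted
# PyPI note. This is an assumption to re-check against the current release.
WHEEL_ARCHES = {"x86_64", "amd64"}


def needs_rust_toolchain(machine=None):
    """Guess whether pip will have to build SudachiPy from source (hypothetical helper)."""
    machine = (machine or platform.machine()).lower()
    return machine not in WHEEL_ARCHES


if __name__ == "__main__":
    arch = platform.machine()
    if needs_rust_toolchain(arch):
        print(f"{arch}: expect a source build; install Rust and Cargo first.")
    else:
        print(f"{arch}: a prebuilt wheel should be available.")
```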
I can't seem to install all of the models.
Credit to @aab for nudging me in the direction of the Rust compiler.
The silver bullet was upgrading my es and fr spaCy pipelines (to es_core_news_md and fr_core_news_md) in addition to installing the Rust compiler, because SudachiPy has to be built from source on arm64, and that build requires a Rust compiler.
# Use the official Python image, with Python 3.11
FROM python:3.11-slim
# Set environment variables to reduce Python bytecode generation and buffering
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Set working directory
WORKDIR /app
# Install essential dependencies including Python development headers and GCC
RUN apt-get update && \
apt-get install -y --no-install-recommends \
python3-dev \
build-essential \
git \
libpq-dev \
gcc \
ffmpeg \
libc-dev \
curl \
&& apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
# Update pip and install Python packages
COPY ./docker-requirements.txt /app/
RUN pip install --upgrade pip && \
pip install --no-cache-dir -r docker-requirements.txt
# Install spaCy, SudachiPy and the language models
RUN pip install -U pip setuptools wheel && \
pip install -U spacy && \
pip install --upgrade 'sudachipy>=0.6.8' && \
python -m spacy download zh_core_web_sm && \
python -m spacy download en_core_web_sm && \
python -m spacy download fr_core_news_md && \
python -m spacy download de_core_news_sm && \
python -m spacy download es_core_news_md && \
python -m spacy download ja_core_news_sm
# Copy application code to container
COPY . /app
# Expose the port the app runs on
EXPOSE 5000
# Make the entrypoint script executable
RUN chmod +x /app/shell_scripts/entrypoint.sh /app/shell_scripts/wait-for-it.sh /app/shell_scripts/docker-ngrok-tunnel.sh
# Define entrypoint
ENTRYPOINT ["/app/shell_scripts/entrypoint.sh"]
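To confirm that the rebuilt image picked up a recent enough SudachiPy, you could run a small check inside the container. This is a minimal sketch using only the standard library's `importlib.metadata`. The helper names are hypothetical, and the simple parser ignores pre-release suffixes such as `rc1`; the `packaging` library handles those properly if you need exact PEP 440 semantics.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum version pinned in the Dockerfile above ('sudachipy>=0.6.8').
MINIMUM = (0, 6, 8)


def parse_release(text):
    """Turn the leading numeric part of a version string into a tuple of ints.

    Suffixes such as 'rc1' or '+local' are dropped, so '0.6.8rc1' parses
    as (0, 6, 8). Hypothetical helper.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def sudachipy_ok(minimum=MINIMUM):
    """Return True if SudachiPy is installed and meets the minimum version."""
    try:
        installed = version("sudachipy")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= minimum


if __name__ == "__main__":
    print("sudachipy OK" if sudachipy_ok() else "sudachipy missing or older than 0.6.8")
```

Tuple comparison handles versions like `0.6.10` correctly, which a plain string comparison would not.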