python docker spacy rust-cargo arm64

arm64 - How do I install SudachiPy, which is needed for the Japanese spaCy pipeline?


I should preface this by saying that I followed the spaCy documentation to install the library and the models I need:

pip install -U pip setuptools wheel
pip install -U 'spacy[apple]'
python -m spacy download zh_core_web_sm
python -m spacy download en_core_web_sm
python -m spacy download fr_core_news_sm
python -m spacy download de_core_news_sm
python -m spacy download ja_core_news_sm
python -m spacy download es_core_news_sm

I am currently stuck installing ja_core_news_sm in a Docker image based on python:3.11-slim. I am installing several other spaCy pipelines alongside it on the same arm64 image (the zh, en, fr, de, and es models listed above), and I suspect this is the source of the conflict.

I get the following error:

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.7/71.7 MB 4.4 MB/s eta 0:00:00
 Building wheels for collected packages: sudachipy
   Building wheel for sudachipy (pyproject.toml): started
   Building wheel for sudachipy (pyproject.toml): finished with status 'error'
   error: subprocess-exited-with-error
   
   × Building wheel for sudachipy (pyproject.toml) did not run successfully.
   │ exit code: 1
   ╰─> [39 lines of output]
       running bdist_wheel
       running build
       running build_py
       creating build
       creating build/lib.linux-aarch64-cpython-311
       creating build/lib.linux-aarch64-cpython-311/sudachipy
       copying py_src/sudachipy/config.py -> build/lib.linux-aarch64-cpython-311/sudachipy
       copying py_src/sudachipy/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy
       copying py_src/sudachipy/command_line.py -> build/lib.linux-aarch64-cpython-311/sudachipy
       copying py_src/sudachipy/errors.py -> build/lib.linux-aarch64-cpython-311/sudachipy
       creating build/lib.linux-aarch64-cpython-311/sudachipy/dictionary
       copying py_src/sudachipy/dictionary/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy/dictionary
       creating build/lib.linux-aarch64-cpython-311/sudachipy/tokenizer
       copying py_src/sudachipy/tokenizer/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy/tokenizer
       creating build/lib.linux-aarch64-cpython-311/sudachipy/morphemelist
       copying py_src/sudachipy/morphemelist/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy/morphemelist
       creating build/lib.linux-aarch64-cpython-311/sudachipy/morpheme
       copying py_src/sudachipy/morpheme/__init__.py -> build/lib.linux-aarch64-cpython-311/sudachipy/morpheme
       copying py_src/sudachipy/sudachipy.pyi -> build/lib.linux-aarch64-cpython-311/sudachipy
       creating build/lib.linux-aarch64-cpython-311/sudachipy/resources
       copying py_src/sudachipy/resources/sudachi.json -> build/lib.linux-aarch64-cpython-311/sudachipy/resources
       copying py_src/sudachipy/resources/rewrite.def -> build/lib.linux-aarch64-cpython-311/sudachipy/resources
       copying py_src/sudachipy/resources/unk.def -> build/lib.linux-aarch64-cpython-311/sudachipy/resources
       copying py_src/sudachipy/resources/char.def -> build/lib.linux-aarch64-cpython-311/sudachipy/resources
       warning: build_py: byte-compiling is disabled, skipping.
       
       running build_ext
       running build_rust
       error: can't find Rust compiler
       
       If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
       
       To update pip, run:
       
           pip install --upgrade pip
       
       and then retry package installation.
       
       If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
       [end of output]
   
   note: This error originates from a subprocess, and is likely not a problem with pip.
   ERROR: Failed building wheel for sudachipy
 Failed to build sudachipy
 ERROR: Could not build wheels for sudachipy, which is required to install pyproject.toml-based projects

Here's my Dockerfile:

# Use the official Python image, with Python 3.11
FROM python:3.11-slim

# Set environment variables to reduce Python bytecode generation and buffering
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1

# Set working directory
WORKDIR /app

# Install essential dependencies including Python development headers and GCC
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libpq-dev \
    gcc \
    ffmpeg \
    python3-dev \
    libc-dev \
    && apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Update pip and install Python packages
COPY ./docker-requirements.txt /app/
RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r docker-requirements.txt

# Copy application code to container
COPY . /app

# Expose the port the app runs on
EXPOSE 5000

# Make the entrypoint script executable
RUN chmod +x /app/shell_scripts/entrypoint.sh /app/shell_scripts/wait-for-it.sh /app/shell_scripts/docker-ngrok-tunnel.sh

# Define entrypoint
ENTRYPOINT ["/app/shell_scripts/entrypoint.sh"]

After reading further, I found that it is an archived module.

Here's the note from PyPI, which explains the issue, given that my Mac has an M2 chip:

Binary wheels
We provide binary builds for macOS (10.14+), Windows and Linux only for x86_64 architecture. x86 32-bit architecture is not supported and is not tested. MacOS source builds seem to work on ARM-based (Aarch64) Macs, but this architecture also is not tested and require installing Rust toolchain and Cargo.
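To confirm that this note applies inside the container, the architecture and the presence of a Rust compiler can be checked from Python. This is only a sketch: the helper name `needs_rust_toolchain` is made up for illustration, and the x86_64-only rule is an assumption taken from the quoted note, so it may not hold for other SudachiPy releases.

```python
import platform
import shutil
from typing import Optional


def needs_rust_toolchain(machine: str, rustc_path: Optional[str]) -> bool:
    """Return True when a source build is likely and no Rust compiler is found.

    Assumption (from the quoted PyPI note): prebuilt wheels exist only for
    x86_64, so any other architecture falls back to building from source.
    """
    has_prebuilt_wheel = machine.lower() in {"x86_64", "amd64"}
    return not has_prebuilt_wheel and rustc_path is None


if __name__ == "__main__":
    machine = platform.machine()   # typically "aarch64" in an arm64 Linux container
    rustc = shutil.which("rustc")  # None when rustc is not on PATH
    print(f"machine={machine} rustc={rustc}")
    if needs_rust_toolchain(machine, rustc):
        print("No prebuilt wheel expected and no Rust compiler found: "
              "building sudachipy would likely fail.")
```

Running this in the python:3.11-slim image before the pip install step should show whether the "can't find Rust compiler" error is to be expected.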

As a result, I cannot seem to install all of the models.


Solution

  • Credit to @aab for nudging me in the direction of the Rust compiler.

    What fixed it was installing the Rust compiler, which SudachiPy needs in order to build from source on arm64, and also switching my es and fr spaCy pipelines from the sm to the md variants.

    # Use the official Python image, with Python 3.11
    FROM python:3.11-slim
    
    # Set environment variables to reduce Python bytecode generation and buffering
    ENV PYTHONUNBUFFERED=1 \
        PYTHONDONTWRITEBYTECODE=1
    
    # Set working directory
    WORKDIR /app
    
    # Install essential dependencies including Python development headers and GCC
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
        python3-dev \
        build-essential \
        git \
        libpq-dev \
        gcc \
        ffmpeg \
        libc-dev \
        curl \
        && apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    # Install Rust
    RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
    ENV PATH="/root/.cargo/bin:${PATH}"
    
    # Update pip and install Python packages
    COPY ./docker-requirements.txt /app/
    RUN pip install --upgrade pip && \
        pip install --no-cache-dir -r docker-requirements.txt
    
    # Install spaCy, SudachiPy (built from source with Rust) and the language models
    RUN pip install -U pip setuptools wheel && \
        pip install -U spacy && \
        pip install --upgrade 'sudachipy>=0.6.8' && \
        python -m spacy download zh_core_web_sm && \
        python -m spacy download en_core_web_sm && \
        python -m spacy download fr_core_news_md && \
        python -m spacy download de_core_news_sm && \
        python -m spacy download es_core_news_md && \
        python -m spacy download ja_core_news_sm 
    
    # Copy application code to container
    COPY . /app
    
    # Expose the port the app runs on
    EXPOSE 5000
    
    # Make the entrypoint script executable
    RUN chmod +x /app/shell_scripts/entrypoint.sh /app/shell_scripts/wait-for-it.sh /app/shell_scripts/docker-ngrok-tunnel.sh
    
    # Define entrypoint
    ENTRYPOINT ["/app/shell_scripts/entrypoint.sh"]
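
Once the image builds, a small check like the one below can confirm that SudachiPy and every pipeline ended up installed. It is a minimal sketch, not part of the original fix: the function name `missing_packages` is hypothetical, and it relies on the fact that `spacy download` installs each pipeline as a regular Python package named after the model.

```python
import importlib.util

# Pipeline names taken from the Dockerfile above.
EXPECTED_MODELS = [
    "zh_core_web_sm",
    "en_core_web_sm",
    "fr_core_news_md",
    "de_core_news_sm",
    "es_core_news_md",
    "ja_core_news_sm",
]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # sudachipy is checked too, since the Japanese pipeline depends on it.
    missing = missing_packages(["sudachipy", *EXPECTED_MODELS])
    if missing:
        raise SystemExit(f"Missing packages: {', '.join(missing)}")
    print("All expected packages are installed.")
```

Running it as an extra build step or from the entrypoint script makes the build fail early if one of the downloads did not succeed. It only checks that the packages are present and does not load the pipelines.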