
Tokenizers && Docker: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects


I have a Python 3.9 project that uses some libraries to build a chatbot, and so far I haven't been able to build my Docker image without running into issues with tokenizers. The terminal shows an error like this:

#0 96.36       warning: `#[macro_use]` only has an effect on `extern crate` and modules
#0 96.36         --> tokenizers-lib/src/utils/mod.rs:24:1
#0 96.36          |
#0 96.36       24 | #[macro_use]
#0 96.36          | ^^^^^^^^^^^^
#0 96.36          |
#0 96.36          = note: `#[warn(unused_attributes)]` on by default
#0 96.36       
#0 96.36       warning: `#[macro_use]` only has an effect on `extern crate` and modules
#0 96.36         --> tokenizers-lib/src/utils/mod.rs:35:1
#0 96.36          |
#0 96.36       35 | #[macro_use]
#0 96.36          | ^^^^^^^^^^^^
#0 96.36       
#0 96.36       warning: variable does not need to be mutable
#0 96.36          --> tokenizers-lib/src/models/unigram/model.rs:280:21
#0 96.36           |
#0 96.36       280 |                 let mut target_node = &mut best_path_ends_at[key_pos];
#0 96.36           |                     ----^^^^^^^^^^^
#0 96.36           |                     |
#0 96.36           |                     help: remove this `mut`
#0 96.36           |
#0 96.36           = note: `#[warn(unused_mut)]` on by default
#0 96.36       
#0 96.36       warning: variable does not need to be mutable
#0 96.36          --> tokenizers-lib/src/models/unigram/model.rs:297:21
#0 96.36           |
#0 96.36       297 |                 let mut target_node = &mut best_path_ends_at[starts_at + mblen];
#0 96.36           |                     ----^^^^^^^^^^^
#0 96.36           |                     |
#0 96.36           |                     help: remove this `mut`
#0 96.36       
#0 96.36       warning: variable does not need to be mutable
#0 96.36          --> tokenizers-lib/src/pre_tokenizers/byte_level.rs:175:59
#0 96.36           |
#0 96.36       175 |     encoding.process_tokens_with_offsets_mut(|(i, (token, mut offsets))| {
#0 96.36           |                                                           ----^^^^^^^
#0 96.36           |                                                           |
#0 96.36           |                                                           help: remove this `mut`
#0 96.36       
#0 96.36       warning: fields `bos_id` and `eos_id` are never read
#0 96.36         --> tokenizers-lib/src/models/unigram/lattice.rs:59:5
#0 96.36          |
#0 96.36       53 | pub struct Lattice<'a> {
#0 96.36          |            ------- fields in this struct
#0 96.36       ...
#0 96.36       59 |     bos_id: usize,
#0 96.36          |     ^^^^^^
#0 96.36       60 |     eos_id: usize,
#0 96.36          |     ^^^^^^
#0 96.36          |
#0 96.36          = note: `Lattice` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
#0 96.36          = note: `#[warn(dead_code)]` on by default
#0 96.36       
#0 96.36       error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
#0 96.36          --> tokenizers-lib/src/models/bpe/trainer.rs:517:47
#0 96.36           |
#0 96.36       513 |                     let w = &words[*i] as *const _ as *mut _;
#0 96.36           |                             -------------------------------- casting happend here
#0 96.36       ...
#0 96.36       517 |                         let word: &mut Word = &mut (*w);
#0 96.36           |                                               ^^^^^^^^^
#0 96.36           |
#0 96.36           = note: for more information, visit <https://doc.rust-lang.org/book/ch15-05-interior-mutability.html>
#0 96.36           = note: `#[deny(invalid_reference_casting)]` on by default
#0 96.36       
#0 96.36       warning: `tokenizers` (lib) generated 6 warnings
#0 96.36       error: could not compile `tokenizers` (lib) due to 1 previous error; 6 warnings emitted
#0 96.36       
#0 96.36       Caused by:
#0 96.36         process didn't exit successfully: `/root/.rustup/toolchains/stable-aarch64-unknown-linux-gnu/bin/rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' -C metadata=db1c9e5c051c9526 -C extra-filename=-db1c9e5c051c9526 --out-dir /tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps -C strip=debuginfo -L dependency=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps --extern clap=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libclap-a2efed2048e515cd.rmeta --extern derive_builder=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libderive_builder-f45abd3e50d5c4b1.so --extern esaxx_rs=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libesaxx_rs-5d67a7d5bf69571a.rmeta --extern indicatif=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libindicatif-3745db840d02c9d1.rmeta --extern itertools=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libitertools-d41acda010f86506.rmeta --extern lazy_static=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/liblazy_static-8afaa52d2e642eca.rmeta --extern log=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/liblog-5ec826ff5f942d05.rmeta --extern onig=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libonig-1eb72392234cfab2.rmeta --extern rand=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/librand-3fd1f5796520bc86.rmeta --extern 
rayon=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/librayon-b94786a5ddf25517.rmeta --extern rayon_cond=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/librayon_cond-f33f6d65fe84e7d1.rmeta --extern regex=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libregex-34f5cf4adb2c33e0.rmeta --extern regex_syntax=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libregex_syntax-009bcc7cb2558fcc.rmeta --extern serde=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libserde-79906c850b0399df.rmeta --extern serde_json=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libserde_json-a2894ae5c2de74a6.rmeta --extern spm_precompiled=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libspm_precompiled-441c1e24a1370d3c.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libunicode_normalization_alignments-01aa3be98607d994.rmeta --extern unicode_segmentation=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libunicode_segmentation-a4ebe9150d5c7ec3.rmeta --extern unicode_categories=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/deps/libunicode_categories-b19bf80bbc3648ef.rmeta -L native=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/build/esaxx-rs-ecac19c49e11468f/out -L native=/tmp/pip-install-xcgxdywp/tokenizers_af180464259e44f0a267f683e1021503/target/release/build/onig_sys-5736dac9455cd079/out` (exit status: 1)
#0 96.36       warning: build failed, waiting for other jobs to finish...
#0 96.36       error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --` failed with code 101
#0 96.36       [end of output]
#0 96.36   
#0 96.36   note: This error originates from a subprocess, and is likely not a problem with pip.
#0 96.37   Building wheel for seqeval (setup.py): started
#0 96.37   ERROR: Failed building wheel for tokenizers
#0 96.66   Building wheel for seqeval (setup.py): finished with status 'done'
#0 96.66   Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16161 sha256=f3351a339c1317adeba7113e692efb9dd81f82d1cc1b36e637d8e06fe395356b
#0 96.66   Stored in directory: /tmp/pip-ephem-wheel-cache-xem1b0y_/wheels/e2/a5/92/2c80d1928733611c2747a9820e1324a6835524d9411510c142
#0 96.66 Successfully built blinker neo4j python-crfsuite seqeval
#0 96.66 Failed to build tokenizers
#0 96.66 ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
------
Dockerfile:21
--------------------
  19 |     RUN apt-get update && apt-get install -y build-essential
  20 |     RUN pip3 install --only-binary :all: tokenizers
  21 | >>> RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
  22 |     
  23 |     COPY . .
--------------------
ERROR: failed to solve: process "/bin/sh -c pip3 install --no-cache-dir --upgrade -r requirements.txt" did not complete successfully: exit code: 1

I'm trying to set up everything needed for the Rust compiler, and this is the build stage of my Dockerfile:

    FROM python:3.9-slim as build-stage

    WORKDIR /usr/flask-chatbot-server

    COPY requirements.txt .

    RUN apt-get update && apt-get install -y curl
    RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
    
    ENV PATH="/root/.cargo/bin:${PATH}"
    
    RUN pip3 install --upgrade pip
    RUN pip3 install --upgrade setuptools wheel
    RUN apt-get update && apt-get install -y build-essential
    RUN pip3 install --only-binary :all: tokenizers
    RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
    
    COPY . .
    RUN rm requirements.txt

It always ends up with the error mentioned in the title: "Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects".

If there is any mistake in how I've asked this question, please forgive me; this is the first time I have asked a question here, but I really need help.


Solution

  • I was able to build the image using the Dockerfile and requirements you provided.

    Sometimes, when a build was unsuccessful, it can be helpful to clear the cache for the next build with:

    docker build --no-cache . 
    

    This way your build is not affected by cached layers that might have previously had some issue.

    Additional info from the comments:

    Setting the RUSTUP_TOOLCHAIN environment variable pins the Rust toolchain version to match what is set in the build spec.
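    As a sketch of how that could look in the build stage (the toolchain version shown is an assumption; pick one known to compile the tokenizers release your requirements resolve to):

    ```dockerfile
    # Hypothetical fragment: pin the Rust toolchain rustup selects when
    # pip compiles the tokenizers wheel. 1.67.1 is an assumed version.
    FROM python:3.9-slim as build-stage

    WORKDIR /usr/flask-chatbot-server

    RUN apt-get update && apt-get install -y curl build-essential
    RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
    ENV PATH="/root/.cargo/bin:${PATH}"

    # rustup honors RUSTUP_TOOLCHAIN as an override for the active toolchain.
    ENV RUSTUP_TOOLCHAIN=1.67.1
    RUN rustup toolchain install "$RUSTUP_TOOLCHAIN"

    COPY requirements.txt .
    RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
    ```

    The idea is that a newer stable rustc can reject code (here, the `invalid_reference_casting` deny-by-default lint) that an older tokenizers source release still contains, so pinning an older toolchain lets the wheel build again.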