pip, pymupdf, paddleocr

How can I fix the 'Error in PyMuPDF' when installing paddleocr with pip?


When I run pip install paddleocr, building the wheel for PyMuPDF fails with the following error.

Building wheels for collected packages: PyMuPDF
Building wheel for PyMuPDF (setup.py) ... error
error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [70 lines of output]



Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 487, in <module>
          mupdf_local = get_mupdf()
                        ^^^^^^^^^^^
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 450, in get_mupdf
          return tar_extract( mupdf_tgz, exists='return')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 183, in tar_extract
          t.extractall()
        File "C:\Users\3551\AppData\Local\Programs\Python\Python311\Lib\tarfile.py", line 2059, in extractall
          self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
        File "C:\Users\3551\AppData\Local\Programs\Python\Python311\Lib\tarfile.py", line 2100, in extract
          self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
        File "C:\Users\3551\AppData\Local\Programs\Python\Python311\Lib\tarfile.py", line 2173, in _extract_member
          self.makefile(tarinfo, targetpath)
        File "C:\Users\3551\AppData\Local\Programs\Python\Python311\Lib\tarfile.py", line 2214, in makefile
          with bltn_open(targetpath, "wb") as target:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      FileNotFoundError: [Errno 2] No such file or directory: '.\\mupdf-1.20.3-source\\thirdparty\\harfbuzz\\test\\shaping\\texts\\in-house\\shaper-indic\\script-devanagari\\utrrs\\codepoint\\IndicFontFeatureCodepoint-AdditionalConsonants.txt'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for PyMuPDF
  Running setup.py clean for PyMuPDF
Failed to build PyMuPDF
ERROR: Could not build wheels for PyMuPDF, which is required to install pyproject.toml-based projects

I tried running pip install wheel, then installing PyMuPDF on its own with pip install PyMuPDF, and then running pip install paddleocr again, but the same wheel build error for PyMuPDF still occurs.

I am using an Intel i3 64-bit processor, and my Python version is 3.11.3.


Solution

  • paddleocr requires PyMuPDF<1.21.0, and PyMuPDF==1.20.2 (the latest version that satisfies this requirement) only has whl files for Python versions up to 3.10. Therefore, on Python 3.11, pip falls back to building PyMuPDF from source.

    The error itself comes from PyMuPDF's install script: it tries to fetch the MuPDF source, one of its dependencies, and fails while extracting the .tar.gz file. You have two options:

    1. Manually download https://mupdf.com/downloads/archive/mupdf-1.20.3-source.tar.gz and extract the archive to a path of your choosing. Set the environment variable PYMUPDF_SETUP_MUPDF_BUILD to the path of the extracted mupdf-1.20.3-source folder, then run pip install PyMuPDF==1.20.2. Note that this approach also requires a working compiler.

    2. Download this unofficial whl file: https://drive.google.com/drive/folders/1PESjDkovpvnrWFTKji4-qgT3rcVz-o-F?usp=sharing and install it with pip install <path to the whl file>.
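
For option 1, the manual steps can also be scripted. The following is only a rough sketch, not a tested installer. It assumes the archive unpacks into a folder named mupdf-1.20.3-source (the name that appears in the traceback), and the helper names (build_env, pip_command, download_and_extract, install_from_local_source) are made up for illustration. A working compiler is still required.

```python
import os
import subprocess
import sys
import tarfile
import urllib.request
from pathlib import Path

# URL and version are the ones given in the answer above.
MUPDF_URL = "https://mupdf.com/downloads/archive/mupdf-1.20.3-source.tar.gz"
PYMUPDF_VERSION = "1.20.2"


def build_env(mupdf_dir, base_env=None):
    """Return a copy of the environment with PYMUPDF_SETUP_MUPDF_BUILD set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYMUPDF_SETUP_MUPDF_BUILD"] = str(mupdf_dir)
    return env


def pip_command(version=PYMUPDF_VERSION):
    """Build the pip call for the interpreter that runs this script."""
    return [sys.executable, "-m", "pip", "install", f"PyMuPDF=={version}"]


def download_and_extract(work_dir):
    """Download the MuPDF source archive and unpack it under work_dir."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    archive = work_dir / "mupdf-1.20.3-source.tar.gz"
    urllib.request.urlretrieve(MUPDF_URL, archive)
    with tarfile.open(archive) as tar:
        tar.extractall(work_dir)
    # Assumption: the archive contains a top-level mupdf-1.20.3-source folder.
    return work_dir / "mupdf-1.20.3-source"


def install_from_local_source(work_dir):
    """Run the whole of option 1: download, extract, set the variable, install."""
    mupdf_dir = download_and_extract(work_dir)
    # The variable is passed only to the pip subprocess, not set system-wide.
    subprocess.run(pip_command(), env=build_env(mupdf_dir), check=True)
```

To use it, call install_from_local_source with a folder of your choosing, for example install_from_local_source(Path.home() / "mupdf-build") (a hypothetical location). Once PyMuPDF 1.20.2 is installed, pip install paddleocr should find the requirement already satisfied.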