Tags: python, windows, ubuntu, pdftotext, poppler

Unable to install pdftotext on Windows/Ubuntu


For weeks I have been trying to install pdftotext for Python, but every attempt has failed, earlier because of poppler.

So recently I have:

  1. Upgraded Windows 10 to Windows 11 to enable sudo and use apt commands
  2. Installed WSL and Ubuntu on Windows 11 for the apt commands
  3. Run the following commands:
sudo apt-get update
sudo apt install python3-pip
sudo apt-get install python-poppler
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

All of these ran without errors up to this point.

Issue: Now when I go to cmd and run

pip install pdftotext

Error:

Collecting pdftotext
  Using cached pdftotext-2.2.2.tar.gz (113 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pdftotext
  Building wheel for pdftotext (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [11 lines of output]
      running bdist_wheel
      running build
      running build_ext
      building 'pdftotext' extension
      creating build
      creating build\temp.win-amd64-cpython-39
      creating build\temp.win-amd64-cpython-39\Release
      "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -DPOPPLER_CPP_AT_LEAST_0_30_0=0 -DPOPPLER_CPP_AT_LEAST_0_58_0=0 -DPOPPLER_CPP_AT_LEAST_0_88_0=0 -IC:\Users\vinee\anaconda3\include -IC:\Users\vinee\anaconda3\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" /EHsc /Tppdftotext.cpp /Fobuild\temp.win-amd64-cpython-39\Release\pdftotext.obj -Wall
      pdftotext.cpp
      C:\Users\vinee\anaconda3\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.41.34120\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pdftotext
  Running setup.py clean for pdftotext
Failed to build pdftotext
ERROR: Could not build wheels for pdftotext, which is required to install pyproject.toml-based projects

For this issue I referred to this SO post, which mentions installing CMake. I already have it, but I ran the install again anyway:

C:\Windows\System32>pip install Cmake
WARNING: Ignoring invalid distribution -cipy (c:\users\vinee\anaconda3\lib\site-packages)
Requirement already satisfied: Cmake in c:\users\vinee\anaconda3\lib\site-packages (3.30.4)
WARNING: Ignoring invalid distribution -cipy (c:\users\vinee\anaconda3\lib\site-packages)

But I am still stuck on the build wheel error. What should I do next? I really need help with this.

Update: I came across this SO post about the missing io.h file or directory and tried running the command below:

set LIB=C:\Program Files (x86)\Windows Kits\10\Redist\ucrt\DLLs\x64

But I am still getting the same error.


Solution

  • The issue is that your C++ compiler cannot find the header files. Looking at this issue, you will need to make sure you have installed: Visual C++ Build Tools core features, the MSVC C++ toolset, the Visual C++ Redistributable and the Windows 10 (or in your case Windows 11) SDK. I found another response where installing the Windows SDK solved the issue.

    Another option is to use the set INCLUDE and set LIB commands to tell the compiler where the header files and libraries are located. Keep in mind that this option only works if the header files are already installed somewhere on your machine (see the first link for more info on this).
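    Before setting INCLUDE by hand, it may help to confirm that io.h exists on the machine at all. The sketch below is a minimal check, not part of pdftotext or pip. It assumes the Windows SDK lives under the usual C:\Program Files (x86)\Windows Kits\10 root, where the Universal CRT headers normally sit in Include\<version>\ucrt. The function name find_ucrt_include_dirs and the constant DEFAULT_KITS_ROOT are made up for this example.

```python
from pathlib import Path

# Assumed default Windows SDK root; adjust if the SDK is installed elsewhere.
DEFAULT_KITS_ROOT = Path(r"C:\Program Files (x86)\Windows Kits\10")


def find_ucrt_include_dirs(kits_root):
    """Return every <kits_root>/Include/<version>/ucrt directory that contains io.h."""
    include_root = Path(kits_root) / "Include"
    if not include_root.is_dir():
        return []
    # One sub-directory per installed SDK version; keep only those with io.h.
    return sorted(
        ucrt for ucrt in include_root.glob("*/ucrt") if (ucrt / "io.h").is_file()
    )


if __name__ == "__main__":
    found = find_ucrt_include_dirs(DEFAULT_KITS_ROOT)
    if found:
        for ucrt in found:
            print(f"io.h found in: {ucrt}")
    else:
        print("io.h not found under the assumed SDK root; "
              "the Windows SDK is probably not installed.")
```

    If the script finds nothing, the Windows SDK component is most likely missing and no INCLUDE setting will help until it is installed. If it does print a directory, that is the kind of path INCLUDE needs to contain. The Redist\ucrt\DLLs folder tried in the question holds runtime DLLs rather than headers, and LIB only affects where the linker looks for libraries. Also note that the build log shows cl.exe and win-amd64, so pip was running under the Windows Anaconda Python. The apt packages installed inside WSL only take effect when pip is run from the Ubuntu shell.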