python windows-8.1 ocr command-prompt tesseract

Error while installing tesseract-ocr


I want to use pytesseract for OCR, so I installed it. But before that, I needed to install tesseract-ocr. I am using Windows 8.1. I opened the command prompt and ran the command pip install tesseract-ocr. The following lines are the output of that command.

I am not able to understand what's happening here. How can I make sense of this output, and how can I successfully install Tesseract on my PC?

C:\Users\HarshLaptop>pip install tesseract-ocr
Collecting tesseract-ocr
  Using cached https://files.pythonhosted.org/packages/e2/0d/dcee3dd0fc4c7bcd18125a98f8ba6d9db7aecaa40770595203e312649587/tesseract-ocr-0.0.1.tar.gz
Requirement already satisfied: cython in c:\users\harshlaptop\anaconda3\lib\site-packages (from tesseract-ocr) (0.25.2)
Building wheels for collected packages: tesseract-ocr
  Running setup.py bdist_wheel for tesseract-ocr ... error
  Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\HARSHL~1\AppData\Local\Temp\pip-wheel-sj29zfyo --python-tag cp36:
  running bdist_wheel
  running build
  running build_py
  file tesseract_ocr.py (for module tesseract_ocr) not found
  file tesseract_ocr.py (for module tesseract_ocr) not found
  running build_ext
  building 'tesseract_ocr' extension
  creating build
  creating build\temp.win-amd64-3.6
  creating build\temp.win-amd64-3.6\Release
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic:\users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6\Release\tesseract_ocr.obj
  tesseract_ocr.cpp
  tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonica/allheaders.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2

  ----------------------------------------
  Failed building wheel for tesseract-ocr
  Running setup.py clean for tesseract-ocr
Failed to build tesseract-ocr
Installing collected packages: tesseract-ocr
  Running setup.py install for tesseract-ocr ... error
    Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-record-vnlr99lk\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    file tesseract_ocr.py (for module tesseract_ocr) not found
    file tesseract_ocr.py (for module tesseract_ocr) not found
    running build_ext
    building 'tesseract_ocr' extension
    creating build
    creating build\temp.win-amd64-3.6
    creating build\temp.win-amd64-3.6\Release
    C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic:\users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6\Release\tesseract_ocr.obj
    tesseract_ocr.cpp
    tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonica/allheaders.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2

    ----------------------------------------
Command "c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-record-vnlr99lk\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\HARSHL~1\AppData\Local\Temp\pip-install-x8nz3uhm\tesseract-ocr\
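Reading the log: the build stops at one line, MSVC's "fatal error C1083: Cannot open include file: 'leptonica/allheaders.h'". Everything after that is pip reporting the failure, then falling back to setup.py install and hitting the same error a second time. For long logs like this, a small script can pull out the missing headers. A minimal sketch, assuming the log has been saved as plain text with the console's 80-column wrapping undone (the helper name missing_headers is hypothetical, not part of pip or any library):

```python
import re

# MSVC error C1083 names the header it could not open inside single quotes.
MISSING_HEADER = re.compile(r"fatal error C1083: Cannot open include file: '([^']+)'")


def missing_headers(log_text):
    """Return the unique header names MSVC could not open, in order of first appearance."""
    found = []
    for match in MISSING_HEADER.finditer(log_text):
        header = match.group(1)
        if header not in found:  # pip retries the build, so the same error repeats
            found.append(header)
    return found
```

On the log above this would report only 'leptonica/allheaders.h', which points to a missing Leptonica dependency rather than a problem with Python or pip itself.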

Solution

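  • I had the exact same issue, using Visual Studio 2017 on a Windows 10 machine with Python 3.6. The build fails because pip install tesseract-ocr tries to compile a C++ extension (tesseract_ocr.cpp) that needs the Leptonica headers, which the compiler cannot find. pytesseract does not need that package: it calls the Tesseract executable, which has to be installed separately. What worked for me was: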

    1. Download and install the Tesseract-OCR executable from https://github.com/UB-Mannheim/tesseract/wiki. (The script below assumes a Windows system with Tesseract installed to the default suggested location, i.e. C:\Program Files (x86)\Tesseract-OCR.) See https://github.com/tesseract-ocr/tesseract/wiki for more information on installing the pre-built binary package on other operating systems, including Windows.
    2. Ensure you have the Python Imaging Library (PIL) or the Pillow package installed for opening images. (Installing PIL didn't work in my setup, but Pillow did, i.e. pip install pillow.) You need it because pytesseract requires it. See https://pypi.org/project/pytesseract/0.2.5/ for more info.
    3. Then, to use it in your code, set the tesseract_cmd path as follows:

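      from PIL import Image
      import pytesseract

      # Point pytesseract at the Tesseract executable installed in step 1
      pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

      # Open the image with Pillow and pass the Image object to pytesseract
      img = Image.open('path/to/image.png')
      text = pytesseract.image_to_string(img)
      print(text)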
      

      Hope it helps.
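If you are not sure whether step 1 worked, or Tesseract was installed somewhere other than the default folder, it can help to look for the executable before setting tesseract_cmd. A minimal sketch, assuming the default path from step 1 (the helper name find_tesseract is hypothetical; add other candidate paths for your own install):

```python
import os
import shutil

# Default location assumed in step 1; other installs may use a different folder.
DEFAULT_WINDOWS_PATH = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the Tesseract executable, or None if none is found.

    The PATH is checked first (via shutil.which), then each candidate file path.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

If find_tesseract() returns a path, assign it to pytesseract.pytesseract.tesseract_cmd before calling image_to_string; if it returns None, Tesseract is not installed where expected and step 1 needs to be redone.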