
How to use Tesseract as a NuGet package in a C++ console application in Visual Studio 2022


My goal is to extract numbers and text from an input image using Tesseract in C++, installed as a NuGet package in Visual Studio 2022.

I installed Tesseract 5.2.0 as a NuGet package (right-click the project -> Manage NuGet Packages -> Browse).


However, when I include Tesseract like this:

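#include <cstdio>   // fprintf
#include <cstdlib>  // exit
#include <iostream>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>


int main() {
    tesseract::TessBaseAPI* api = new tesseract::TessBaseAPI();
    // Init returns 0 on success; a null datapath leaves the tessdata lookup to Tesseract
    if (api->Init(nullptr, "eng")) {
        fprintf(stderr, "Could not initialize Tesseract.\n");
        exit(1);
    }

    // Open an image; pixRead returns nullptr if the file cannot be read
    Pix* image = pixRead("D:/tools/img.png");
    if (image == nullptr) {
        fprintf(stderr, "Could not read the input image.\n");
        exit(1);
    }
    api->SetImage(image);

    // Perform OCR
    char* outText = api->GetUTF8Text();
    std::cout << "OCR Output:\n" << outText << std::endl;

    // Release resources
    api->End();
    delete api;
    delete[] outText;
    pixDestroy(&image);

    return 0;
}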

it gives me errors.

I also read THIS and installed the 64-bit runtime, but it didn't work.

Does anyone know how to use Tesseract with C++ as a NuGet package, or any workaround?


Solution

  • Thank you, Bowman Zhu-MSFT, for your answer; it gave me a direction. After a thorough search I found a way to use Tesseract with C++. There is complete official Tesseract documentation. However, I didn't go with building it myself, because that takes longer and was a bit more complicated for me. Following Bowman's suggestion, I tried vcpkg and installed it on Windows by following the steps in the vcpkg instructions. For this step:

    .\vcpkg\vcpkg install [packages to install] --triplet=x64-windows
    

    I replaced it with this, to install the x64 version of Tesseract on Windows:

    .\vcpkg\vcpkg install tesseract:x64-windows-static
    

    and finally I ran this:

    .\vcpkg\vcpkg integrate install
    

    Then I created a new C++ console application in Visual Studio 2022. The project now has to be pointed at the vcpkg include and lib folders. To do so, I took the following steps:

    • First, go to the folder where you installed vcpkg. It should contain many subfolders.
    • Copy the path of the include directory inside the vcpkg folder (for the triplet used above, this is typically installed\x64-windows-static\include). In Visual Studio, right-click the project in the Solution Explorer window, choose Properties, and add the path of the include folder to C/C++ -> General -> Additional Include Directories.
    • Copy the path of the lib directory inside the vcpkg folder (typically installed\x64-windows-static\lib). In the same Properties dialog, add the path of the lib folder to Linker -> General -> Additional Library Directories.
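A quick way to confirm that the include path from the steps above is being picked up is a compile-time probe. This is only a minimal sketch, assuming a compiler with C++17 `__has_include` support; the macro `HAVE_TESSERACT_HEADERS` and the function `tesseractHeaderStatus` are names made up for this illustration, and the probe says nothing about the linker settings.

```cpp
#include <string>

// Probe for the two headers the OCR program includes.
// HAVE_TESSERACT_HEADERS is a hypothetical macro used only in this sketch.
#if defined(__has_include)
#  if __has_include(<tesseract/baseapi.h>) && __has_include(<leptonica/allheaders.h>)
#    define HAVE_TESSERACT_HEADERS 1
#  endif
#endif
#ifndef HAVE_TESSERACT_HEADERS
#  define HAVE_TESSERACT_HEADERS 0
#endif

// Returns a human-readable summary of what the preprocessor could see.
std::string tesseractHeaderStatus() {
    if (HAVE_TESSERACT_HEADERS) {
        return "tesseract and leptonica headers found";
    }
    return "tesseract or leptonica headers not found; "
           "check Additional Include Directories";
}
```

Printing `tesseractHeaderStatus()` from `main` in the new console project shows whether the Additional Include Directories entry took effect before any OCR code is added.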

    Then I built and ran the same piece of code as in my question, and everything worked. Those errors were gone.

    However, I then ran into another error, this time from Tesseract itself.

    To get rid of it, I took the following steps:

    • I went to the official tessdoc.
    • Then I downloaded the single "eng" file (eng.traineddata) and saved it on my PC; I downloaded the eng file from here.
    • I saved the eng file in the folder structure C:\tools\TesseractData\tessdata. The eng file is inside the tessdata folder.
    • Then I added the environment variable TESSDATA_PREFIX on my PC, with its value set to the path of the folder containing the eng file, C:\tools\TesseractData\tessdata.
    • Finally, I restarted my PC, rebuilt and ran the project, and Tesseract started detecting text and digits like a charm.