Extracting Text with VBA Excel and Tesseract


I've been trying to use the code below to extract text from a PNG image, but without success. It doesn't raise an error, but it doesn't produce any output either. I'm on 64-bit Windows and I've tried both the 32-bit and the 64-bit builds of Tesseract; the result is the same, no output is generated. I got this example from the link: https://www.youtube.com/watch?v=4VP54f0xV-E

Sub Image_into_Excelby_Ajit_Yadav()
  
  Dim myshell As Shell32.Shell, ReadCommand, CaptchaCode, i
  Set myshell = New Shell32.Shell

  For i = 1 To 5 ' Change it
      ReadCommand = "C:\Program Files (x86)\Tesseract-OCR\tesseract.exe " & "'C:\Temp\Sem.png'" & " " & "'C:\Temp\Sem'" & " ' -l eng'"
      myshell.ShellExecute "powershell", vArgs:=ReadCommand, vShow:=0
      Application.Wait (Now + TimeValue("00:00:05"))
      
      Open "C:\Temp\Sem.txt" For Input As #1
      Line Input #1, CaptchaCode
      Close #1
      Application.Wait (Now + TimeValue("00:00:05"))
      MyCaptchaCode = Application.WorksheetFunction.Substitute(CaptchaCode, Chr(10), "")
      Cells(i + 1, 2).Value = Trim(Application.WorksheetFunction.Clean(MyCaptchaCode))
      Cells(i + 1, 1).Value = "Sr. " & i
   Next i
End Sub

To create the image, I took a screenshot and saved it from Paint.

The test is being run on 64-bit Windows 10, with the environment variable properly configured and a reference set to the Microsoft Shell Controls and Automation DLL.


Solution

  • If you use the full path of the exe file, you need to wrap that path in single or double quotes and precede it with & (the PowerShell call operator). This is needed because the path contains spaces: without the quotes PowerShell splits it at the spaces, and without & it treats the quoted path as a plain string instead of a command to run.

    ReadCommand = "& 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe' " & "'C:\Temp\Sem.png'" & " " & "'C:\Temp\Sem'" & " ' -l eng'"
    

    If you instead rely on the Path environment variable to call the command without its full path, you'll have to restart Excel for a change to Path to take effect (based on the tests I've done).

    As discussed in the comment, you'd also be better off trying the command directly in the PowerShell terminal to make sure that it works properly before trying to implement it in VBA.

    The PowerShell command:

    tesseract.exe 'C:\Temp\Sem.png' 'C:\Temp\Sem' -l eng
    

    Note that in a PowerShell command executed directly in the console, you must not surround the last argument -l eng with single quotes, otherwise you'll get an error.
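
The points above can be put together in a minimal VBA sketch. It is not the original author's macro, only an illustration under a few assumptions: PsQuote, BuildOcrCommand and OcrImageSketch are hypothetical names, the paths are the ones from the question, the shell object is late-bound through CreateObject("Shell.Application") so no library reference is required, and -l eng is passed unquoted as in the console command. Instead of a fixed five-second pause, it polls for the output file and gives up after 15 seconds, which is an arbitrary timeout.

```vba
Option Explicit

' Hypothetical helper: wraps a value in single quotes for PowerShell,
' doubling any embedded single quote.
Private Function PsQuote(ByVal s As String) As String
    PsQuote = "'" & Replace(s, "'", "''") & "'"
End Function

' Hypothetical helper: builds the PowerShell command text, starting with
' the call operator (&) so the quoted exe path is executed as a command.
Private Function BuildOcrCommand(ByVal exePath As String, ByVal imagePath As String, _
                                 ByVal outputBase As String, ByVal lang As String) As String
    BuildOcrCommand = "& " & PsQuote(exePath) & " " & PsQuote(imagePath) & " " & _
                      PsQuote(outputBase) & " -l " & lang
End Function

Sub OcrImageSketch()
    ' Paths taken from the question; adjust them to your installation.
    Const EXE_PATH As String = "C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
    Const IMAGE_PATH As String = "C:\Temp\Sem.png"
    Const OUTPUT_BASE As String = "C:\Temp\Sem"   ' Tesseract appends .txt

    Dim outFile As String, psCommand As String, firstLine As String
    Dim sh As Object, deadline As Date, f As Integer

    outFile = OUTPUT_BASE & ".txt"
    If Dir(outFile) <> "" Then Kill outFile       ' drop stale output from an earlier run

    psCommand = BuildOcrCommand(EXE_PATH, IMAGE_PATH, OUTPUT_BASE, "eng")
    Debug.Print psCommand                         ' paste this into a PowerShell console to test it

    ' Late-bound equivalent of Shell32.Shell; 0 hides the window.
    Set sh = CreateObject("Shell.Application")
    sh.ShellExecute "powershell", psCommand, "", "open", 0

    ' Poll for the output file instead of waiting a fixed time.
    deadline = Now + TimeValue("00:00:15")        ' assumed timeout
    Do While Dir(outFile) = "" And Now < deadline
        Application.Wait Now + TimeValue("00:00:01")
    Loop

    If Dir(outFile) = "" Then
        MsgBox "No output file was created. Test the printed command in PowerShell."
        Exit Sub
    End If
    Application.Wait Now + TimeValue("00:00:01")  ' give Tesseract a moment to finish writing

    ' Read the first line of the result, as the question's code does.
    f = FreeFile
    Open outFile For Input As #f
    If Not EOF(f) Then Line Input #f, firstLine
    Close #f

    ActiveSheet.Range("B2").Value = Trim$(firstLine)
End Sub
```

With the paths above, the Debug.Print line writes & 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe' 'C:\Temp\Sem.png' 'C:\Temp\Sem' -l eng to the Immediate window. If that exact text fails when pasted into a PowerShell console, the problem lies in the command or the Tesseract installation rather than in the VBA code.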