python windows powershell

Is there a way to pass an image to the Microsoft snipping tool, have it perform text extraction, and return that extract


I'm working on a project that uses OCR, just for fun. I'm currently using PyTesseract, and it has a very high error rate, so I built a user validation function so the ingested text can be verified. While testing different things, I got tired of hand-typing one of the lines that PyTesseract constantly got wrong, so I took a snip and used the text extraction feature in the Snipping Tool. It was 100% accurate.

That got me thinking: if I could find a way to get Python to invoke the Snipping Tool, it would (A) increase the accuracy of the OCR and (B) remove the need for people who use this to install Tesseract-OCR so their local Python can run the code.

I can't for the life of me figure out how to do it, though, or whether it is even possible. I read something on here about how to take a screenshot from PowerShell, but that's only half of my problem. Any help would be great.

The code I currently have is at https://github.com/MElse1/Puzzle-Solving/blob/e591a77af27b7f7325b08edb725e12f31cc0f0c9/Word%20Search%20Solver/PyTesseract%20testing.py. It's convoluted and probably badly written, but it gets the job done, even if it's very inaccurate because of the OCR solution.

Since I linked my GitHub: this isn't for anything serious, it's just a fun way for me to get better at programming logic. The goal is ultimately to solve word searches. I have the logic part of the program elsewhere on my GitHub, but I thought it would be cool to use OCR to build the lists to search for the words.


Solution

  • Here is a sample of how to do OCR with PowerShell (since you also tagged this as PowerShell) using only libraries that are built into Windows (the Windows.Media.Ocr API used here ships with Windows 10 and later). You can modify it to return the result as a string so you can use it in Python.

    <#
        OCR sample using the built in WinRT API
    #>
    
    # FilePath of the image
    $FilePath = "C:\test\OCRTestImage.png"
    
    # OCR Language
    $Language = 'en-us'
    
    # assembly with the WindowsRT extension methods we need
    Add-Type -AssemblyName System.Runtime.WindowsRuntime
    
    # load the RT assemblies we need
    [void][Windows.UI.Core.CoreWindow,Windows.UI.Core,ContentType=WindowsRuntime]
    [void][Windows.Media.Ocr.OcrEngine,Windows.Media.Ocr,ContentType=WindowsRuntime]
    [void][Windows.Storage.StorageFile,Windows.Storage,ContentType=WindowsRuntime]
    [void][Windows.Graphics.Imaging.BitmapDecoder,Windows.Graphics.Imaging,ContentType=WindowsRuntime]
    
    # get the AsTask generic extension method with the signature we need
    $asTaskGeneric = ([System.WindowsRuntimeSystemExtensions].GetMethods() | Where-Object { $_.Name -eq 'AsTask' -and $_.GetParameters().Count -eq 1 -and $_.GetParameters()[0].ParameterType.Name -eq 'IAsyncOperation`1' })[0]
    
    # Load the file
    $StorageFileAsync = [Windows.Storage.StorageFile]::GetFileFromPathAsync($FilePath)
    $StorageFile = $asTaskGeneric.MakeGenericMethod([Windows.Storage.StorageFile]).Invoke($null, $StorageFileAsync).Result
    
    # convert file to a stream
    $StreamAsync = $StorageFile.OpenReadAsync()
    $Stream = $asTaskGeneric.MakeGenericMethod([Windows.Storage.Streams.IRandomAccessStreamWithContentType]).Invoke($null, $StreamAsync).Result
    
    # And convert this to a bitmap => we'll have to create a softwareBitmap
    $BitmapDecoderAsync = [Windows.Graphics.Imaging.BitmapDecoder]::CreateAsync($Stream)
    $BitMap = $asTaskGeneric.MakeGenericMethod([Windows.Graphics.Imaging.BitmapDecoder]).Invoke($null, $BitmapDecoderAsync).Result
    $SoftwareBitmapAsync = $BitMap.GetSoftwareBitmapAsync()
    $SoftwareBitmap = $asTaskGeneric.MakeGenericMethod([Windows.Graphics.Imaging.SoftwareBitmap]).Invoke($null, $SoftwareBitmapAsync).Result
    
    
    # OCR stuff => here we do the actual OCR stuff
    # Create the OCR Engine object
    # (TryCreateFromLanguage returns $null if the language is not available for OCR)
    $OCR = [Windows.Media.Ocr.OcrEngine]::TryCreateFromLanguage($Language)
    # Do the actual recognition
    $OCRResultAsync = $OCR.RecognizeAsync($SoftwareBitmap)
    $OCRResult = $asTaskGeneric.MakeGenericMethod([Windows.Media.Ocr.OcrResult]).Invoke($null, $OCRResultAsync).Result
    
    # output the result object
    # (use $OCRResult.Text instead to get only the recognized text as a string)
    $OCRResult
    

    In short: I load the UWP/WinRT libraries, look up the AsTask extension method that is used to unwrap the IAsyncOperation calls, and after that it's just a matter of calling the methods and unwrapping the results.
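
Since the question asks about Python, here is a minimal sketch of how the script above might be called from Python. It rests on several assumptions that are not part of the original script: that it has been saved as ocr.ps1 (a hypothetical name), that the hard-coded $FilePath line has been replaced by a param([string]$FilePath) block at the top, and that its last line outputs $OCRResult.Text instead of the whole object. The function names build_ocr_command and ocr_image are made up for illustration. It starts powershell.exe (Windows PowerShell) rather than pwsh because, as far as I know, the WinRT type loading used above does not work in PowerShell 7.

```python
import subprocess
from pathlib import Path


def build_ocr_command(script_path, image_path):
    """Build the argument list that runs the OCR script in Windows PowerShell.

    Assumption: the script accepts a -FilePath parameter and writes only the
    recognized text to standard output.
    """
    return [
        "powershell.exe",                 # Windows PowerShell, not pwsh
        "-NoProfile",                     # skip the user's profile scripts
        "-ExecutionPolicy", "Bypass",     # allow running a local .ps1 file
        "-File", str(script_path),
        # GetFileFromPathAsync needs an absolute path, so resolve it here
        "-FilePath", str(Path(image_path).resolve()),
    ]


def ocr_image(script_path, image_path, run=subprocess.run):
    """Run the OCR script and return the recognized text as a string.

    The run argument defaults to subprocess.run; it is a parameter only so
    that a stand-in can be supplied where PowerShell is not available.
    """
    completed = run(
        build_ocr_command(script_path, image_path),
        capture_output=True,   # collect stdout and stderr
        text=True,             # decode the output to str
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.strip()
```

With those assumptions in place, calling ocr_image("ocr.ps1", r"C:\test\OCRTestImage.png") on Windows should return the recognized text as a plain Python string, without Tesseract-OCR being installed.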