I'm using the Windows.Media.Ocr engine to scan these two lines:
But the software scans them like this:
I expect it to read them like this instead:
KIBA/USDT 0.00003826 6.31M KIBA 241.68459400 USDT
KIBA/USDT 0.00003470 17.13M KIBA 594.48387000 USDT
The code I'm using is:
' Requires references to "C:\Program Files (x86)\Windows Kits\10\UnionMetadata\Windows.winmd"
' and "C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETCore\v4.5\System.Runtime.WindowsRuntime.dll",
' plus the Windows 10 SDK
Imports Windows.Media.Ocr
Imports System.IO
Imports System.Runtime.InteropServices.WindowsRuntime

Public Class Form1
    Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim softwareBmp As Windows.Graphics.Imaging.SoftwareBitmap
        Using bmp As Bitmap = New Bitmap(PictureBox1.Width, PictureBox1.Height)
            Using g As Graphics = Graphics.FromImage(bmp)
                Dim pt As Point = Me.PointToScreen(New Point(PictureBox1.Left, PictureBox1.Top))
                g.CopyFromScreen(pt.X, pt.Y, 0, 0, bmp.Size, CopyPixelOperation.SourceCopy)
                Using memStream = New Windows.Storage.Streams.InMemoryRandomAccessStream()
                    bmp.Save(memStream.AsStream(), System.Drawing.Imaging.ImageFormat.Bmp)
                    Dim decoder As Windows.Graphics.Imaging.BitmapDecoder = Await Windows.Graphics.Imaging.BitmapDecoder.CreateAsync(memStream)
                    softwareBmp = Await decoder.GetSoftwareBitmapAsync()
                End Using
            End Using
        End Using
        Dim ocrEng = OcrEngine.TryCreateFromLanguage(New Windows.Globalization.Language("en-US"))
        Dim languages As IReadOnlyList(Of Windows.Globalization.Language) = ocrEng.AvailableRecognizerLanguages
        For Each language In languages
            Console.WriteLine(language.LanguageTag)
        Next
        Dim r = ocrEng.RecognizerLanguage
        Dim n = ocrEng.MaxImageDimension
        Dim ocrResult = Await ocrEng.RecognizeAsync(softwareBmp)
        RichTextBox1.Text = ocrResult.Text
    End Sub
End Class
What change does this code need in order to scan row by row rather than column by column?
Edit: looking at the output in binary, there is 0D 0A (CR LF) between rows,
but I didn't post it before because in any case I only need to scan from 0.000038 etc. up to 0.0000%.
I chose to act on the output string instead of tackling the OCR API.
Fixing the issue within the OCR API would probably be a superior solution if possible, but I could not get your code properly referenced in my system.
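For anyone who wants to try the OCR-side route instead, one option is to ignore ocrResult.Text and rebuild the rows from word positions: ocrResult.Lines gives OcrLine objects whose Words each carry Text and a BoundingRect. The sketch below assumes that words whose vertical centres lie within a pixel tolerance belong to the same row, and orders each row left to right. It works on plain values so it runs without WinRT; the PositionedWord structure, the GroupIntoRows function and the rowTolerance parameter are hypothetical names made up for this example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical holder for one recognised word; in the form you would fill it
' from OcrWord.Text and OcrWord.BoundingRect (X, Y, Height).
Public Structure PositionedWord
    Public Text As String
    Public X As Double
    Public Y As Double
    Public Height As Double

    Public Sub New(wordText As String, left As Double, top As Double, wordHeight As Double)
        Text = wordText
        X = left
        Y = top
        Height = wordHeight
    End Sub
End Structure

Public Module OcrRowGrouping
    ' Puts words whose vertical centres are within rowTolerance pixels of each
    ' other into the same row, then returns each row as text, left to right.
    Public Function GroupIntoRows(words As IEnumerable(Of PositionedWord),
                                  rowTolerance As Double) As List(Of String)
        Dim rows As New List(Of List(Of PositionedWord))()
        Dim rowCentres As New List(Of Double)()

        ' Visit words top to bottom, so rows are created in reading order.
        For Each w In words.OrderBy(Function(p) p.Y)
            Dim centre = w.Y + w.Height / 2
            Dim index = rowCentres.FindIndex(Function(c) Math.Abs(c - centre) <= rowTolerance)
            If index < 0 Then
                ' No existing row is close enough: start a new one.
                rowCentres.Add(centre)
                rows.Add(New List(Of PositionedWord) From {w})
            Else
                rows(index).Add(w)
            End If
        Next

        ' Order each row's words by X position and join them with spaces.
        Return rows.Select(Function(r) String.Join(" ", r.OrderBy(Function(p) p.X).Select(Function(p) p.Text))).ToList()
    End Function
End Module
```

In Button1_Click the input would come from looping over ocrResult.Lines and each line's Words, passing word.Text and word.BoundingRect.X, .Y and .Height; the rows could then be shown with String.Join(vbCrLf, GroupIntoRows(words, 5)). The tolerance of 5 pixels is only a guess and should stay well below the spacing between rows.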
So you can add this function to transpose the string:
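Private Function transpose(input As String) As String
    Dim numberOfColumns = 4 ' this must be known and could be a parameter to this function
    ' Glue each amount to its unit so the pair survives the split below
    Dim fixedInput = input.Replace(" KIBA", "|KIBA").Replace(" USDT", "|USDT").Trim()
    ' Split on spaces and on any CR/LF line breaks, ignoring empty entries
    Dim splitInput = fixedInput.Split(New Char() {" "c, ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)
    Dim numberOfWords = splitInput.Length
    Dim numberOfRows = numberOfWords \ numberOfColumns ' integer division
    Dim words As New List(Of String)()
    ' The OCR reads column by column, so word (row, col) sits at index row + numberOfRows * col
    For row = 0 To numberOfRows - 1
        For col = 0 To numberOfColumns - 1
            words.Add(splitInput(row + numberOfRows * col))
        Next
    Next
    Dim sb As New System.Text.StringBuilder()
    For i = 0 To words.Count - 1
        sb.Append(words(i).Replace("|", " "))
        If i <> words.Count - 1 Then
            ' Tab between columns, new line after the last column of each row
            sb.Append(If((i + 1) Mod numberOfColumns = 0, Environment.NewLine, vbTab))
        End If
    Next
    Return sb.ToString()
End Function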
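The loop in transpose works because the OCR emits the table column by column, so token k belongs to column k \ numberOfRows and row k Mod numberOfRows; reading it back row by row walks the same formula in the other order. A generic sketch of that index arithmetic, taking the column count as a parameter and refusing input whose length is not a multiple of it instead of silently dropping tokens (ColumnMajorToRowMajor is a hypothetical name):

```vb
Imports System
Imports System.Collections.Generic

Public Module TableReorder
    ' Reorders items read column by column into row-by-row order.
    ' Item k of the input belongs to column k \ rows and row k Mod rows.
    Public Function ColumnMajorToRowMajor(Of T)(items As IList(Of T), numberOfColumns As Integer) As List(Of T)
        If numberOfColumns <= 0 OrElse items.Count Mod numberOfColumns <> 0 Then
            Throw New ArgumentException("Item count must be a positive multiple of the column count.")
        End If

        Dim numberOfRows = items.Count \ numberOfColumns ' integer division
        Dim result As New List(Of T)(items.Count)
        For row = 0 To numberOfRows - 1
            For col = 0 To numberOfColumns - 1
                result.Add(items(row + numberOfRows * col))
            Next
        Next
        Return result
    End Function
End Module
```

Because it is generic, the same helper could reorder the split OCR tokens (a String array) or anything else that arrives column-major.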
Simply pass your OCR output string through it. Here it is called from your code:
Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim softwareBmp As Windows.Graphics.Imaging.SoftwareBitmap
    Using bmp As Bitmap = New Bitmap(PictureBox1.Width, PictureBox1.Height)
        Using g As Graphics = Graphics.FromImage(bmp)
            Dim pt As Point = Me.PointToScreen(New Point(PictureBox1.Left, PictureBox1.Top))
            g.CopyFromScreen(pt.X, pt.Y, 0, 0, bmp.Size, CopyPixelOperation.SourceCopy)
            Using memStream = New Windows.Storage.Streams.InMemoryRandomAccessStream()
                bmp.Save(memStream.AsStream(), System.Drawing.Imaging.ImageFormat.Bmp)
                Dim decoder As Windows.Graphics.Imaging.BitmapDecoder = Await Windows.Graphics.Imaging.BitmapDecoder.CreateAsync(memStream)
                softwareBmp = Await decoder.GetSoftwareBitmapAsync()
            End Using
        End Using
    End Using
    Dim ocrEng = OcrEngine.TryCreateFromLanguage(New Windows.Globalization.Language("en-US"))
    Dim languages As IReadOnlyList(Of Windows.Globalization.Language) = ocrEng.AvailableRecognizerLanguages
    For Each language In languages
        Console.WriteLine(language.LanguageTag)
    Next
    Dim r = ocrEng.RecognizerLanguage
    Dim n = ocrEng.MaxImageDimension
    Dim ocrResult = Await ocrEng.RecognizeAsync(softwareBmp)
    RichTextBox1.Text = transpose(ocrResult.Text)
End Sub
I tested it with this handler:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim input = "0.00003599 0.00003599 104.1K KIBA 23.22M KIBA 3.74655900 USDT 835.89654200 USDT 0.0000% 0.0000%"
    Dim output = transpose(input)
End Sub
Input:
0.00003599 0.00003599 104.1K KIBA 23.22M KIBA 3.74655900 USDT 835.89654200 USDT 0.0000% 0.0000%
Output:
0.00003599 104.1K KIBA 3.74655900 USDT 0.0000%
0.00003599 23.22M KIBA 835.89654200 USDT 0.0000%
Note that you need to fix up your string first: any value made of multiple words (such as "104.1K KIBA") temporarily has its space replaced with a pipe | so it is not split apart. If you encounter more cases like this, keep adding Replace calls following the same pattern. If the pipe turns out to be a valid character in your data, replace it with some other character you will never see.
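If you would rather not reserve a placeholder character at all, a different technique (swapped in here, not what the code above does) is to let a regular expression define a token: a run of non-spaces, optionally followed by one of the known unit words. The unit list KIBA/USDT is the only assumption about the data, and Tokenize is a hypothetical name.

```vb
Imports System.Linq
Imports System.Text.RegularExpressions

Public Module OcrTokenizer
    ' A token is a run of non-whitespace characters, optionally followed by
    ' whitespace and one of the unit words, so "104.1K KIBA" stays together.
    Private ReadOnly TokenPattern As New Regex("\S+(?:\s+(?:KIBA|USDT)\b)?")

    Public Function Tokenize(input As String) As String()
        Return TokenPattern.Matches(input).Cast(Of Match)().Select(Function(m) m.Value).ToArray()
    End Function
End Module
```

The resulting array can stand in for splitInput in transpose, and the Replace("|", " ") step is then no longer needed. New units are added to the alternation inside the pattern.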
Dim fixedInput = input.Replace(" KIBA", "|KIBA").Replace(" USDT", "|USDT")
...
sb.Append(words(i).Replace("|", " "))
Another solution, again transposing the incorrectly ordered string, but this time producing objects of a class you can work with.
Make a class to represent your data:
Public Class KibaClass
    Public Property Price As Decimal
    Public Property VolumeKIBA As Decimal
    Public Property VolumeUSDT As Decimal
    Public Property Percent As Decimal
End Class
And a different function to parse the text into this class:
Private Function transposeToClass(input As String) As IEnumerable(Of KibaClass)
    Dim numberOfColumns = 4
    ' The OCR text uses "." as decimal separator whatever the PC's locale is
    Dim inv = System.Globalization.CultureInfo.InvariantCulture
    Dim fixedInput = input.Replace(" KIBA", "|KIBA").Replace(" USDT", "|USDT").Trim()
    Dim splitInput = fixedInput.Split(New Char() {" "c, ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)
    Dim numberOfWords = splitInput.Length
    Dim numberOfRows = numberOfWords \ numberOfColumns ' integer division, e.g. 8 \ 4 = 2
    Dim words As New List(Of String)()
    For row = 0 To numberOfRows - 1
        For col = 0 To numberOfColumns - 1
            words.Add(splitInput(row + numberOfRows * col))
        Next
    Next
    Dim kibas As New List(Of KibaClass)()
    For row = 0 To numberOfRows - 1
        Dim rowOffset = row * numberOfColumns
        Dim kiba = New KibaClass With {
            .Percent = Decimal.Parse(words(3 + rowOffset).Replace("%", ""), inv) / 100,
            .Price = Decimal.Parse(words(0 + rowOffset), inv)}
        ' KIBA volume, e.g. "104.1K|KIBA" -> 104100
        Dim multiplier As Decimal = 1D ' no suffix means x1
        Dim volume = words(1 + rowOffset).Split("|"c)(0)
        Dim lastChar = volume.Last()
        If Not Char.IsDigit(lastChar) Then
            volume = volume.Substring(0, volume.Length - 1)
            Select Case Char.ToUpper(lastChar)
                Case "T"c
                    multiplier = 1000000000000D ' trillion
                Case "B"c
                    multiplier = 1000000000D ' billion
                Case "M"c
                    multiplier = 1000000D
                Case "K"c
                    multiplier = 1000D
            End Select
        End If
        kiba.VolumeKIBA = Decimal.Parse(volume, inv) * multiplier
        ' USDT volume: reset the multiplier so the KIBA suffix is not reused
        multiplier = 1D
        volume = words(2 + rowOffset).Split("|"c)(0)
        lastChar = volume.Last()
        If Not Char.IsDigit(lastChar) Then
            volume = volume.Substring(0, volume.Length - 1)
            Select Case Char.ToUpper(lastChar)
                Case "T"c
                    multiplier = 1000000000000D ' trillion
                Case "B"c
                    multiplier = 1000000000D ' billion
                Case "M"c
                    multiplier = 1000000D
                Case "K"c
                    multiplier = 1000D
            End Select
        End If
        kiba.VolumeUSDT = Decimal.Parse(volume, inv) * multiplier
        kibas.Add(kiba)
    Next
    Return kibas
End Function
Dim output1 = transposeToClass(input)
This holds an IEnumerable of your class: enumerating it gives one instance per row, with properties in the proper numeric format representing the columns you originally OCR'd.
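The suffix handling in transposeToClass is written out twice, once per volume column. A sketch of pulling it into a single helper, which also parses with the invariant culture so that "." is read as the decimal separator even on a PC whose locale uses "," (for example an Italian one). The name ParseAbbreviated and the K/M/B/T table are assumptions about how the exchange abbreviates volumes.

```vb
Imports System
Imports System.Globalization

Public Module VolumeParsing
    ' Parses values such as "6.31M", "104.1K" or "241.68459400" into a Decimal.
    ' Assumed suffixes: K = thousand, M = million, B = billion, T = trillion.
    Public Function ParseAbbreviated(text As String) As Decimal
        Dim value = text.Trim()
        If value.Length = 0 Then Throw New FormatException("Empty value.")

        Dim multiplier As Decimal = 1D
        Select Case Char.ToUpperInvariant(value(value.Length - 1))
            Case "K"c
                multiplier = 1000D
            Case "M"c
                multiplier = 1000000D
            Case "B"c
                multiplier = 1000000000D
            Case "T"c
                multiplier = 1000000000000D
        End Select

        ' Drop the suffix letter only when one was recognised.
        If multiplier <> 1D Then value = value.Substring(0, value.Length - 1)

        Return Decimal.Parse(value, NumberStyles.Number, CultureInfo.InvariantCulture) * multiplier
    End Function
End Module
```

With it, each volume in the loop reduces to a line such as kiba.VolumeKIBA = ParseAbbreviated(words(1 + rowOffset).Split("|"c)(0)).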