I have written the .NET code below to read text from an image.
Platform used: Windows 10, Visual Studio 2015, tesseract-ocr-setup-4.00.00dev and tessnet2.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"D:\Python\download.jpg");
                var ocr = new Tesseract();
                ocr.Init(@"C:\Program Files (x86)\Tesseract-OCR\tessdata", "eng", false);
                var result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine(word.Text);
                    // Write one word per line so the words do not run together in the file.
                    File.AppendAllText(@"D:\Python\writefile.txt", word.Text + Environment.NewLine);
                }
                Console.ReadLine();
            }
        }
    }
I have tried both the "Any CPU" and x86 platform targets, and I have also tried changing the target framework version in Project Properties.
However, I'm getting the error below:
    An unhandled exception of type 'System.IO.FileLoadException' occurred in mscorlib.dll
    Additional information: Mixed mode assembly is built against version 'v2.0.50727' of the runtime and cannot be loaded in the 4.0 runtime without additional configuration information.
Edit: I added the following to my app.config, which removed the error. It now looks like this:
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
      </startup>
    </configuration>
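To confirm which runtime and bitness the process actually ends up with after a change like this, a small diagnostic along the following lines may help. This is only a sketch: `RuntimeInfo` and `Describe` are hypothetical names introduced here, while `Environment.Version` and `Environment.Is64BitProcess` are standard .NET members.

```csharp
using System;

static class RuntimeInfo
{
    // Summarises the CLR version and the bitness of the running process.
    // (RuntimeInfo/Describe are illustrative names, not part of tessnet2.)
    public static string Describe()
    {
        return string.Format("CLR {0}, {1}-bit process",
            Environment.Version,
            Environment.Is64BitProcess ? 64 : 32);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running it once under "Any CPU" and once under x86 shows whether the platform setting really changes the process bitness on your machine.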
I installed the NuGet package from here: https://www.nuget.org/packages/NuGet.Tessnet2/
I'm still not able to read the image. I downloaded the image from Google Images, and it has text in it.
Here is the message I'm getting:
When I checked the path C:\Program Files (x86)\Tesseract-OCR\tessdata,
this is what it looks like:
What am I doing wrong? How do I fix this?
The issue is resolved. The language packages were missing; I downloaded them from here: https://github.com/tesseract-ocr/langdata
The most important thing for Tessnet2 to work is having the language packages for the languages you want. For this sample, I use the English language.
Download the language package and extract it to the "..\Tesseract-OCR\tessdata" folder.
Note: It looks like the language packages are not included in tessdata by default during installation.
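Since the missing language files were the root cause, it may be worth checking for them before calling `Init`. The sketch below only tests whether the folder contains any file whose name starts with the language code (for example `eng.`). It does not verify that the files are the ones Tessnet2 needs. `TessdataCheck` and `HasLanguageData` are hypothetical helper names, and the path is the one used in this post.

```csharp
using System;
using System.IO;
using System.Linq;

static class TessdataCheck
{
    // Returns true if tessdataDir exists and contains at least one file
    // whose name starts with "<lang>." (for example "eng.traineddata").
    // Hypothetical helper; it does not validate the file contents.
    public static bool HasLanguageData(string tessdataDir, string lang)
    {
        return Directory.Exists(tessdataDir)
            && Directory.EnumerateFiles(tessdataDir, lang + ".*").Any();
    }

    static void Main()
    {
        // Path copied from the code in this post; adjust for your install.
        string tessdata = @"C:\Program Files (x86)\Tesseract-OCR\tessdata";
        Console.WriteLine(HasLanguageData(tessdata, "eng")
            ? "Found eng.* files in tessdata."
            : "No eng.* files found in " + tessdata);
    }
}
```

If the check reports no files, extracting the language package into that folder as described above is the first thing to try.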
Here is my modified version of the code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"D:\Python\download.jpg");
                tessnet2.Tesseract ocr = new tessnet2.Tesseract();
                ocr.Init(@"C:\Program Files (x86)\Tesseract-OCR\tessdata", "eng", false);
                List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
                }
                Console.Read();
            }
        }
    }
Cheers!!!