I need to extract the kilogram (kg) values displayed in the image below:
I manually cropped the image to isolate the text part and applied several image processing techniques such as grayscale conversion, thresholding, Gaussian blur, and dilation. However, the results were not as clear as I expected, and Tesseract OCR was unable to read them. Here are some of the processed images:
I am currently using EmguCV and Tesseract, and have tried various Tesseract models, including tessdata_best (English), lets, and letsgodigital. Unfortunately, none of these attempts have been successful.
The specific language or library used is not crucial, as I plan to convert the solution to C#. The final implementation will be for a mobile app using Xamarin.Forms.
Below is a sample method that I used without success:
public static void Apply()
{
    var folderName = "letsgodigital";
    var dataname = "letsgodigital";
    string tesseractPath = @$"./{folderName}";
    string imagePath = @"img.jpg";

    // Blur, convert to grayscale, then apply a fixed binary threshold.
    Mat image = CvInvoke.Imread(imagePath, ImreadModes.Color);
    Mat blurredImg = new Mat();
    CvInvoke.Blur(image, blurredImg, new Size(9, 9), new Point(-1, -1));
    Mat grayImg = new Mat();
    CvInvoke.CvtColor(blurredImg, grayImg, ColorConversion.Bgr2Gray);
    Mat binaryImg = new Mat();
    CvInvoke.Threshold(grayImg, binaryImg, 122, 255, ThresholdType.Binary);
    binaryImg.Save("full_pannel_bw.png");

    // Run Tesseract on the saved binary image, treating it as a single line.
    using (var engine = new TesseractEngine(tesseractPath, dataname, EngineMode.Default))
    {
        engine.DefaultPageSegMode = PageSegMode.SingleLine;
        using (var img = Pix.LoadFromFile("full_pannel_bw.png"))
        {
            using (var page = engine.Process(img))
            {
                string text = page.GetText();
                Console.WriteLine("tesseract got: \"{0}\"", text.Trim());
            }
        }
    }
}
Edit: this is my latest processing attempt, but Tesseract still cannot read the result; I get empty text. I am now trying to make the text in the image darker:
static void loggg()
{
    Mat img = CvInvoke.Imread("5.jpg", ImreadModes.Color);
    VectorOfMat channels = new VectorOfMat();
    CvInvoke.Split(img, channels);

    // Red minus green minus blue, then a fixed threshold.
    Mat redChannel = new Mat();
    CvInvoke.Subtract(channels[2], channels[1], redChannel);
    CvInvoke.Subtract(redChannel, channels[0], redChannel);
    CvInvoke.Threshold(redChannel, redChannel, 40, 255, ThresholdType.Binary);

    // Invert so the text is dark, then clean up with morphology.
    Mat invertedRedChannel = new Mat();
    CvInvoke.BitwiseNot(redChannel, invertedRedChannel);
    Mat morphKernel = CvInvoke.GetStructuringElement(ElementShape.Rectangle, new Size(2, 2), new Point(-1, -1));
    CvInvoke.MorphologyEx(invertedRedChannel, invertedRedChannel, MorphOp.Close, morphKernel, new Point(-1, -1), 1, BorderType.Constant, new MCvScalar(255));
    Mat dilateKernel = CvInvoke.GetStructuringElement(ElementShape.Rectangle, new Size(1, 1), new Point(-1, -1));
    CvInvoke.Dilate(invertedRedChannel, invertedRedChannel, dilateKernel, new Point(-1, -1), 1, BorderType.Constant, new MCvScalar(0));
    invertedRedChannel.Save("darker_red_text.jpg");

    img.Dispose();
    redChannel.Dispose();
    invertedRedChannel.Dispose();
    channels.Dispose();
}
First some comments on the picture:
Recommendations:
The displayed text is bright and red. You could exploit both properties. So you would select the red channel of the picture and then threshold.
This is just the red channel:
I'll apply a "gamma" mapping, which is nonlinear. It's something one can try, and keep if the results turn out better. A linear mapping wouldn't change much, at least not for the thresholding that comes later.
The panel's dark LEDs still look fairly light (level of ~0.25) but not as bright as before (~0.5). One could apply alternative or additional mappings to get the dark parts of the panel even darker.
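To make the effect of the gamma mapping concrete, here is a small pure-Python sketch. It assumes the exponent 1/0.45 used in the code further down; the sample levels are made up for illustration and are not measured from the picture.

```python
# Gamma mapping on a few normalized intensity levels (0.0 .. 1.0).
# The exponent 1/0.45 matches the code in this answer; the sample
# levels are illustrative assumptions.
GAMMA = 1 / 0.45

def gamma_map(level: float) -> float:
    """Nonlinear mapping: darkens mid-tones, leaves 0.0 and 1.0 in place."""
    return level ** GAMMA

levels = [0.0, 0.25, 0.5, 0.75, 1.0]
mapped = [gamma_map(v) for v in levels]

for before, after in zip(levels, mapped):
    print(f"{before:.2f} -> {after:.3f}")
```

A level of 0.5 lands a little above 0.2, roughly in line with the dark LEDs dropping from about 0.5 to about 0.25, while the brightest pixels stay near 1.0.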
This already constitutes a threshold of sorts... with manually picked values.
Now you can also see the LEDs and spaces between them in the letters. I'll just apply a lowpass to smooth that out. That'll help with the thresholding in that there won't be "noise" inside and outside of the letters from those "outliers".
For thresholding, it's usually a good idea to try automatic algorithms like Otsu. While figuring this out, Otsu often gave me thresholds that caused the letters to connect, so I worked with manually chosen thresholds most of the time. With the extra contrast stretching, which literally leaves only black between all the letters (see last pic), Otsu has no choice but to "work". This is again with a manually picked threshold.
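For reference, Otsu's method picks the threshold that maximizes the between-class variance of the histogram. Below is a minimal NumPy sketch of that idea, assuming an 8-bit grayscale input. `otsu_threshold` is a hypothetical helper written for illustration; in practice `cv.threshold` with `cv.THRESH_OTSU` does this for you.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the level t that maximizes between-class variance.

    Pixels <= t form one class, pixels > t the other.
    Assumes `gray` is a uint8 array (values 0..255).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)               # weight of the class "<= t"
    w1 = 1.0 - w0                      # weight of the class "> t"
    mu_cum = np.cumsum(prob * levels)  # cumulative mean up to t
    mu_total = mu_cum[-1]

    # Between-class variance; an empty class yields 0/0, which we zero out.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * w0 - mu_cum) ** 2 / (w0 * w1)
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))

# Synthetic two-level "image": half dark (50), half bright (200).
demo = np.array([50] * 8 + [200] * 8, dtype=np.uint8).reshape(4, 4)
t = otsu_threshold(demo)
demo_mask = demo > t
```

Because the criterion only looks at the histogram, dark-but-not-black gaps between letters can pull the threshold low enough to merge them, which is why stretching the contrast first helps.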
I think that looks good enough even for simple old Tesseract OCR. If it needs inverting, just invert it.
Here is some Python, using OpenCV functions that should be equivalent even in third-party C# bindings.
I immediately convert to floating point. This prevents clipping or wraparound of the numbers if I exceed the "usual" value range (i.e. values can go below 0 and over 255/1.0). It's also just convenient for some of the math. imshow() interprets floats as ranging from 0.0 to 1.0, but imwrite() just converts to integer, so there you'd have to scale back.
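A short NumPy sketch of why the float conversion matters: uint8 arithmetic wraps around silently, float arithmetic does not, and the result has to be scaled and clipped back before saving. The pixel values and the stretch limits are made up for illustration.

```python
import numpy as np

pixels_u8 = np.array([10, 128, 250], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256: 250 + 100 becomes 94, not 350.
wrapped = pixels_u8 + np.full_like(pixels_u8, 100)

# In float, intermediate values may leave 0.0 .. 1.0 without losing information.
as_float = pixels_u8.astype(np.float32) / 255.0
stretched = (as_float - 0.5) / (1.0 - 0.5)   # illustrative contrast stretch

# Before saving, scale back to 0..255 and clip explicitly.
back_to_u8 = np.clip(stretched * 255.0, 0, 255).round().astype(np.uint8)

print(wrapped, stretched, back_to_u8)
```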
import cv2 as cv
import numpy as np

im = cv.imread("QsvdNqcn.jpg")
# convert to float32 and scale to 0.0 .. 1.0
im = im * np.float32(1/255) # Mat::convertTo() with rtype=CV_32F and alpha=1.0/255.0
# getting a region, just for demonstration purposes
(x,y,w,h) = 763, 1281, 1167, 388
im = im[y:y+h, x:x+w] # Mat::operator()(cv::Rect)
(blue, green, red) = cv.split(im)
red_linear = red ** (1/0.45) # cv::pow()
# more contrast stretching to make "dark" parts darker
vmin, vmax = 0.7, 1.0
red_linear = (red_linear - vmin) / (vmax - vmin) # cv::Mat in C++ supports such expressions too
lowpassed = cv.GaussianBlur(red_linear, None, sigmaX=4.0)
(th, mask) = cv.threshold(lowpassed, 0.25, 1.0, cv.THRESH_BINARY)
# with Otsu, that'd take converting back to uint8 ranged 0..255
# (th, mask) = cv.threshold(np.clip(lowpassed * 255, 0, 255).astype(np.uint8), 128, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)
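To hand the result to an OCR engine, the float mask has to become a regular 8-bit image, inverted if the engine prefers dark text on a light background. The sketch below uses a hypothetical helper, `to_ocr_image`, and a tiny stand-in array in place of the real `mask`, so it runs without the source picture.

```python
import numpy as np

def to_ocr_image(mask: np.ndarray, invert: bool = True) -> np.ndarray:
    """Convert a float mask (0.0 / 1.0) to uint8 (0 / 255), optionally inverted."""
    mask_u8 = (np.clip(mask, 0.0, 1.0) * 255).astype(np.uint8)
    return 255 - mask_u8 if invert else mask_u8

# Stand-in for the thresholded `mask` from the code above (assumption).
demo_mask = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=np.float32)
ocr_ready = to_ocr_image(demo_mask)

# With OpenCV available, cv.imwrite("mask.png", ocr_ready) would write it out.
print(ocr_ready)
```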