I have the following image:
I want to keep only the black text 0790 and remove everything else from the picture. This Stack Overflow question explains how to remove a color. However, I need to keep a color (black), not remove it.
You can use a threshold to keep only the near-black pixels and turn all colored parts white:
```python
from PIL import Image

image = Image.open("xyz.jpeg").convert("RGB")  # ensure every pixel is an (r, g, b) tuple
image_data = image.load()
width, height = image.size  # PIL returns (width, height)

# Define a threshold for blackness
threshold = 60

for x in range(width):
    for y in range(height):
        r, g, b = image_data[x, y]
        # Check if the pixel is close to black
        if r < threshold and g < threshold and b < threshold:
            image_data[x, y] = (0, 0, 0)  # Keep it black
        else:
            image_data[x, y] = (255, 255, 255)  # Change to white or any background color

image.save("xyz_black_only.png")  # output file name is just an example
```
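The per-pixel loop is slow on large images. A vectorized sketch with NumPy applies the same rule (a pixel is kept only if all three channels are below `threshold`) to the whole array at once. `keep_black` and the demo pixels are hypothetical names and values chosen for illustration:

```python
import numpy as np
from PIL import Image


def keep_black(image: Image.Image, threshold: int = 60) -> Image.Image:
    """Return a copy where near-black pixels become black and all others white."""
    rgb = np.asarray(image.convert("RGB"))
    # A pixel counts as black only if all three channels are below the threshold
    is_black = (rgb < threshold).all(axis=-1)
    # Start from an all-white image, then paint the black pixels
    out = np.full(rgb.shape, 255, dtype=np.uint8)
    out[is_black] = 0
    return Image.fromarray(out)


# Tiny synthetic example: one dark pixel, one red pixel, one white pixel
demo = Image.fromarray(
    np.array([[[10, 20, 30], [200, 0, 0], [255, 255, 255]]], dtype=np.uint8)
)
result = np.asarray(keep_black(demo))
print(result[0].tolist())
```

Only the dark pixel survives as black; the red and white pixels both become white, exactly as in the loop version.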
After removing color:
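You can also use PaddleOCR to extract the text if you need it (this uses the PaddleOCR 2.x `ocr()` interface; newer versions may differ):

```python
import numpy as np
from paddleocr import PaddleOCR


# helper function: join the recognized strings from PaddleOCR's nested result
def get_ocr_text(recognition_result):
    txts = []
    for result in recognition_result:
        if result is not None:
            # each detected line is [box, (text, confidence)]
            txts.extend(line[1][0] for line in result)
    return " ".join(txts).strip()


ocr_engine = PaddleOCR(use_angle_cls=True, lang="en")
numpydata = np.asarray(image)  # the cleaned image from the previous step
recognition_result = ocr_engine.ocr(numpydata, cls=True)
recognition_text = get_ocr_text(recognition_result)
print(recognition_text)
```

On the cleaned image, `recognition_text` is `0790`.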
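To check `get_ocr_text` without running OCR, you can feed it a hand-built result. This sketch assumes the PaddleOCR 2.x layout implied by the helper: one entry per page, each page either `None` (nothing detected) or a list of `[bounding_box, (text, confidence)]`. The box coordinates and confidence scores below are made up:

```python
def get_ocr_text(recognition_result):
    txts = []
    for result in recognition_result:
        if result is not None:
            # each detected line is [box, (text, confidence)]
            txts.extend(line[1][0] for line in result)
    return " ".join(txts).strip()


# Hypothetical result mimicking the assumed layout
fake_result = [
    [
        [[[10, 10], [90, 10], [90, 40], [10, 40]], ("0790", 0.99)],
    ],
    None,  # a page with no detections is skipped
]
print(get_ocr_text(fake_result))  # 0790
```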