python opencv ocr html2canvas google-vision

Extract Logo and Text from Visiting Card Image with coordinates

I have an image of a visiting card. I want to extract the logo and all of the text from it, together with their coordinates, so that I can make the uploaded image editable on an HTML canvas. I have seen many examples, but none of them does exactly what I am looking for; they only extract the text from the image. I also tried the Google Vision API, but it too returns only text. I am new to Python.

Here is a sample image.

In the following code I have to select the logo by hand in order to extract it. I need the logo to be found and extracted automatically.

# import the necessary packages
import argparse
import cv2

# initialize the list of reference points and boolean indicating
# whether cropping is being performed or not
ref_point = []
cropping = False

def shape_selection(event, x, y, flags, param):
  # grab references to the global variables
  global ref_point, cropping

  # if the left mouse button was clicked, record the starting
  # (x, y) coordinates and indicate that cropping is being
  # performed
  if event == cv2.EVENT_LBUTTONDOWN:
    ref_point = [(x, y)]
    cropping = True

  # check to see if the left mouse button was released
  elif event == cv2.EVENT_LBUTTONUP:
    # record the ending (x, y) coordinates and indicate that
    # the cropping operation is finished
    ref_point.append((x, y))
    cropping = False

    # draw a rectangle around the region of interest
    cv2.rectangle(image, ref_point[0], ref_point[1], (0, 255, 0), 2)
    cv2.imshow("image", image)

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

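# load the image, clone it, and set up the mouse callback function
image = cv2.imread(args["image"])
if image is None:
  # cv2.imread returns None instead of raising when the file cannot be read
  raise SystemExit("Could not read image: " + args["image"])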
clone = image.copy()
cv2.namedWindow("image")
cv2.setMouseCallback("image", shape_selection)

# keep looping until the 'c' key is pressed
while True:
  # display the image and wait for a keypress
  cv2.imshow("image", image)
  key = cv2.waitKey(1) & 0xFF

  # if the 'r' key is pressed, reset the cropping region
  if key == ord("r"):
    image = clone.copy()

  # if the 'c' key is pressed, break from the loop
  elif key == ord("c"):
    break

# if there are two reference points, then crop the region of interest
# from the image and display it
if len(ref_point) == 2:
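  # sort the corners so that dragging in any direction gives a valid crop
  (x0, x1) = sorted((ref_point[0][0], ref_point[1][0]))
  (y0, y1) = sorted((ref_point[0][1], ref_point[1][1]))
  crop_img = clone[y0:y1, x0:x1]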
  cv2.imshow("crop_img", crop_img)
  cv2.waitKey(0)

# close all open windows
cv2.destroyAllWindows()

Solution

You can give the ABBYY cloud API a try:

https://www.abbyy.com/en-gb/cloud-ocr-sdk/features/

The API returns all text with its coordinates, and it can also return image elements, as far as they are detectable, as plain images. With some logic you can combine these into a document that contains all text elements as real text and all images as images, at the right positions.
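
The "some logic" part could look roughly like the sketch below. It assumes the OCR result has already been parsed into a list of blocks. The field names (`kind`, `left`, `top`, `right`, `bottom`, `text`) and the helper `to_canvas_elements` are made up for illustration and are not the actual ABBYY response format, so you would have to adapt them to whatever export format you choose.

```python
import json

def to_canvas_elements(blocks):
    """Turn parsed OCR blocks into a list of drawable canvas elements.

    `blocks` is a hypothetical, already-parsed structure, not a real
    API response.
    """
    elements = []
    for block in blocks:
        element = {
            "x": block["left"],
            "y": block["top"],
            "width": block["right"] - block["left"],
            "height": block["bottom"] - block["top"],
        }
        if block["kind"] == "text":
            element["type"] = "text"
            element["text"] = block["text"]
        else:
            # anything that is not text (e.g. a logo) is treated as a picture
            element["type"] = "image"
        elements.append(element)
    return elements

# made-up example data for a card with one logo and one line of text
blocks = [
    {"kind": "picture", "left": 20, "top": 15, "right": 120, "bottom": 95},
    {"kind": "text", "left": 140, "top": 30, "right": 400, "bottom": 60,
     "text": "John Doe"},
]
elements = to_canvas_elements(blocks)
print(json.dumps(elements, indent=2))
```

The resulting JSON can be sent to the browser, where each element is drawn on the canvas at its `x`/`y` position: text elements as editable text, image elements as cropped pictures.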

But keep in mind that the images are preprocessed before the OCR starts, so the quality of the returned images may differ from the original. It may therefore be a good idea to extract the image parts from the original scan yourself, using the coordinates that you get from the API.

https://www.ocrsdk.com/documentation/specifications/export-formats/
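
Cropping from the original scan could be done along these lines. This is only a sketch: the helper `region_slices` and the example box are hypothetical, and it assumes the returned coordinates refer to the same pixel grid as your original image. If the service rescales or deskews the image first, the coordinates would have to be mapped back before cropping.

```python
def region_slices(box, image_width, image_height):
    """Return (row_slice, col_slice) for a (left, top, right, bottom) box.

    Coordinates are clamped to the image, so a box that sticks out over
    the edge still gives a valid crop.
    """
    left, top, right, bottom = box
    left, right = sorted((max(0, min(left, image_width)),
                          max(0, min(right, image_width))))
    top, bottom = sorted((max(0, min(top, image_height)),
                          max(0, min(bottom, image_height))))
    return slice(top, bottom), slice(left, right)

# hypothetical logo box on a 600x350 scan
rows, cols = region_slices((20, 15, 120, 95), image_width=600, image_height=350)

# with OpenCV this would be applied to the untouched original scan, e.g.:
#   original = cv2.imread("card.jpg")        # "card.jpg" is a placeholder
#   height, width = original.shape[:2]
#   rows, cols = region_slices(box, width, height)
#   logo = original[rows, cols]              # NumPy arrays are indexed [y, x]
```

Note that the row (y) slice comes first, which matches the `clone[y0:y1, x0:x1]` style of cropping used in the question's code.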

The API is really good and gives OCR results that are comparable to Google's Cloud Vision. It also offers more features and parameters for tweaking the results. However, the ABBYY API is much more expensive than the Google API.