So I have this project in Python (computer vision), which is separating text from figures in an image (such as a scanned newspaper page).
My question is: what's the best way to detect those figures in the page (in Python)?
Paper image example: Paper.
I haven't tried anything yet. I have no idea where to start.
I found the layout-parser Python toolkit, which is very helpful for this kind of project.
Layout Parser is a unified toolkit for deep learning based document image analysis.
With the help of deep learning, layoutparser supports the analysis of very complex documents and the processing of the hierarchical structure in their layouts.
Check this complete notebook example on detecting newspaper layouts (separating image and text regions in a newspaper image).
It's recommended to use a Jupyter notebook on Linux or macOS, because layout-parser isn't supported on Windows. Alternatively, you can use Google Colab, which is what I used to run the toolkit directly.
pip install layoutparser # Install the base layoutparser library
pip install "layoutparser[layoutmodels]" # Install DL layout model toolkit
pip install "layoutparser[ocr]" # Install OCR toolkit
Then install the detectron2 model backend dependencies:
pip install layoutparser torchvision && pip install "git+https://github.com/facebookresearch/detectron2.git#egg=detectron2" # append @<tag> after .git to pin a detectron2 release
import layoutparser as lp
import cv2
# Convert the image from BGR (cv2 default loading style)
# to RGB
image = cv2.imread("test.jpg")
image = image[..., ::-1]
# Load the deep layout model from the layoutparser API
# For all the supported models, please check the Model
# Zoo Page: https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
model = lp.models.Detectron2LayoutModel(
    'lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config',
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.7],
    label_map={1: "TextRegion", 2: "ImageRegion", 3: "TableRegion",
               4: "MathsRegion", 5: "SeparatorRegion", 6: "OtherRegion"})
# Detect the layout of the input image
layout = model.detect(image)
# Show the detected layout of the input image
# (draw_box returns a PIL image, which a notebook displays inline)
lp.draw_box(image, layout, box_width=3)
In the result image, the text regions are drawn in orange boxes and the image regions (figures) in white boxes. It's an amazing deep learning toolkit for image recognition.
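Since the goal is to separate the figures from the text, one possible next step is to keep only the blocks labelled "ImageRegion" and turn them into pixel boxes you can crop. The following is only a minimal sketch: the helper figure_boxes and the stand-in Block tuple are hypothetical names I made up for illustration, and it assumes each detected block exposes .type and .coordinates as (x1, y1, x2, y2), which is how layoutparser's text blocks appear to behave.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a detected block, so the sketch runs without a model.
# Assumption: real layoutparser blocks expose `.type` and `.coordinates` similarly.
Block = namedtuple("Block", ["type", "coordinates"])


def figure_boxes(blocks, width, height, label="ImageRegion"):
    """Return integer (x1, y1, x2, y2) boxes for blocks with the given label,
    clamped to the image bounds. Empty boxes are skipped."""
    boxes = []
    for block in blocks:
        if block.type != label:
            continue
        x1, y1, x2, y2 = block.coordinates
        # Round outward, then clamp so the box stays inside the image
        x1, y1 = max(0, math.floor(x1)), max(0, math.floor(y1))
        x2, y2 = min(width, math.ceil(x2)), min(height, math.ceil(y2))
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2))
    return boxes


# Made-up detections, just to exercise the helper
demo = [
    Block("TextRegion", (10.2, 5.0, 200.7, 80.1)),
    Block("ImageRegion", (-3.5, 90.0, 150.4, 260.9)),
]
print(figure_boxes(demo, width=140, height=300))  # [(0, 90, 140, 261)]
```

With the real result from model.detect, something like figure_boxes(layout, image.shape[1], image.shape[0]) should give the figure boxes, and each one can then be cropped from the NumPy image with image[y1:y2, x1:x2]. Passing label="TextRegion" instead would give the text regions.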