Tags: python, image-processing, computer-vision, object-detection, document-layout-analysis

How to detect figures in a newspaper image in Python?


I have a computer vision project in Python that separates text from figures in an image (such as a scanned newspaper page).

My question: what is the best way to detect the figures on the page in Python?

Example newspaper image: Paper.

I haven't tried anything yet and have no idea where to start.


Solution

  • I found the layout-parser Python toolkit, which should be very helpful for your project.

    Layout Parser is a unified toolkit for deep learning based document image analysis.

    With the help of deep learning, layoutparser supports the analysis of very complex documents and the processing of the hierarchical structure in their layouts.

    Check this complete notebook example on detecting newspaper layouts (separating image and text regions in a newspaper image).

    It's recommended to use a Jupyter notebook on Linux or macOS, because layout-parser isn't supported on Windows. Alternatively, you can use Google Colab, which is what I used to run the toolkit directly.

    Requirements for installing the toolkit

    pip install layoutparser # Install the base layoutparser library
    pip install "layoutparser[layoutmodels]" # Install DL layout model toolkit 
    pip install "layoutparser[ocr]" # Install OCR toolkit
    

    Then install the Detectron2 model backend dependencies

    pip install layoutparser torchvision && pip install "git+https://github.com/facebookresearch/detectron2.git#egg=detectron2"
    

    Running the toolkit on a newspaper image

    import cv2
    import layoutparser as lp

    # Convert the image from BGR (cv2 default loading style)
    # to RGB
    image = cv2.imread("test.jpg")
    if image is None:
        raise FileNotFoundError("Could not read test.jpg")
    image = image[..., ::-1]

    # Load the deep layout model from the layoutparser API.
    # For all the supported models, please check the Model
    # Zoo page: https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
    model = lp.models.Detectron2LayoutModel(
        "lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config",
        # Keep only detections with a confidence score of at least 0.7
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.7],
        label_map={
            1: "TextRegion",
            2: "ImageRegion",
            3: "TableRegion",
            4: "MathsRegion",
            5: "SeparatorRegion",
            6: "OtherRegion",
        },
    )

    # Detect the layout of the input image
    layout = model.detect(image)

    # Show the detected layout of the input image
    # (draw_box returns a PIL image, which a notebook displays inline)
    lp.draw_box(image, layout, box_width=3)
        
    

    Newspaper layout detection result

    In the result image, text regions are outlined with orange boxes and image regions (the figures) with white boxes. It's an impressive deep learning toolkit for document image analysis.
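
    Drawing the boxes only visualizes the result. To actually separate the figures from the text, you still need to pick out the image regions and turn their coordinates into crop boxes. Below is a minimal sketch of that step, not part of layoutparser itself: `figure_boxes` is a hypothetical helper, and it assumes each detected block exposes `.type`, `.coordinates` (as x1, y1, x2, y2) and optionally `.score`. The `demo_layout` objects are stand-ins so the snippet runs without a model; in practice you would pass the `layout` returned by `model.detect(image)`.

```python
from types import SimpleNamespace


def figure_boxes(layout, width, height, label="ImageRegion", min_score=0.0):
    """Return clamped integer (x1, y1, x2, y2) boxes for blocks of one label.

    `layout` is any iterable of objects exposing `.type`, `.coordinates`
    and optionally `.score` (an assumption about the detected blocks).
    """
    boxes = []
    for block in layout:
        if block.type != label:
            continue
        score = getattr(block, "score", None)
        if score is not None and score < min_score:
            continue
        x1, y1, x2, y2 = block.coordinates
        # Clamp to the image bounds and convert to whole pixels.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(round(x2))), min(height, int(round(y2)))
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2))
    # Sort top-to-bottom, then left-to-right, for a stable order.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Stand-in blocks (hypothetical values) so the sketch runs without a model.
demo_layout = [
    SimpleNamespace(type="TextRegion", coordinates=(10.0, 10.0, 200.0, 80.0), score=0.95),
    SimpleNamespace(type="ImageRegion", coordinates=(-5.2, 100.4, 310.7, 420.0), score=0.91),
    SimpleNamespace(type="ImageRegion", coordinates=(320.0, 90.0, 700.0, 300.0), score=0.55),
]
print(figure_boxes(demo_layout, width=640, height=480, min_score=0.7))
```

    With a real image loaded as a NumPy array, you can take `height, width = image.shape[:2]` and crop each figure with `image[y1:y2, x1:x2]`. Passing `label="TextRegion"` instead gives you the text blocks.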