Tags: image-processing, imagemagick, ocr, image-preprocessing

Programmatically divide scanned images into separate images


To improve OCR quality, I need to preprocess my scanned images. Sometimes a single scan contains several separate items (components on the page, each at a different angle), such as a few paper documents scanned at one time. For example:

[image]

Is it possible to split such an image automatically into separate images, one per logical document, for example with a tool like ImageMagick? Are there any existing solutions or techniques for this problem?


Solution

  • In ImageMagick 6, you can blur the image enough that the text in each document merges together, then threshold it so that each document becomes one large black region on a white background. Connected-components labeling then reports every separate black (gray(0)) region along with its bounding box, and you can crop the original image once per region using those bounding box values.

    Input:

    [image]

    Unix Syntax (adjust the blur to be just large enough to keep the text regions solid black):

    infile="image.png"
    # Base name of the input, without directory or extension
    inname=$(convert -ping "$infile" -format "%t" info:)
    # Split the connected-components listing into one array element per line
    OLDIFS=$IFS
    IFS=$'\n'
    arr=($(convert "$infile" -blur 0x5 -auto-level -threshold 99% -type bilevel +write tmp.png \
        -define connected-components:verbose=true \
        -connected-components 8 \
        null: | tail -n +2 | sed 's/^[ ]*//'))
    num=${#arr[*]}
    IFS=$OLDIFS
    for ((i=0; i<num; i++)); do
        # Fields per line: id, bounding box, centroid, area, color
        color=$(echo "${arr[$i]}" | cut -d' ' -f5)
        bbox=$(echo "${arr[$i]}" | cut -d' ' -f2)
        echo "color=$color; bbox=$bbox"
        # Crop only the black regions, then trim the leftover border
        if [ "$color" = "gray(0)" ]; then
            convert "$infile" -crop "$bbox" +repage -fuzz 10% -trim +repage "${inname}_$i.png"
        fi
    done
    


    Textual Listing:

    color=gray(255); bbox=892x1008+0+0
    color=gray(0); bbox=337x430+36+13
    color=gray(0); bbox=430x337+266+630
    color=gray(0); bbox=202x147+506+252
    

    tmp.png showing the blurred and thresholded regions:

    [image]

    Cropped Images:

    [image]

    [image]

    [image]
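On a noisier scan, the connected-components listing may also contain small black specks that are not documents. One possible refinement, sketched here rather than taken from the answer above, is to filter the listing by the area column before cropping. The function name select_regions and the 500-pixel cutoff are hypothetical. The bounding boxes and colors in the sample are the ones from the textual listing above, but the ids, centroids, areas, and the final speck line are made-up placeholders so the sketch runs without ImageMagick. It also assumes black regions are reported as gray(0), as in that listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read connected-components lines of the form
#   "id: WxH+X+Y centroid area color"
# and print the bounding box of each black region with area >= min_area.
# awk skips leading blanks itself, and the header line never matches gray(0),
# so the raw verbose listing can be piped in unmodified.
select_regions() {
    local min_area=$1
    awk -v min="$min_area" '$5 == "gray(0)" && $4 >= min { print $2 }'
}

# Illustrative listing: bounding boxes and colors come from the run above;
# ids, centroids, areas, and the last (speck) line are invented placeholders.
listing='Objects (id: bounding-box centroid area mean-color):
  0: 892x1008+0+0 446.0,503.5 700000 gray(255)
  1: 337x430+36+13 204.0,227.5 120000 gray(0)
  2: 430x337+266+630 480.5,798.0 120000 gray(0)
  3: 202x147+506+252 606.5,325.0 25000 gray(0)
  4: 3x2+10+900 11.0,900.5 6 gray(0)'

# Keep only black regions of at least 500 pixels (threshold is an assumption)
printf '%s\n' "$listing" | select_regions 500
```

Each bounding box printed this way can be passed to the same `-crop "$bbox" +repage` step used in the loop above. A suitable minimum area depends on the scan resolution and would need tuning per batch.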