
How to slice an image by table border


I have many PNG files like this one (image: table).

I want to slice each image into 48 (6 x 8) small image files, one for each cell delimited by the table borders. That is, I would like to have files img11.png, ..., img68.png, where img11.png contains the (1,1) "1.4x4x8" cell, img12.png the (1,2) "M/T" cell, img13.png the "550,000" cell, ..., and img68.png the bottom-right "641,500" cell.
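For reference, the file names described above can be generated in R in row-major order. This is a minimal sketch assuming a fixed 6 x 8 grid; `n_rows`, `n_cols` and `file_names` are illustrative names, not part of any package.

```r
# Build the names img<row><col>.png for a 6 x 8 table, row by row.
n_rows <- 6
n_cols <- 8
# expand.grid varies its first argument fastest, so columns change
# fastest and the names come out row-major: img11, img12, ..., img68.
grid <- expand.grid(col = seq_len(n_cols), row = seq_len(n_rows))
file_names <- sprintf("img%d%d.png", grid$row, grid$col)
print(head(file_names, 3))
print(tail(file_names, 1))
```

The `img<row><col>.png` pattern is only unambiguous while both indices are single digits; for larger tables a zero-padded pattern such as `%02d_%02d.png` is safer.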

I want to do this because I expect it to improve the accuracy of Tesseract, which is currently unsatisfactory because many of my image files are of much poorer quality than the one shown above. In addition, margins and sizes vary widely, and some images contain non-English characters and embedded pictures.

Are there software packages that can detect the table borders and slice an image into m x n images? I am new to this area. I have read "How to find table like structure in image", but it is well beyond my ability. I am willing to learn, though.

Thanks for your help.


Solution

  • I'm using R. Bilal's suggestion (thanks) led me to the following.

    Step 1: Convert the image to grayscale.

    library(magick)
    x <- image_read('https://i.sstatic.net/plBvs.png')
    y <- image_convert(x, colorspace='Gray')
    a <- as.integer(y[[1]])[,,1]  # height x width matrix of gray levels (0 = black, 255 = white)
    
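`image_convert(x, colorspace='Gray')` lets ImageMagick do the grayscale conversion. To illustrate what such a conversion computes, here is a base-R sketch for an RGB array with values 0-255. It assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114); ImageMagick's own grayscale formula may differ, so `rgb_to_gray()` (a hypothetical helper) is an illustration, not a drop-in replacement.

```r
# Convert an RGB array (height x width x 3, values 0-255) to a gray matrix
# using the BT.601 luma weights (an assumption, see the text above).
rgb_to_gray <- function(img) {
  stopifnot(length(dim(img)) == 3, dim(img)[3] >= 3)
  gray <- 0.299 * img[, , 1] + 0.587 * img[, , 2] + 0.114 * img[, , 3]
  # matrix() restores the dimensions even when the image has a single row
  matrix(as.integer(round(gray)), nrow = dim(img)[1], ncol = dim(img)[2])
}

# Tiny 2 x 2 example: one white pixel, one pure red pixel, the rest black.
img <- array(0L, dim = c(2, 2, 3))
img[1, 1, ] <- c(255L, 255L, 255L)
img[2, 2, ] <- c(255L, 0L, 0L)
g <- rgb_to_gray(img)
print(g)
```

White maps to 255, black to 0, and pure red to 76, because the weights favour green over red and blue.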

    Step 2: Convert "dark" to 1 and "light" to 0.

    w <- ifelse(a > 190, 0, 1)  # 1 = dark (ink), 0 = light; adjust 190 to your scans
    
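The fixed cutoff 190 may need tuning for each image, which matters when scan quality varies. One common automatic alternative (not what this answer uses) is Otsu's method, which picks the gray level that maximizes the between-class variance of dark and light pixels. A minimal base-R sketch, assuming `a` is the integer gray matrix (0-255) from Step 1; `otsu_threshold()` is a hypothetical helper:

```r
# Otsu's method: choose the level t that best separates "dark" (<= t)
# from "light" (> t) pixels by maximizing the between-class variance.
otsu_threshold <- function(a) {
  counts <- tabulate(as.integer(a) + 1L, nbins = 256)  # histogram of 0..255
  p <- counts / sum(counts)
  gray_levels <- 0:255
  omega <- cumsum(p)                  # fraction of pixels that are <= t
  mu <- cumsum(p * gray_levels)       # cumulative mean up to t
  mu_total <- mu[256]
  between <- (mu_total * omega - mu)^2 / (omega * (1 - omega))
  between[!is.finite(between)] <- 0   # t where one class is empty
  gray_levels[which.max(between)]
}

# Synthetic bimodal "image": 100 dark pixels (20) and 300 light ones (230).
a <- matrix(c(rep(20L, 100), rep(230L, 300)), nrow = 20)
t <- otsu_threshold(a)
w <- ifelse(a > t, 0, 1)
print(t)
```

On real scans this would replace the constant, as in `w <- ifelse(a > otsu_threshold(a), 0, 1)`. On pages that are almost entirely white, the histogram may not be clearly bimodal, so the result is worth checking against a hand-picked value.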

    Step 3: Detect the horizontal and vertical lines.

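    ypos <- which(rowMeans(w) > .95)  # rows >95% dark are horizontal lines; adjust .95
    xpos <- which(colMeans(w) > .95)  # columns >95% dark are vertical lines; lower it if the table does not span the image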
    
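`which(rowMeans(w) > .95)` returns every row index that belongs to a line, so a border several pixels thick yields a run of consecutive indices; Step 4 copes with this by skipping gaps narrower than 16 px. An alternative, sketched below under the assumption that each border is a run of adjacent indices, is to collapse each run into one line with a start and an end position. `group_runs()` is a hypothetical helper.

```r
# Collapse runs of consecutive indices (e.g. a 3-px-thick border giving
# rows 3, 4, 5) into one line described by its first and last index.
group_runs <- function(pos) {
  if (length(pos) == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  pos <- sort(pos)
  new_run <- c(TRUE, diff(pos) > 1)  # TRUE where a new run begins
  run_id <- cumsum(new_run)
  data.frame(start = as.integer(tapply(pos, run_id, min)),
             end   = as.integer(tapply(pos, run_id, max)))
}

# Three borders: rows 3-5, rows 20-21 and row 40.
r <- group_runs(c(3, 4, 5, 20, 21, 40))
print(r)
```

With lines grouped this way, cells can be taken between the `end` of one line and the `start` of the next, which makes the minimum-size cutoff less critical.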

    Step 4: Crop the original image (x).

    xpos <- c(0, xpos, ncol(a))  # add the image edges as outer boundaries
    ypos <- c(0, ypos, nrow(a))
    
    outdir <- "cropped"
    dir.create(outdir, showWarnings = FALSE)  # no warning if it already exists
    m <- 0                                    # output row counter
    for (i in 1:(length(ypos)-1)) {
      dy <- ypos[i+1]-ypos[i]
      n <- 0                                  # output column counter
      if (dy < 16) next  # skip if too short (e.g. between pixels of one thick line)
      m <- m+1
      for (j in 1:(length(xpos)-1)) {
        dx <- xpos[j+1]-xpos[j]
        if (dx < 16) next  # skip if too narrow
        n <- n+1
        geom <- sprintf("%dx%d+%d+%d", dx, dy, xpos[j], ypos[i])  # WxH+X+Y
        # cat(sprintf('%2d %2d: %s\n', m, n, geom))
        cropped <- image_crop(x, geom)
        outfile <- file.path(outdir, sprintf('%02d_%02d.png', m, n))
        image_write(cropped, outfile, format="png")
      }
    }
    

    The cropped (1,1) image is image22.
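Because the crop geometry is just arithmetic on `xpos` and `ypos`, it can be checked without magick. The sketch below mirrors the loop in Step 4 on a synthetic binary matrix (a 2 x 2 table with 1-pixel borders); `cell_geometries()` and the synthetic data are hypothetical.

```r
# Compute ImageMagick crop geometries ("WxH+X+Y") for every table cell,
# following the same loop as Step 4 but without calling magick.
cell_geometries <- function(xpos, ypos, width, height, min_size = 16) {
  xs <- c(0, xpos, width)
  ys <- c(0, ypos, height)
  out <- data.frame(row = integer(0), col = integer(0),
                    geometry = character(0), stringsAsFactors = FALSE)
  m <- 0
  for (i in seq_len(length(ys) - 1)) {
    dy <- ys[i + 1] - ys[i]
    if (dy < min_size) next  # too short to be a row of cells
    m <- m + 1
    n <- 0
    for (j in seq_len(length(xs) - 1)) {
      dx <- xs[j + 1] - xs[j]
      if (dx < min_size) next  # too narrow to be a column of cells
      n <- n + 1
      geom <- sprintf("%dx%d+%d+%d", dx, dy, xs[j], ys[i])
      out <- rbind(out, data.frame(row = m, col = n, geometry = geom,
                                   stringsAsFactors = FALSE))
    }
  }
  out
}

# Synthetic 100 x 200 binary image (1 = dark): borders at rows 1, 50, 100
# and columns 1, 100, 200 form a 2 x 2 table.
w <- matrix(0, nrow = 100, ncol = 200)
w[c(1, 50, 100), ] <- 1
w[, c(1, 100, 200)] <- 1
ypos <- which(rowMeans(w) > .95)
xpos <- which(colMeans(w) > .95)
cells <- cell_geometries(xpos, ypos, width = ncol(w), height = nrow(w))
print(cells)
```

Each `geometry` string could then be passed to `image_crop()`. Note that `which()` returns 1-based indices while ImageMagick geometry offsets are 0-based, so each crop starts just past a one-pixel border and ends on the pixel of the next border (the `99x49+1+1` cell here); for OCR this is usually harmless.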