I have many png files like this:
I want to slice the image into 48 (= 6 x 8) small image files, one for each of the 48 cells separated by the table borders. That is, I would like to have files img11.png, ..., img68.png, where img11.png contains the (1,1) "1.4x4x8" cell, img12.png the (1,2) "M/T" cell, img13.png the "550,000" cell, ..., and img68.png the bottom-right "641,500" cell.
I want to do this because I think it would improve tesseract's results, which are currently unsatisfactory because many of my image files are of much poorer quality than the one shown above. Also, margins and sizes vary, and some images contain non-English characters and embedded pictures.
Are there software packages that can detect the table borders and slice the image into m x n images? I am new to this area. I have read "How to find table like structure in image", but it is way beyond my ability. I am willing to learn, though.
Thanks for your help.
I'm using R. Bilal's suggestion (thanks) led me to the following.
Step 1: Convert the image to grayscale.
library(magick)
x <- image_read('https://i.sstatic.net/plBvs.png')
y <- image_convert(x, colorspace='Gray')
a <- as.integer(y[[1]])[,,1]
Step 2: Convert "dark" to 1 and "light" to 0.
w <- ifelse(a>190, 0, 1) # adjust 190
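To see what the thresholding in Step 2 does, here is a minimal sketch on a made-up 4x5 grayscale matrix (the pixel values and the name `threshold` are assumptions for illustration, not taken from the real image). It uses a logical comparison, which should give the same 0/1 matrix as the `ifelse` call above.

```r
# Toy grayscale matrix: 0 = black, 255 = white (values are invented).
a <- matrix(c(
  255, 250,  30, 240, 255,
   20,  25,  10,  15,  30,
  255, 245,  40, 250, 255,
  255, 255,  35, 250, 250
), nrow = 4, byrow = TRUE)

threshold <- 190               # same cutoff as above; tune per image
w <- (a <= threshold) * 1L     # 1 = dark pixel, 0 = light pixel
print(w)
```

Row 2 and column 3 come out as all ones, which is what a horizontal and a vertical border would look like after binarization.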
Step 3: Detect the horizontal and vertical lines.
ypos <- which(rowMeans(w) > .95) # adjust .95
xpos <- which(colMeans(w) > .95) # adjust .95
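The idea behind Step 3 is that a row (or column) lying on a border is almost entirely dark, so its mean in the 0/1 matrix is close to 1, while rows that only cross text have a much lower mean. A small sketch on a synthetic 7x9 matrix (the grid layout and the stray pixel are assumptions for illustration):

```r
# Synthetic binary image: 1 = dark, 0 = light.
w <- matrix(0, nrow = 7, ncol = 9)
w[c(1, 4, 7), ] <- 1   # three horizontal borders
w[, c(1, 5, 9)] <- 1   # three vertical borders
w[2, 3] <- 1           # a stray "text" pixel inside a cell

ypos <- which(rowMeans(w) > 0.95)   # rows that are almost fully dark
xpos <- which(colMeans(w) > 0.95)   # columns that are almost fully dark

print(ypos)  # 1 4 7
print(xpos)  # 1 5 9
```

Row 2 contains four dark pixels out of nine (mean about 0.44), so it is not mistaken for a border. On scanned or skewed images the borders may be broken or tilted, in which case a lower cutoff than .95 may be needed.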
Step 4: Crop the original image (x).
xpos <- c(0,xpos, ncol(a))
ypos <- c(0,ypos, nrow(a))
outdir <- "cropped"
dir.create(outdir)
m <- 0
for (i in 1:(length(ypos)-1)) {
  dy <- ypos[i+1]-ypos[i]
  n <- 0
  if (dy < 16) next # skip if too short
  m <- m+1
  for (j in 1:(length(xpos)-1)) {
    dx <- xpos[j+1]-xpos[j]
    if (dx < 16) next # skip if too narrow
    n <- n+1
    geom <- sprintf("%dx%d+%d+%d", dx, dy, xpos[j], ypos[i])
    # cat(sprintf('%2d %2d: %s\n', m, n, geom))
    cropped <- image_crop(x, geom)
    outfile <- file.path(outdir, sprintf('%02d_%02d.png', m, n))
    image_write(cropped, outfile, format="png")
  }
}
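Before writing any files, it can help to check that the detected lines really give the expected grid (6 x 8 = 48 cells in the example). The following is a sketch of the same geometry logic as the loop above, pulled into a hypothetical helper `cell_geometries()` (the function name and the `min_size` argument are my own, not part of magick). It only builds the geometry strings, so it runs without an image.

```r
# Hypothetical helper: list the crop geometries implied by detected lines.
# xpos, ypos: line positions; width, height: image size in pixels.
cell_geometries <- function(xpos, ypos, width, height, min_size = 16) {
  xs <- c(0, xpos, width)
  ys <- c(0, ypos, height)
  dx <- diff(xs)
  dy <- diff(ys)
  keep_x <- which(dx >= min_size)   # drop gaps that are too narrow
  keep_y <- which(dy >= min_size)   # drop gaps that are too short
  grid <- expand.grid(col = seq_along(keep_x), row = seq_along(keep_y))
  grid$geom <- sprintf("%dx%d+%d+%d",
                       dx[keep_x[grid$col]], dy[keep_y[grid$row]],
                       xs[keep_x[grid$col]], ys[keep_y[grid$row]])
  grid[, c("row", "col", "geom")]
}

# Made-up line positions for a 101 x 61 image with a 2 x 2 table.
g <- cell_geometries(xpos = c(2, 50, 100), ypos = c(2, 30, 60),
                     width = 101, height = 61)
print(g)
```

If `nrow(g)` is not the number of cells you expect, adjusting the 190 and .95 thresholds (or `min_size`) before cropping is cheaper than inspecting the output files.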