I'm trying to use tess4j to scan multipage PDF files. I use the following code:
PdfUtilities.splitPdf(imageFile, outputFile, startPage, endPage);
List<IIOImage> imageList = ImageIOHelper.getIIOImageList(outputFile);
String result = instance.doOCR(imageList, null);
However, due to speed issues, I am only interested in scanning the top half of each page (actually even less, but the top half will do for the sake of argument). The API specifies that where I am currently passing null I can pass a Rectangle rect, but I have seen no reference to what the coordinates of the rectangle refer to. The PDFs come from different providers, if that makes any difference.
The rectangle specifies a region within the image's boundary, measured in pixels, with (0,0) at the top-left corner of the image.
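As a rough sketch of how that plays out for the "top half" case: since the rectangle is relative to each image, it can be derived from the image's own width and height. The helper name topFraction and the sample page size below are made up for illustration; only java.awt.Rectangle is used, so the sketch runs without Tess4J on the classpath.

```java
import java.awt.Rectangle;

public class RegionSketch {

    /**
     * Builds a rectangle covering the top part of an image.
     * Hypothetical helper, not part of Tess4J.
     *
     * @param imageWidth  image width in pixels
     * @param imageHeight image height in pixels
     * @param fraction    share of the height to keep, clamped to [0, 1]
     */
    static Rectangle topFraction(int imageWidth, int imageHeight, double fraction) {
        double clamped = Math.max(0.0, Math.min(1.0, fraction));
        // Keep at least one pixel row so the region is never empty.
        int regionHeight = Math.max(1, (int) Math.round(imageHeight * clamped));
        // Origin (0,0) is the top-left corner, so the top of the page starts at y = 0.
        return new Rectangle(0, 0, imageWidth, regionHeight);
    }

    public static void main(String[] args) {
        // Assumed example size: an A4 page rendered at 300 DPI.
        Rectangle topHalf = topFraction(2480, 3508, 0.5);
        System.out.println(topHalf);
    }
}
```

With the code from the question, the result would be passed in place of null, e.g. instance.doOCR(imageList, rect). Because the PDFs come from different providers, the rendered pages may not all have the same pixel dimensions, so it is probably safer to compute the rectangle per page from that page's image (imageList.get(i).getRenderedImage() gives access to its width and height) rather than assume one size for every page.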