I am trying to extract data from a specific rectangular region of a PDF, where the region is specified by two coordinates. Is it possible to do this directly on the PDF, or would I have to convert it into an image and use OCR? If so, does PDFBox or iText include a way to analyze images via OCR? Thanks!
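If the region contains actual text (not a scanned image), you don't need OCR. PDFBox can extract it directly with PDFTextStripperByArea. The snippet below uses the PDFBox 2.x API: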
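import java.awt.Rectangle;
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripperByArea;
try (PDDocument document = PDDocument.load(new File("target.pdf"))) {
    PDFTextStripperByArea stripper = new PDFTextStripperByArea();
    stripper.setSortByPosition(true);
    // x, y, width, height in points; the origin is the top-left corner of the page
    Rectangle rect = new Rectangle(35, 375, 340, 204);
    stripper.addRegion("class1", rect);
    // page indices are 0-based, so getPage(1) is the second page
    stripper.extractRegions(document.getPage(1));
    System.out.println(stripper.getTextForRegion("class1"));
}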
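Since the question gives the region as two coordinates, here is a minimal sketch of turning two opposite corners into the x/y/width/height rectangle that addRegion expects. It assumes the corners are in PDF user space (origin at the bottom-left, y pointing up), that the page is not rotated, and that its media box starts at (0, 0). The class name, the helper fromPdfCorners, and the sample values (a 792 pt tall US Letter page) are made up for illustration; with PDFBox the real page height would typically come from page.getMediaBox().getHeight().

```java
import java.awt.geom.Rectangle2D;

class RegionFromCorners {

    /**
     * Hypothetical helper: builds a top-left-origin rectangle from two
     * opposite corners given in PDF user space (bottom-left origin).
     * The corners may be passed in any order.
     */
    static Rectangle2D.Double fromPdfCorners(double x1, double y1,
                                             double x2, double y2,
                                             double pageHeight) {
        double left = Math.min(x1, x2);
        double width = Math.abs(x2 - x1);
        double height = Math.abs(y2 - y1);
        // Flip the y axis: the upper edge in PDF space becomes the
        // distance from the top of the page.
        double top = pageHeight - Math.max(y1, y2);
        return new Rectangle2D.Double(left, top, width, height);
    }

    public static void main(String[] args) {
        // Assumed sample corners on a 792 pt tall page
        Rectangle2D.Double r = fromPdfCorners(35, 213, 375, 417, 792);
        // prints "35.0 375.0 340.0 204.0"
        System.out.println(r.x + " " + r.y + " " + r.width + " " + r.height);
    }
}
```

The result corresponds to the Rectangle(35, 375, 340, 204) used above, and since addRegion takes a Rectangle2D it can be passed in directly. If your two coordinates are already measured from the top-left corner, skip the y flip and only take the min/abs of the corner values.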