I am using iText to parse text in a PDF document, using PdfContentStreamProcessor with a RenderListener, like this:
PdfReader reader = new PdfReader(file.toURI().toURL());
int numberOfPages = reader.getNumberOfPages();
MyRenderListener listener = new MyRenderListener();
PdfContentStreamProcessor processor = new PdfContentStreamProcessor(listener);
for (int pageNumber = 1; pageNumber <= numberOfPages; pageNumber++) {
    PdfDictionary pageDic = reader.getPageN(pageNumber);
    PdfDictionary resourcesDic = pageDic.getAsDict(PdfName.RESOURCES);
    Rectangle pageSize = reader.getPageSize(pageNumber);
    listener.startPage(pageNumber, pageSize);
    processor.processContent(ContentByteUtils.getContentBytesForPage(reader, pageNumber), resourcesDic);
}
I have no problem getting the text with the renderText(TextRenderInfo) method, but how do I parse the graphic content apart from images? For example, in my case I would like to get:
Following mkl's comment, I am able to get the geometries by using ExtRenderListener. I used "How to extract the color of a rectangle in a PDF, with iText" for reference.
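For anyone landing here, this is a minimal sketch of the bookkeeping such a listener needs: path construction events arrive one segment at a time and only become a finished shape when a paint event follows. PathCollector, its Op enum, and its method names are hypothetical helpers of mine, not iText classes. The sketch uses only java.awt.geom so it runs on its own, and it assumes you translate iText's operation codes, segment data, and CTM into these calls yourself.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Path2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of iText): builds one shape per painted path. */
final class PathCollector {
    /** Hypothetical names mirroring the PDF path operators m, l, c, h and re. */
    enum Op { MOVE_TO, LINE_TO, CURVE_TO, CLOSE, RECT }

    private Path2D.Float current = new Path2D.Float();
    private final List<Path2D> painted = new ArrayList<>();

    /** Call once per path construction event with that segment's coordinates. */
    void construct(Op op, float... d) {
        switch (op) {
            case MOVE_TO:  current.moveTo(d[0], d[1]); break;
            case LINE_TO:  current.lineTo(d[0], d[1]); break;
            case CURVE_TO: current.curveTo(d[0], d[1], d[2], d[3], d[4], d[5]); break;
            case CLOSE:    current.closePath(); break;
            case RECT:
                // d = x, y, width, height; drawn as four edges so that
                // negative widths or heights still produce a rectangle.
                current.moveTo(d[0], d[1]);
                current.lineTo(d[0] + d[2], d[1]);
                current.lineTo(d[0] + d[2], d[1] + d[3]);
                current.lineTo(d[0], d[1] + d[3]);
                current.closePath();
                break;
        }
    }

    /** Call once per stroke or fill event; ctm maps the path into page coordinates. */
    Path2D paint(AffineTransform ctm) {
        Path2D shape = new Path2D.Float(current, ctm);
        painted.add(shape);
        current = new Path2D.Float(); // the current path ends once it is painted
        return shape;
    }

    /** Call when the path is ended without painting (for example, clipping only). */
    void discard() {
        current = new Path2D.Float();
    }

    List<Path2D> paintedShapes() {
        return painted;
    }
}
```

In an ExtRenderListener, construct(...) would presumably be called from modifyPath(PathConstructionRenderInfo) and paint(...) from renderPath(PathPaintingRenderInfo). Converting iText's Matrix into an AffineTransform is left out here, so check that mapping against the iText version you use.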