java, pdfbox

Customizing PDFTextStripper in PDFBox


PDFTextStripper can extract text from the whole document. Is there a way to extract only the text that comes after a certain value, once that value is recognized? For example:

A B C D G   1 line

A B C D G   2 line

A B C D G   3 line

QUANTITY  4 line

I would like to start extracting text once it finds the String QUANTITY. If anyone has worked with PDFBox and has a suggestion, it would be much appreciated.

Or is it possible to add lines to the list only after a line containing that value has been reached?


Solution

  • The easiest solution is to extract the whole text and then apply a Pattern such as "DESCRIPTION\\s*Reference\\s*QUANTITY(.*)", which captures everything on a single page that follows the header named above. Compile it with Pattern.DOTALL, otherwise . stops matching at the first line break and only the rest of the header line is captured.

    1. Create a function that takes the String text as a parameter, looks for a match, and returns matcher.group(1) as a String or Optional<String>.

    2. Create a Pattern whose regex states where capturing should start.
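The two steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the class name QuantityExtractor, the method name extractAfterQuantity, and the sample text are made up for the example. It assumes pageText is the String already obtained from PDFTextStripper's getText(document) for a single page, so the sketch itself needs nothing beyond java.util.regex.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names are illustrative, not part of PDFBox.
final class QuantityExtractor {

    // DOTALL lets '.' match line terminators, so group(1) spans
    // everything after the header up to the end of the input.
    private static final Pattern AFTER_HEADER = Pattern.compile(
            "DESCRIPTION\\s*Reference\\s*QUANTITY(.*)", Pattern.DOTALL);

    // Returns the text following the header, or empty if the header is absent.
    static Optional<String> extractAfterQuantity(String pageText) {
        Matcher matcher = AFTER_HEADER.matcher(pageText);
        if (matcher.find()) {
            return Optional.of(matcher.group(1).trim());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample standing in for text extracted from one PDF page.
        String sample = "A B C D G\n"
                + "DESCRIPTION Reference QUANTITY\n"
                + "Widget X-100 5\n"
                + "Gasket Y-200 12\n";
        System.out.println(extractAfterQuantity(sample).orElse("(marker not found)"));
    }
}
```

Returning Optional<String> makes the "header not found" case explicit for the caller. If the header in your document is only QUANTITY, as in the question, the regex would need to be shortened accordingly.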