I'm trying to scrape some data from PDF files using class.pdf2text.php (found here), with some in-house adjustments. Everything works fine, but I've run into a very strange situation. If I run the code like this:
$a = new PDF2Text();
$a->setFilename('invoiceView2.pdf');
$a->decodePDF();
$pdftxt=$a->output();
preg_match("/Generated on.*/",$pdftxt,$bill_date);
var_dump($bill_date);
die();
nothing is returned: $bill_date is null. If I run the code like this:
$a = new PDF2Text();
$a->setFilename('invoiceView2.pdf');
$a->decodePDF();
$pdftxt=$a->output();
echo $pdftxt;
preg_match("/Generated on.*/",$pdftxt,$bill_date);
var_dump($bill_date);
die();
then the whole content of $pdftxt is printed, and $bill_date is an array containing the result of the preg_match. As you can imagine, I have no intention of outputting the whole content; I only need the preg_match result...
What am I missing here?
Oops... the class class.pdf2text.php tried to show the progress of decoding the text, and that code called flush(); ob_flush();
which of course made things go south... Always triple-check the code of any class or add-on you use if it is not your own.
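With the progress calls removed, the extraction itself can also be made a bit more defensive. The following is a minimal sketch, not part of the original fix. It assumes the label and the date sit on the same line of the decoded text. The sample $pdftxt is hypothetical and only stands in for $a->output(). The sketch captures just the text after "Generated on" and checks preg_match()'s return value (1 = match, 0 = no match, false = error) rather than relying on var_dump().

```php
<?php
// Hypothetical sample text standing in for $a->output(); not real invoice data.
$pdftxt = "Invoice #123\nGenerated on 2014-03-05 10:12\nTotal: 42.00\n";

// Capture group 1 holds everything after "Generated on " up to the end of the line,
// because '.' does not match "\n" unless the /s modifier is used.
$result = preg_match('/Generated on (.*)/', $pdftxt, $matches);

$bill_date = null;
if ($result === 1) {
    $bill_date = trim($matches[1]);
    echo $bill_date, PHP_EOL;
} elseif ($result === 0) {
    echo "No 'Generated on' line found", PHP_EOL;
} else {
    // preg_match() returned false: report the PCRE error code.
    echo "preg_match error: ", preg_last_error(), PHP_EOL;
}
```

Checking the return value this way separates "the pattern did not match" from "preg_match failed", which makes a silent empty result easier to diagnose.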