Tags: php, pdf, text, unicode

How to extract text from a PDF document?


How can I extract text from a PDF document using PHP?

(I can't use other tools because I don't have root access.)

I've found some functions that work for plain text, but they don't handle Unicode characters well:

http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html


Solution

  • Download class.pdf2text.php from https://pastebin.com/dvwySU1a or https://webcheatsheet.com/php/scripts/pdf2text.zip

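    Code:

    // Load the class file downloaded above (same directory as this script).
    require_once 'class.pdf2text.php';

    $a = new PDF2Text();
    $a->setFilename('filename.pdf'); // path to the PDF to read
    $a->decodePDF();                 // parse the file and extract its text
    echo $a->output();               // print the extracted text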
    

    • class.pdf2text.php Project Home
    • The PDF2Text class doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.
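
For the PDF Parser fallback mentioned above, here is a minimal sketch. It assumes "PDF Parser" means the smalot/pdfparser package, installed locally with Composer (`composer require smalot/pdfparser`, which should not need root access). The helper name `extractPdfText` and the `vendor/autoload.php` location are illustrative assumptions, not part of the original answer.

```php
<?php
// Assumption: smalot/pdfparser was installed with Composer next to this
// script, so its autoloader lives in ./vendor/autoload.php.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Hypothetical helper: returns the text of a PDF, or null when the file
 * is unreadable, the library is not installed, or parsing fails.
 */
function extractPdfText(string $path): ?string
{
    if (!is_readable($path) || !class_exists(\Smalot\PdfParser\Parser::class)) {
        return null;
    }

    try {
        $parser = new \Smalot\PdfParser\Parser();
        // parseFile() builds a Document; getText() joins the text of all pages.
        return $parser->parseFile($path)->getText();
    } catch (\Exception $e) {
        // The library throws on files it cannot parse (e.g. encrypted PDFs).
        return null;
    }
}

$text = extractPdfText('filename.pdf');
echo $text ?? "Could not extract text\n";
```

Returning null instead of throwing makes it easy to try one extractor first and fall back to the other when the result is empty.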