My project is built on the Laravel framework. My site offers translation of documents, books, etc. The customer uploads the source file as a PDF, and on the back end the words in the PDF must be counted (by OCR) to determine the final price, so the word count is very important. The main issue is that OCR tools have problems with Persian characters. How can I solve this problem?
Try the following approach; it should give you the word count you need.
Add smalot/pdfparser to your composer.json file and then run composer update (or simply run composer require smalot/pdfparser):
{
    "require": {
        "smalot/pdfparser": "*"
    }
}
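Use the code below in your controller to count the words:
$parser = new \Smalot\PdfParser\Parser();
// public_path() resolves to Laravel's public/ folder regardless of the current working directory
$pdf = $parser->parseFile(public_path('1.pdf'));
$text = trim($pdf->getText());
// Do not strip spaces (they separate the words), and avoid str_word_count():
// it works byte by byte and cannot count multibyte UTF-8 (Persian) words.
// Count runs of Unicode letters, marks and digits; \x{200C} (ZWNJ) keeps words like "می‌روم" whole.
$count = preg_match_all('/[\p{L}\p{M}\p{N}\x{200C}]+/u', $text);
echo $count;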
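Note: put your PDF file in the public folder for testing. PdfParser reads the text layer embedded in the PDF and does not perform OCR, so a scanned PDF that contains only images will yield little or no text.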
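PHP's str_word_count() works on bytes and does not recognize multibyte UTF-8 letters, so it cannot reliably count Persian words. As a minimal sketch, assuming a "word" is any run of Unicode letters, combining marks, digits and the zero-width non-joiner (U+200C, used inside Persian words such as "می‌روم"), the counting step can be wrapped in a reusable function. The name countWords() is a hypothetical helper, not part of PdfParser or Laravel; pass it the string returned by $pdf->getText().

```php
<?php
// Hypothetical helper: counts words in UTF-8 text (Persian, Arabic, English, ...).
// Assumption: a word is a run of letters (\p{L}), combining marks (\p{M}),
// digits (\p{N}) and ZWNJ (U+200C); everything else acts as a separator.
function countWords(string $text): int
{
    // The /u modifier makes PCRE treat the subject and pattern as UTF-8.
    $count = preg_match_all('/[\p{L}\p{M}\p{N}\x{200C}]+/u', $text);

    // preg_match_all() returns false on error, e.g. when $text is not valid UTF-8.
    // Failing loudly is safer than silently quoting a price for 0 words.
    if ($count === false) {
        throw new RuntimeException('Could not count words: text is not valid UTF-8.');
    }

    return $count;
}

// Usage ("\u{200C}" escapes need PHP 7.0+):
echo countWords('سلام دنیا'), PHP_EOL;                  // 2
echo countWords("می\u{200C}روم به خانه"), PHP_EOL;     // 3
echo countWords('Hello, world! 123'), PHP_EOL;          // 3
```

Whether digits should count as words for pricing is a business decision; drop \p{N} from the pattern if numbers should be ignored.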