I'm on a Linux server and I need to convert MS Word 97-2003 .doc files to plain-text .txt files using PHP.
I have already tried these solutions:
How to extract text from word file .doc,docx,.xlsx,.pptx php
Extract text from doc and docx
But both work properly only for the .docx format.
The issue is that when I convert .doc files, I get garbage characters at the end of the text. The number of unwanted characters varies with the length of the file. Also, if the file is fairly long, the output sometimes gets truncated.
Is there a simple way to do this conversion?
I eventually settled on the following solution, which calls Antiword:
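private function doc() {
    // Quote the path so it is safe to pass to the shell.
    $file = escapeshellarg($this->filename);
    // -w 0 disables line wrapping in antiword's text output.
    $text = `/usr/sbin/antiword -w 0 $file`;
    // utf8_encode() treats its input as ISO-8859-1 and converts it to UTF-8.
    return html_entity_decode(utf8_encode(trim($text)));
}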
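For anyone who wants to use this outside a class, here is a minimal standalone sketch of the same Antiword approach. It is not a definitive implementation. It assumes `antiword` is on the `PATH` and that the `UTF-8.txt` mapping file that normally ships with Antiword is installed, so the output is already UTF-8 and no `utf8_encode()` step is needed. The function names `antiwordCommand` and `docToText` are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: builds the antiword command line for a .doc file.
function antiwordCommand(string $path, string $binary = 'antiword'): string
{
    // -w 0 disables line wrapping; -m UTF-8.txt selects the UTF-8 character
    // mapping (assumes that mapping file is installed alongside antiword).
    return escapeshellcmd($binary) . ' -w 0 -m UTF-8.txt ' . escapeshellarg($path);
}

// Hypothetical helper: runs antiword and returns the extracted text.
function docToText(string $path, string $binary = 'antiword'): string
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read file: $path");
    }

    $lines = [];
    // exec() fills $lines with the output and $status with the exit code,
    // so a failed conversion is detected instead of returning partial text.
    exec(antiwordCommand($path, $binary) . ' 2>/dev/null', $lines, $status);

    if ($status !== 0) {
        throw new RuntimeException("antiword exited with status $status");
    }

    return trim(implode("\n", $lines));
}
```

Calling `docToText('/path/to/file.doc')` should return the document text, or throw if the file is unreadable or Antiword reports an error. Checking the exit status is a deliberate choice here: the backtick operator returns whatever was printed, so a failure can pass unnoticed.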