
Convert doc to txt


I'm on a Linux server and I need to convert MS Word 97-2003 .doc files to plain-text .txt files using PHP.

I have already tried these solutions:

How to extract text from word file .doc,docx,.xlsx,.pptx php

Extract text from doc and docx

But both only work properly for the .docx format.

The issue is that when I convert .doc files, I get garbage characters at the end of the text, and the number of unwanted characters varies with the length of the file. Also, if the file is somewhat long, the output may get truncated.

Is there any simple way to get this converted?


Solution

  • I eventually settled on the following solution, which runs Antiword:

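    private function doc() {
        $file = escapeshellarg($this->filename); // quote the path for the shell
        // -w 0: no line wrapping, so each paragraph is printed on one line
        $text = shell_exec("/usr/sbin/antiword -w 0 $file");
        if (!is_string($text)) return ''; // antiword missing, failed, or printed nothing
        // Same conversion as utf8_encode() (deprecated in PHP 8.2): ISO-8859-1 -> UTF-8
        return html_entity_decode(mb_convert_encoding(trim($text), 'UTF-8', 'ISO-8859-1'));
    }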
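Since the original goal was to produce .txt files, here is a minimal standalone sketch that writes Antiword's output to disk. The function name `convertDocToTxt` and its parameters are hypothetical. It assumes an `antiword` binary is available and that Antiword's `UTF-8.txt` mapping file is installed (passed via `-m` so the output is already UTF-8 and needs no re-encoding). It uses `exec()` instead of backticks so the exit status can be checked, and it rejects files that don't start with the OLE2 compound-file signature used by Word 97-2003, because a .docx is a ZIP archive that Antiword cannot read.

```php
<?php
declare(strict_types=1);

// Signature of OLE2 compound files, the container used by Word 97-2003 .doc.
const OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

/**
 * Hypothetical helper (name and parameters are illustrative): convert a
 * Word 97-2003 .doc file to a UTF-8 .txt file by running Antiword.
 * Assumes an antiword binary at $antiword and its UTF-8.txt mapping file.
 *
 * @throws RuntimeException on a missing/non-.doc input or a failed conversion
 */
function convertDocToTxt(string $docPath, string $txtPath, string $antiword = 'antiword'): void
{
    // Antiword only reads the binary .doc format, so check the file header
    // first (a .docx is a ZIP archive and starts with "PK" instead).
    $header = is_file($docPath) ? file_get_contents($docPath, false, null, 0, 8) : false;
    if ($header !== OLE2_MAGIC) {
        throw new RuntimeException("Not a Word 97-2003 .doc file: $docPath");
    }

    // -w 0: don't wrap lines; -m UTF-8.txt: emit UTF-8 (mapping file assumed installed).
    $cmd = escapeshellarg($antiword) . ' -w 0 -m UTF-8.txt ' . escapeshellarg($docPath);

    // exec() collects stdout one line per array element, plus the exit status.
    exec($cmd, $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("antiword exited with status $status for $docPath");
    }

    if (file_put_contents($txtPath, trim(implode("\n", $lines)) . "\n") === false) {
        throw new RuntimeException("Could not write $txtPath");
    }
}
```

For example, with the binary path from the answer above: `convertDocToTxt('/path/to/report.doc', '/path/to/report.txt', '/usr/sbin/antiword');`. Throwing on a non-zero exit status, rather than returning whatever partial text came back, makes truncated or failed conversions visible instead of silently producing a short .txt file.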