Tags: php, linux, bash, docx, libreoffice

Check whether a docx file is valid from the Linux command line


I generate docx files in a PHP script, but sometimes they are corrupted. The server does not detect this, so it returns the file to the user, who then discovers that it is corrupted. This creates a very bad experience.

Does anyone have a way to check from the Linux CLI whether a docx file is corrupted? Then I could be more resilient, either by trying to fix the file or by giving the user a proper response.

So far I'm experimenting with:

    libreoffice --headless --convert-to html corrupted.docx

But most files are not corrupted, so in most cases this check would only increase the response time.
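
One way to turn that conversion into a pass/fail check is to run it in a scratch directory under a time limit and test whether an output file appears. This is a rough sketch, not a verified recipe: the function name `docx_converts`, the `SOFFICE` override variable, and the 30-second limit are made up for illustration. It assumes LibreOffice writes no output file when it cannot load the document, and it deliberately ignores the exit status, which may not be a reliable signal on its own.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the conversion produces a non-empty HTML file.
# SOFFICE is a hypothetical override for the LibreOffice binary name.
docx_converts() {
    local src=$1 outdir base status=1
    [ -f "$src" ] || return 1
    outdir=$(mktemp -d) || return 1
    base=$(basename "${src%.*}")
    # timeout(1) caps how long a bad file can stall the request.
    # The exit status is ignored on purpose; the output file is the signal.
    timeout 30 "${SOFFICE:-libreoffice}" --headless --convert-to html \
        --outdir "$outdir" "$src" >/dev/null 2>&1 || true
    # Assumption: a failed load leaves no output file behind.
    if [ -s "$outdir/$base.html" ]; then
        status=0
    fi
    rm -rf "$outdir"
    return "$status"
}
```

It could be called as `docx_converts generated.docx || echo "conversion failed"`. Note that LibreOffice may still manage to open some files that Word rejects, so a pass here is not a guarantee.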

You can debug with this corrupted file.


Solution

  • You could call a PHP script that opens the document with PHPWord, which can report success or failure. See this example:

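    // Sample_Header.php comes from the PHPWord samples directory
    // (it loads the autoloader and defines EOL).
    include_once 'Sample_Header.php';

    // Read contents
    $name = basename(__FILE__, '.php');
    $source = __DIR__ . "/resources/{$name}.docx";
    echo date('H:i:s'), " Reading contents from `{$source}`", EOL;
    $phpWord = \PhpOffice\PhpWord\IOFactory::load($source);

    // True when the file was parsed into a PhpWord object.
    return $phpWord instanceof \PhpOffice\PhpWord\PhpWord;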
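
If the check has to run from the shell, a cheaper first pass may catch many broken files before PHPWord or LibreOffice gets involved. A .docx file is a ZIP archive, so the sketch below tests archive integrity with `unzip -t` and then checks that `word/document.xml` can be read. The function name `docx_looks_valid` is made up for illustration, and the sketch assumes Info-ZIP `unzip` is installed. Passing it does not prove that Word will open the file; it only rules out empty, truncated, or non-ZIP output.

```shell
#!/usr/bin/env bash
# Sketch: cheap structural check for a .docx (which is a ZIP archive).
docx_looks_valid() {
    local file=$1
    # Reject missing or zero-length files straight away.
    [ -s "$file" ] || return 1
    # -t verifies each archive member; -qq keeps unzip quiet.
    unzip -tqq "$file" >/dev/null 2>&1 || return 1
    # The main document part must exist and be extractable.
    unzip -p "$file" word/document.xml >/dev/null 2>&1
}
```

A caller could run `docx_looks_valid generated.docx` first and fall back to a slower, deeper check only when it fails or when extra certainty is needed.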