
Issue converting to .pdf a merged .docx file that opens fine in Word


So, I have the following scenario.

I am working on a system for academic papers. I have several inputs for things like author name, coauthors, title, type of paper, introduction, objectives and so on. I store all that information in a database. The user has a Preview button which, when clicked, generates a Word file asynchronously and sends the file location back to the user. That file is then shown to the user in an iframe using Google Doc Viewer.

There's a specific use case where the user/author of the paper can attach a .docx file with a table, or a .jpeg file for a figure. That table/figure has to be included inside the final .docx file.

For the .docx generation process I am using PHPWord.

So up until this point everything works fine, but my issues start when I try to mix everything and put together the .docx file.

Approach Number One

My first approach was to do everything with PHPWord. I create the file, add the text where required and, in the case of the image, just insert the image followed by the figure caption below it.

Things get tricky, though, when I try doing the same thing with the .docx table file. My only option was to get the table XML using this. It did the trick, but when I opened the resulting Word file, the table was there but had lost all of its styling and had transparent borders. Because of those transparent borders, the borders were ignored when converting to PDF and the table content came out as scrambled text.

Approach Number Two (current one)

After fighting with Approach Number One and just complicating stuff more, I decided to do something different. Since I already generated one docx file with the main paper information and I needed to add another docx file, I decided to use the DocX Merge Library.

So, what I basically did was generate three Word files: one for the main paper information, one for the table and one for the table caption. The caption gets its own file mainly to avoid overcomplicating the order of the information, and because that data is not in the table .docx file.

Then I run this:

$dm->merge( [
    'paper-info.docx',
    'attached-table.docx',
    'attached-table-caption.docx'
], 'complete-file.docx');

So, afterwards, I check and the Word file is generated just as I need it with the table maintaining its original styles and dimensions.

If I open it in LibreOffice though, I get this error message:

LibreOffice Error Message

Then if I continue and open the file, the file opens correctly with all the data with the only exception that it no longer respects the fonts of the file as they appear in Word.

The problem comes in the next step. I need to present a preview of the file using Google Doc Viewer, with this syntax:

<iframe src="https://docs.google.com/gview?embedded=true&hl=es_LA&url=https://usersite.net/complete-file.docx?pid=explorer&efh=false&a=v&chrome=false&embedded=true" width="100%" height="600" style="border: none;"></iframe>

The document loads fine, but when I review it, it only shows the content of the first file, paper-info.docx, and ends right where the table and table caption should appear. If I open the exact same file in Word, it shows the table and caption.
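One detail worth ruling out in the iframe above: the document URL is passed to the viewer unencoded, so everything after `complete-file.docx?` becomes part of the document URL's own query string. The following is a minimal sketch of building the viewer URL with the document URL percent-encoded as a single `url` parameter. The helper name `buildViewerUrl` is hypothetical, and this is not claimed to fix the truncated preview; it only removes one variable.

```php
<?php
// Hypothetical helper: builds a Google Doc Viewer URL for a publicly reachable document.
function buildViewerUrl(string $documentUrl, string $language = 'es_LA'): string
{
    // http_build_query() percent-encodes each value, so the document URL
    // travels as one "url" parameter instead of leaking its own query string.
    return 'https://docs.google.com/gview?' . http_build_query([
        'embedded' => 'true',
        'hl'       => $language,
        'url'      => $documentUrl,
    ], '', '&');
}

$src = buildViewerUrl('https://usersite.net/complete-file.docx');

// htmlspecialchars() escapes the "&" separators for use inside an HTML attribute.
echo '<iframe src="' . htmlspecialchars($src) . '" width="100%" height="600" style="border: none;"></iframe>';
```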

The other issue is when I try to convert the file to PDF.

If I use PHPWord's conversion in combination with DomPDF, I get the exact same issue as with the Google Docs Viewer: the PDF only has the content of the first file. This is the code:

// Load the merged document and write it out through PHPWord's PDF writer (DomPDF renderer).
$phpWordPDF = \PhpOffice\PhpWord\IOFactory::load('complete-file.docx');
$xmlWriterPDF = \PhpOffice\PhpWord\IOFactory::createWriter($phpWordPDF, 'PDF');
$xmlWriterPDF->save('complete-file.pdf');

So my only other viable route was to use LibreOffice's command line using this command:

soffice --headless --convert-to pdf complete-file.docx

This converts the file correctly, but it has the same issue I mentioned when opening the .docx file in LibreOffice: the fonts are not preserved.

The other weird part is that if I try to run this in my PHP script:

shell_exec('soffice --headless --convert-to pdf complete-file.docx');

Nothing happens.

I am running Apache 2.4.25, PHP 7.4.11 on Windows 10 x64.
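When `shell_exec` appears to do nothing, it helps to capture the exit code and the error output. Under Apache, the account running PHP may not have `soffice` on its PATH, and LibreOffice may be unable to create its user profile there. The sketch below makes both explicit. It is only a sketch under assumptions: the function names are hypothetical, every path is a placeholder to replace with your own, and it has not been verified on this exact setup.

```php
<?php
// Hypothetical helper: builds the soffice conversion command as one string.
function buildSofficeCommand(string $soffice, string $input, string $outDir, string $profileDir): string
{
    // A dedicated, writable profile directory avoids depending on the
    // web server account's own user profile.
    $profileUrl = 'file:///' . ltrim(str_replace('\\', '/', $profileDir), '/');

    return escapeshellarg($soffice)
        . ' ' . escapeshellarg('-env:UserInstallation=' . $profileUrl)
        . ' --headless --convert-to pdf'
        . ' --outdir ' . escapeshellarg($outDir)
        . ' ' . escapeshellarg($input)
        . ' 2>&1'; // merge stderr into stdout so errors are not silently lost
}

// Hypothetical helper: runs the conversion and returns everything needed for debugging.
function convertToPdf(string $soffice, string $input, string $outDir, string $profileDir): array
{
    $cmd = buildSofficeCommand($soffice, $input, $outDir, $profileDir);
    exec($cmd, $outputLines, $exitCode);

    return [
        'command'  => $cmd,
        'exitCode' => $exitCode,
        'output'   => implode("\n", $outputLines),
    ];
}
```

Calling `convertToPdf()` with the absolute path to `soffice.exe`, absolute input and output paths, and a profile directory the web server can write to, then logging the returned `exitCode` and `output`, should at least show why the conversion is failing.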

Conclusion

So far, my best result came from merging the files, but that also caused these issues, so maybe the problem comes from the merging process I am using. Ideally I would just insert the table, with its styles and everything, using PHPWord, but I haven't been able to and haven't found any examples of how to do that.
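One way to test whether the merge step is the culprit is to look inside the merged file and compare it with the three inputs. A .docx file is a zip archive whose main body lives in `word/document.xml`. The sketch below counts a few element types there, assuming PHP's zip and DOM extensions are available. The function name `summarizeDocxBody` is hypothetical. An unexpected number of `sectPr` elements, or any `altChunk` elements, in `complete-file.docx` would be a hint (not proof) about why some readers stop after the first document.

```php
<?php
// Hypothetical helper: summarizes the main body of a .docx so files can be compared.
function summarizeDocxBody(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml is missing');
    }

    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('word/document.xml is not well-formed XML');
    }

    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

    return [
        'paragraphs' => $dom->getElementsByTagNameNS($ns, 'p')->length,
        'tables'     => $dom->getElementsByTagNameNS($ns, 'tbl')->length,
        'sections'   => $dom->getElementsByTagNameNS($ns, 'sectPr')->length,
        'altChunks'  => $dom->getElementsByTagNameNS($ns, 'altChunk')->length,
    ];
}
```

Running it on `paper-info.docx`, `attached-table.docx`, `attached-table-caption.docx` and `complete-file.docx` and printing the results with `print_r()` gives a quick side-by-side view.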

Another option that I've seen is this library, but the merge feature is only available in the $599 USD license, and I am not sure it would solve my issue. If it does, I'd invest in it since I need to get this done ASAP, but since I feel pretty close to solving this, I wanted to check what your recommendations would be for this case first. Maybe another merging library, or doing everything via PHPWord.

Help is appreciated!


Solution

  • After a lot of attempts to fix it, I wasn't able to achieve what I wanted with PHPWord and the merging library I mentioned.

    Since I needed to fix this I decided to invest in the paid library I mentioned in my question. It was an expensive purchase, but for those who are interested, it does exactly what was required and it does it perfectly.

    The two main functions I required were document merging and importing of content to a .docx file.

    So I had to purchase the Premium package. Once there, the library literally does everything for you.

    Example code for merging .docx files:

    require_once 'classes/MultiMerge.php';
    
    $merge = new MultiMerge();
    
    $merge->mergeDocx('document.docx', array('second.docx', 'other.docx'), 'output.docx', array());
    

    Example of how to import a table from another .docx file:

    require_once 'classes/CreateDocx.php';
    
    $docx = new CreateDocxFromTemplate('document.docx');
    
    // import tables
    $referenceNode = array(
        'type' => 'table',
    );
    
    $docx->importContents('document_1.docx', $referenceNode);
    
    $docx->createDocx('output');
    

    As you can see it is pretty easy. This answer is by no means an ad for this library, but for those that have the same problem as me, this is a life saver.