Compress MS Word docx documents that use barcode fonts in a PHP script


Using TinyButStrong and OpenTBS, I created a PHP script that opens multiple docx templates and replaces a lot of variables with values from a database. In a nutshell, clients can download their unique files, add information and pictures, and upload them again. This works very well. But of course I wouldn't post here if there weren't some sort of problem.

Because of the barcodes (I am using barcode fonts and embedding them in Word, because the documents will be scanned much later in the process), the documents get huge. Instead of 100 KB on average, they easily reach 7 MB. This is a problem, because about 20,000 documents will be scanned per year. That's roughly 130 GB extra per year.

It's a long story, but we need docx, so we can't simply replace it with some sort of PHP/MySQL template, which would be far more efficient.

Word has an option to embed only the font symbols that are actually used, which cuts down on the size. But that isn't an option here, because the main template needs to have all characters available. It's also not an option to send the font to the users, since there are roughly 20,000 new ones each year.

Is there another way to cut the file size or to use compression, perhaps in Word, PHP, FTP, or Apache?


Solution

  • I'm afraid the option "Embed fonts in the file" combined with "Embed only the characters used in the document" cannot be exploited here. MS Word saves the font in a special format with the extension ODTTF (for example, you have it in "word\fonts\font1.odttf"). This format is binary and seems badly documented, so in practice it remains a proprietary format. Only MS Word is able to build such a sub-file.

    Since you don't have a lighter font for the barcode, the only solution I can see is to use an image instead of a font for your barcode:

    • OpenTBS has a feature to easily replace a picture inside a DOCX file (parameter "ope=changepic").
    • Barcode-to-image tools are easy to find in PHP, for example Barcode Generator.

    Then you only have to code your process like this:

    1. Load the DOCX template.
    2. Create the temporary image of the barcode.
    3. Change the image inside the template.
    4. Merge the template, and save or send the result.
    5. Delete the temporary image.

    It's important to delete the temporary image only after the final merge of the template, because OpenTBS actually inserts the image only when the method $tbs->Show() is called.

    It's also important to use a different temporary file for each merge, because many merges can occur at the same time.
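
    The five steps above, together with the two cautions about deletion order and unique file names, might be sketched roughly as follows. This is only an illustration under assumptions: the function name buildBarcodeDocx, the picture reference '#barcode#' (a marker assumed to be placed in the title or description of the placeholder picture in the template), the 'barcode_' file prefix, and the $renderBarcode callable (standing in for whichever barcode-to-image tool you choose) are all hypothetical. It uses the OpenTBS command OPENTBS_CHANGE_PICTURE rather than the template-side parameter, and assumes TinyButStrong and the OpenTBS plugin are already loaded and installed on $tbs.

```php
<?php
// Sketch only. Assumes the caller has already done something like:
//   include_once 'tbs_class.php';
//   include_once 'tbs_plugin_opentbs.php';
//   $tbs = new clsTinyButStrong();
//   $tbs->Plugin(TBS_INSTALL, OPENTBS_PLUGIN);

/**
 * Build one DOCX with a barcode image instead of an embedded barcode font.
 *
 * @param object   $tbs           TinyButStrong instance with OpenTBS installed
 * @param callable $renderBarcode hypothetical helper: function (string $value, string $pngPath): void
 * @param array    $fields        field name => value pairs to merge into the template
 */
function buildBarcodeDocx($tbs, callable $renderBarcode, string $template,
                          string $barcodeValue, string $output, array $fields = []): void
{
    // A random name per merge, so concurrent merges never share an image.
    // The .png extension is kept so the picture type can be recognized.
    $tmpImage = sys_get_temp_dir() . DIRECTORY_SEPARATOR
              . 'barcode_' . bin2hex(random_bytes(8)) . '.png';

    try {
        // 1. Load the DOCX template.
        $tbs->LoadTemplate($template);

        // 2. Create the temporary image of the barcode.
        $renderBarcode($barcodeValue, $tmpImage);

        // 3. Change the placeholder picture inside the template.
        $tbs->PlugIn(OPENTBS_CHANGE_PICTURE, '#barcode#', $tmpImage);

        // 4. Merge the fields, then save the result to disk.
        foreach ($fields as $name => $value) {
            $tbs->MergeField($name, $value);
        }
        $tbs->Show(OPENTBS_FILE, $output);
    } finally {
        // 5. Delete the temporary image, only after Show() has run.
        if (is_file($tmpImage)) {
            unlink($tmpImage);
        }
    }
}
```

    The finally block is a design choice: it removes the image even when the merge throws, while still guaranteeing that deletion never happens before Show().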

    If temporary files have a prefix or are saved into a dedicated directory, then it is advisable to clean up old temporary images regularly.
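
    As a minimal sketch of such a cleanup, assuming the temporary images share a 'barcode_' prefix and a .png extension (both hypothetical naming choices, as is the function name), something like the following could be run from a cron job. The one-hour default age is arbitrary; it only needs to be comfortably longer than any single merge takes, so that images still in use are never removed.

```php
<?php
/**
 * Delete temporary barcode images in $dir that are older than $maxAgeSeconds.
 * Only files matching "<prefix>*.png" are considered.
 * Returns the number of files removed.
 */
function cleanOldBarcodeImages(string $dir, string $prefix = 'barcode_',
                               int $maxAgeSeconds = 3600, ?int $now = null): int
{
    $now = $now ?? time();
    $removed = 0;
    $pattern = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $prefix . '*.png';

    // glob() may return false on error, so fall back to an empty list.
    foreach (glob($pattern) ?: [] as $file) {
        $mtime = filemtime($file);
        if ($mtime !== false && ($now - $mtime) > $maxAgeSeconds && @unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```

    For example, cleanOldBarcodeImages(sys_get_temp_dir()) would remove matching images older than one hour from the system temporary directory.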