
HTML2PDF: Words being cut off/running off in PDF


Some of the words in my file are being cut off in the PDF. I can see that the words are there, but they are not wrapping properly.

My output looks like:

[image: screenshot of the PDF output]

Here is a snippet of my code:

            <table>
                <tr align=''>
                    <td colspan='5' class='heading'>Corporate URC Use Only</td>
                </tr>
                <tr>
                    <td>Consult Determination<span class='required'></span>:</td>
                    <td><strong>";
                    if(isset($updated_history) && !is_null($updated_history)){
                        $html .= $data['original_decision'];
                    }
                    else{
                        $html .= $data['final_decision'];
                    }

                    $html .="</strong></td>

                </tr>
                </table>
                <table>
                    <tr>
                        <td>Notes:</td>
                        <td><strong>" . $data['notes'] . "</strong></td>
                    </tr>
                </table>

My html2pdf implementation is pretty straightforward:

require_once("../include/html2pdf/html2pdf.class.php");
$html2pdf = new HTML2PDF('P','A4','en');
$html2pdf->pdf->SetDisplayMode('real');     

$html2pdf->WriteHTML($html);
$html2pdf->Output($c_file, "F");

EDIT: Here is a link to a sample pdf exhibiting this behavior. https://www.dropbox.com/s/h91g40bo4b2cmlw/Test_T_2312321.pdf?dl=0


Solution

  • It looks like...

    1. ...either your PDF's /MediaBox is narrower than A4 (which is 595 points wide),
    2. ...or the drawing/writing code in your PDF's /Contents stream does not respect the A4 width and draws/writes beyond it.

    You should check if your code utilizes a (probably hidden) setting that sets the page width (or complete page size) to letter (which would be 612x792 points).

    To test my assumption, you could replace the A4 in your html2pdf implementation with letter or Letter...


    (If you provide a [link to a] PDF created by your code it would be a lot easier to debug. What I wrote above is [almost] pure speculation.)


    Update

    After having looked at the source code of the PDF provided in the link of the update to the OP, I can say this:

    1. I unpacked the /Contents streams of the file in order to see the PDF page drawing operators as ASCII, using this command:

      qpdf --qdf --object-streams=disable document.pdf q.pdf
      
    2. The newly generated q.pdf can now easily be opened in a good text editor (like Vim, Emacs or Notepad++).

    3. The following line prints some text on the page:

       BT                     \
         /F2 10.00 Tf         \
       ET                     \
       [....]                 \
       q                      \
         0.000 0.000 0.000 rg \
           BT                 \
             0    Tr          \
             0.00 w           \
           ET                 \
           BT                 \
             50.00 359.19 Td  \
             [(Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore)] TJ \
           ET                 \
       Q
      

    This code snippet prints a very long line of text in a font internally named /F2 (which in turn is mapped to /Helvetica-Bold elsewhere in the file), sized 10 points, starting at coordinates x=50, y=359.19.

    However, this long text line does not fit into the page's width given by /MediaBox, which is defined as [0 0 595.28 841.89] (these values are in PostScript points, roughly 595 x 842, and represent A4).

    It would fit into a width of 635 (even leaving some small margin on the right edge).

    (You could also make the text fit the current page width by down-sizing the text, e.g. /F2 9.00 Tf. But this would still leave the long horizontal lines of your drawn boxes spilling beyond the right page border...)

    The overall source code of this PDF is, BTW, very inefficient in some places. For example, it contains BT /F1 10.00 Tf ET 1.000 g more than 1000 times, but this code draws exactly... nothing! It only selects the font with the internal name /F1 at a size of 10 points and sets the fill color to white.

    You can edit the original PDF with a text editor easily:

    1. Search for the string /MediaBox. It appears twice in the PDF, once for each page.

    2. Replace its current value of [0 0 595.28 841.89] by a new value of [0 0 635.00 841.89].

    3. Save the edited file.

    4. Open it in your favorite PDF viewer.
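
    The manual edit in steps 1 to 3 can also be scripted. What follows is only a sketch, not something taken from the PDF's generator: the function name widen_mediabox is made up for illustration, and it assumes that the file spells the array exactly as [0 0 595.28 841.89] and that your sed copes with binary data (GNU sed under LC_ALL=C generally does). Because the old and the new value have the same number of characters, the byte offsets recorded in the PDF's cross-reference table are not shifted. Work on a copy of the file.

```shell
#!/bin/sh
# Sketch: widen every A4 /MediaBox from 595.28 to 635.00 points.
# widen_mediabox is a hypothetical helper name, not a qpdf or html2pdf feature.
# Assumption: the array is spelled exactly "[0 0 595.28 841.89]" in the file.
widen_mediabox() {
    in_pdf=$1
    out_pdf=$2
    # LC_ALL=C makes sed treat the input as raw bytes instead of locale text.
    # The replacement is as long as the original, so no byte offsets move.
    LC_ALL=C sed 's/\[0 0 595\.28 841\.89]/[0 0 635.00 841.89]/g' \
        "$in_pdf" > "$out_pdf"
    # Print how many lines now carry the widened box (0 means nothing matched).
    LC_ALL=C grep -a -c -F '[0 0 635.00 841.89]' "$out_pdf" || true
}

# Example call (file names are placeholders):
#   widen_mediabox original.pdf widened.pdf
```

    If the function prints 0, the /MediaBox in your file is written differently (other spacing or more decimals), and the pattern has to be adapted by hand.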

    Now you'll see that the page contents fit into the page's width, which is now 635 points (== 22.4 cm; the original value of 595 points is equivalent to 21.0 cm).
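
    The unit conversion behind these figures: one PostScript point is 1/72 inch, and one inch is 2.54 cm. A small sketch of that arithmetic (pt_to_cm is a name chosen here for illustration, and it assumes an awk is available on the PATH):

```shell
#!/bin/sh
# Convert PostScript points to centimetres: 1 pt = 1/72 inch, 1 inch = 2.54 cm.
# pt_to_cm is an illustrative helper name.
pt_to_cm() {
    # LC_ALL=C keeps the decimal separator a dot regardless of the user's locale.
    LC_ALL=C awk -v pt="$1" 'BEGIN { printf "%.1f\n", pt / 72 * 2.54 }'
}

pt_to_cm 595   # A4 width used in the answer, prints 21.0
pt_to_cm 635   # widened page width, prints 22.4
```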

    You'll also see that it is not only some long text lines which didn't fit into the page width -- the same is true for some of the horizontal lines drawn for the boxes.

    Here is a screenshot of the edited PDF file, showing how the new /MediaBox now is able to hold all the page content within its boundaries:

    [image: screenshot of the edited PDF]