PDFLib: Extract part of imported page


We have a document that needs to be cut into its layout parts, each saved as a single PDF file: for example, the headline -> headline.pdf, paragraphs -> paragraph01.pdf, etc. To do that, we use coordinates to locate those parts. (The source document comes from an OCR tool, which saves those coordinates.)

Our problem is that the cut pieces are simply copies of the original document with masked content: the page borders are arranged so that only the desired part is visible. As a result, the output files are all the same file size. How do we force PDFlib to cut the unwanted parts away? I hope there is a solution. We tried many combinations of trim boxes, crop boxes, and so on, but without success.

Here is the code we use:

$fWidth = 200;    // width of the document part
$fHeight = 20;    // height of the document part
$fMinXPoint = 10; // x coordinate of the part's upper-left corner
$fMinYPoint = 10; // y coordinate of the part's upper-left corner (topdown=true is set below)

$oPdf = new \PDFLib();
$oPdf->begin_document('', 'optimize=true linearize=true inmemory=true');
$oPdf->set_option('compress=9');
$oPdf->set_option('topdown=true');
$oLoadedDocument = $oPdf->open_pdi_document($sRealFilePath, '');// original pdf
$oPage           = $oPdf->open_pdi_page(
                    $oLoadedDocument,
                    1,
                    'clippingarea=crop'
);
$oPdf->begin_page_ext($fWidth, $fHeight, '');
$oPdf->fit_pdi_page($oPage, -$fMinXPoint, -$fMinYPoint, 'position={left top}'); 
$oPdf->end_page_ext("cropbox={0 0 $fWidth $fHeight}");
$oPdf->close_pdi_page( $oPage );
$oPdf->close_pdi_document( $oLoadedDocument );
$oPdf->end_document('');
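
The values passed to begin_page_ext and fit_pdi_page above come straight from the OCR coordinates. As a minimal sketch, that arithmetic could be kept in one place with a small helper. The function name cropPlacement and the array keys are hypothetical and not part of PDFlib. The sketch assumes the OCR box is given in PDF points measured from the top-left corner of the page, and it only mirrors the calculation used above; it does not verify how PDFlib interprets the offsets.

```php
<?php
// Hypothetical helper, not a PDFlib function.
// Assumption: the OCR box is left/top/width/height in PDF points,
// measured from the top-left page corner (matching topdown=true).
function cropPlacement(float $left, float $top, float $width, float $height): array
{
    if ($width <= 0 || $height <= 0) {
        throw new InvalidArgumentException('Box width and height must be positive.');
    }

    return [
        'pageWidth'  => $width,   // size of the new output page
        'pageHeight' => $height,
        'offsetX'    => -$left,   // shift the imported page left by the box origin
        'offsetY'    => -$top,    // shift the imported page by the box origin vertically
    ];
}

// Same numbers as in the code above.
$p = cropPlacement(10.0, 10.0, 200.0, 20.0);
```

The resulting values would be used as begin_page_ext($p['pageWidth'], $p['pageHeight'], '') and fit_pdi_page($oPage, $p['offsetX'], $p['offsetY'], ...), exactly as the hard-coded variables are used above.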

Solution

  • How do we force PDFlib to cut the unwanted parts away?

    This is not possible with PDI. For PDI (the PDF Import extension of PDFlib), the imported page is a "black box", and the complete page content is copied to the output PDF. PDFlib+PDI has no option for manipulating the page content, which would be necessary to remove content from the page.