FPDI Output File Size

I am using FPDF and FPDI to extract 2 pages from a PDF document that is usually about 28 pages long. Each page is essentially a single image that fills the entire page, and the files are around 35-40 MB.

When I use FPDI to extract the last 2 pages of the full document into a new file, the new 2-page file is almost as large as the original. Any ideas why this might be?

Here is the basic code used to do the extracting:

public function extractPagesFromFile($file, $outputFileName, $numPages = 2) {
  $pageCount = $this->_fpdf->setSourceFile($file);
  // Reject requests for fewer than one page or more pages than the source has.
  if ($numPages < 1 || $numPages > $pageCount) {
    return false;
  }
  // Import the last $numPages pages of the source document.
  for ($pageNo = $pageCount - $numPages + 1; $pageNo <= $pageCount; $pageNo++) {
    $tplIdx = $this->_fpdf->importPage($pageNo);
    // Read each page's own size so pages of different sizes are preserved.
    $s = $this->_fpdf->getTemplateSize($tplIdx);
    $this->_fpdf->AddPage($s['w'] > $s['h'] ? 'L' : 'P', array($s['w'], $s['h']));
    $this->_fpdf->useTemplate($tplIdx);
  }

  $this->_fpdf->Output('F', $outputFileName);
  $this->_fpdf->cleanUp();
  return true;
}
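To see where the size comes from, one rough check is to count the image XObject dictionaries in the extracted file: if a 2-page extract reports far more than two images, images belonging to other pages were copied as well. This is a heuristic sketch, assuming the file was written by FPDF/FPDI, which write object dictionaries as plain text rather than inside compressed object streams; the function name countImageXObjects is hypothetical.

```php
<?php
// Heuristic only: counts "/Subtype /Image" entries in raw PDF bytes.
// Assumes object dictionaries are stored uncompressed (as in files written
// by FPDF/FPDI); files that use compressed object streams will under-count.
function countImageXObjects(string $pdfData): int
{
    // The lookahead requires a PDF delimiter or whitespace after "/Image",
    // so names such as "/ImageB" are not counted.
    $count = preg_match_all('~/Subtype\s*/Image(?=[\s()<>\[\]{}/%]|$)~', $pdfData);
    return $count === false ? 0 : $count;
}
```

For example, countImageXObjects(file_get_contents($outputFileName)) could be run on the file produced by extractPagesFromFile() and compared with the number of extracted pages.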

Solution

FPDI copies all resources of a page. I suspect that all images in your file are located in a single resource dictionary shared by every page, so all of them are copied along with each imported page. This is a common issue when extracting pages from existing PDF documents: without parsing and interpreting the page's content stream, it is impossible to know which resources should be copied and which should not. At the moment there is no solution with or for FPDI.
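The following is a simplified model of the idea behind that last point, not FPDI's or SetaPDF's API: scan a page's (already decompressed) content stream for "/Name Do" operators and keep only the XObjects that are actually painted. The function names usedXObjectNames and filterXObjects, and the representation of the shared resource dictionary as a PHP array keyed by XObject name, are hypothetical. The regex tokenising is deliberately naive; a real implementation would also have to skip string literals and inline images, recurse into Form XObjects that paint further images, and handle fonts and resources inherited from parent Pages nodes.

```php
<?php
// Illustrative model only (hypothetical functions, not a library API).
// Returns the XObject names that a decompressed content stream paints with "Do".
function usedXObjectNames(string $contentStream): array
{
    // Naive: a PDF name followed by the "Do" operator. Does not skip strings,
    // comments or inline image data.
    $pattern = '~/([^\s()<>\[\]{}/%]+)\s+Do(?=[\s()<>\[\]{}/%]|$)~';
    if (preg_match_all($pattern, $contentStream, $m) === false) {
        return []; // regex engine error (e.g. backtrack limit reached)
    }
    return array_values(array_unique($m[1]));
}

// Keeps only the entries of a shared XObject dictionary that this page uses,
// instead of copying the whole dictionary as a page-level import does.
function filterXObjects(array $sharedXObjects, string $contentStream): array
{
    return array_intersect_key($sharedXObjects, array_flip(usedXObjectNames($contentStream)));
}
```

For a shared dictionary such as ['Im1' => ..., 'Im2' => ..., 'Im3' => ...] and a page whose content stream is "q 595 0 0 842 0 0 cm /Im3 Do Q", filterXObjects() keeps only the 'Im3' entry.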

That said, we (Setasign) offer other, non-free PHP components, such as the SetaPDF-Merger, that work at a lower level and for which we would build a demo that fixes this behaviour.