Get the page count of PDF, DOCX, DOC, PPT, and PPTX files with PHP


I want this functionality in my PHP application:

When a user uploads a document (PDF, DOCX, DOC, PPT, or PPTX), they should be shown the document's total number of pages once the upload finishes.

This has to work without using the exec() function.


Solution

  • Some formats can be handled directly in PHP. DOCX and PPTX are easy, because both are ZIP archives that keep the count in docProps/app.xml:

    For Word files:

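    function PageCount_DOCX($file) {
        $pageCount = 0;
    
        $zip = new ZipArchive();
    
        if ($zip->open($file) === true) {
            // docProps/app.xml holds the document's extended properties
            if (($index = $zip->locateName('docProps/app.xml')) !== false) {
                $data = $zip->getFromIndex($index);
                $xml = new SimpleXMLElement($data);
                // Cast so the caller gets an int rather than a SimpleXMLElement
                $pageCount = (int) $xml->Pages;
            }
            // Close exactly once, whether or not app.xml was found
            $zip->close();
        }
    
        return $pageCount;
    }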
    

    And for PowerPoint files:

    function PageCount_PPTX($file) {
        $pageCount = 0;
    
        $zip = new ZipArchive();
    
        if ($zip->open($file) === true) {
            // docProps/app.xml holds the presentation's extended properties
            if (($index = $zip->locateName('docProps/app.xml')) !== false) {
                $data = $zip->getFromIndex($index);
                $xml = new SimpleXMLElement($data);
                // Cast so the caller gets an int rather than a SimpleXMLElement
                $pageCount = (int) $xml->Slides;
            }
            // Close exactly once, whether or not app.xml was found
            $zip->close();
        }
    
        return $pageCount;
    }
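
    The two functions above differ only in the element they read, so they could be folded into one helper. The following is a minimal sketch under that assumption; the name countFromAppXml and its $property parameter are hypothetical and not part of the original answer. It relies only on ZipArchive and SimpleXML, so it needs the zip extension but no exec().

```php
<?php
// Hypothetical helper (not from the original answer): reads one numeric
// property from docProps/app.xml inside a .docx or .pptx archive.
function countFromAppXml(string $file, string $property): int
{
    $zip = new ZipArchive();
    if ($zip->open($file) !== true) {
        return 0; // not a readable ZIP archive
    }

    // getFromName() returns false when the entry does not exist
    $data = $zip->getFromName('docProps/app.xml');
    $zip->close();
    if ($data === false) {
        return 0;
    }

    $xml = simplexml_load_string($data);
    if ($xml === false) {
        return 0; // app.xml was present but not well-formed XML
    }

    // A missing element is an empty SimpleXMLElement, which casts to 0
    return (int) $xml->{$property};
}
```

    Calling countFromAppXml($file, 'Pages') for a .docx, or countFromAppXml($file, 'Slides') for a .pptx, should give the same result as the functions above. The value is whatever the authoring application wrote at the last save, so files generated by other tools may omit it, in which case this sketch returns 0.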
    

    Older Office documents (DOC, PPT) are a different story, because they use a binary format rather than zipped XML. You'll find some discussion about handling them here: How to get the number of pages in a Word Document on linux?

    As for PDF files, I prefer to use FPDI, even though it requires a license to parse newer PDF file formats. You can do it simply like this:

    function PageCount_PDF($file) {
        $pageCount = 0;
        if (file_exists($file)) {
            require_once('fpdf/fpdf.php');
            require_once('fpdi/fpdi.php');
            $pdf = new FPDI();                              // initiate FPDI
            $pageCount = $pdf->setSourceFile($file);        // get the page count
        }
        return $pageCount;
    }
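
    To tie this back to the upload case in the question, one option is to choose a counter by file extension. The following is only a sketch: pageCountForUpload and the $counters map are hypothetical names, and the extension is taken from the client-supplied file name, which is not a reliable indicator of the real file type.

```php
<?php
// Hypothetical dispatcher (not part of the original answer): picks a counter
// callable by file extension and returns null for unsupported types.
function pageCountForUpload(string $path, string $originalName, array $counters): ?int
{
    $extension = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));

    if (!isset($counters[$extension])) {
        return null; // e.g. legacy .doc/.ppt, which the functions above do not cover
    }

    // Each counter is any callable that takes a file path and returns a count
    return (int) $counters[$extension]($path);
}
```

    With the functions above, the map could be ['docx' => 'PageCount_DOCX', 'pptx' => 'PageCount_PPTX', 'pdf' => 'PageCount_PDF'], and the call would pass $_FILES['document']['tmp_name'] and $_FILES['document']['name'], where the form field name document is an assumption.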