Tags: php, google-slides

Parsing PPTX in PHP, Need to find coordinates to attached images


I am trying to get the X,Y coordinates of the images embedded in an uploaded PPTX file, using PHP. We are on a shared server, so we are limited in which modules we can use.

I could not find any documentation covering this. I searched the community but was unable to find a solution.

Any help would be appreciated.

function pptx_extract_images() {
    // Getting post data
    $data = filter_input_array(INPUT_POST);
    $attachment_id = $data['attachment_id'];

    // Getting attachment file and  its name
    $input_file = get_attached_file($attachment_id);
    $input_file_name = pathinfo($input_file)["filename"];

    // Getting attachment type
    $attachment_type = pathinfo($input_file, PATHINFO_EXTENSION);

    // Open the .pptx as a zip archive (a .pptx file is a zip package)
    $package = new \ZipArchive();

    // Bail out if the file cannot be opened as a zip/pptx archive
    if ($package->open($input_file) !== true) {
        return;
    }

    // Read relations and search for images
    $relationsXml = $package->getFromName('_rels/.rels');

    if ($relationsXml === false) {
        // $logger is not defined in this scope, so log through PHP's error log instead
        error_log('pptx_extract_images: Invalid archive or corrupted .pptx file.');
        throw new RuntimeException('Invalid archive or corrupted .pptx file.');
    }

    $relations = simplexml_load_string($relationsXml);

    function absoluteZipPath($path) {
        $path = str_replace(array('/', '\\'), DIRECTORY_SEPARATOR, $path);
        $parts = array_filter(explode(DIRECTORY_SEPARATOR, $path), 'strlen');

        $absolutes = array();

        foreach ($parts as $part) {
            if ('.' == $part) continue;

            if ('..' == $part) {
                array_pop($absolutes);
            } else {
                $absolutes[] = $part;
            }
        }

        return implode('/', $absolutes);
    }

    // Document data holders
    $slides = 0;
    $data = array();

    foreach ($relations->Relationship as $rel) {
        if ($rel["Type"] == 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument') {

            $slideRelations = simplexml_load_string($package->getFromName(absoluteZipPath(dirname($rel["Target"]) . "/_rels/" . basename($rel["Target"]) . ".rels")));

            foreach ($slideRelations->Relationship as $slideRel) {
                if ($slideRel["Type"] == 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide') {

                    $slideNotesRelations = simplexml_load_string($package->getFromName(absoluteZipPath(dirname($rel["Target"]) . "/" . dirname($slideRel["Target"]) . "/_rels/" . basename($slideRel["Target"]) . ".rels")));

                    $slideNo = isset($slideRel["Target"]) ? str_replace('slide', '', pathinfo($slideRel["Target"])["filename"]) : null;

                    foreach ($slideNotesRelations->Relationship as $slideImageRel) {
                        if ($slideImageRel["Type"] == 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/image') {

                            $image = basename($slideImageRel["Target"]);
                            $image_mime = explode('.', $image);

                            $count_explode = count($image_mime);
                            $image_mime = strtolower($image_mime[$count_explode - 1]);

                            if ($image_mime == 'gif') {
                                $data['page_' . ($slideNo - 1)]['animated_image'] = $attachment_id  . '-' .  $input_file_name . '/image-' . $image;
                            }
                        }
                    }

                    $slides++;
                }
            }
        }
    }

    $package->close();
}

Thanks for looking into it, much appreciated.


Solution

  • This can be achieved using the PHPPresentation library. You mentioned that installing modules is difficult on your server, but that should not be an issue here: you can download the release archives and require them directly (no need for Composer). The PHPPresentation documentation is a bit lacking, but its bundled examples are good for figuring out how things work.

    Here is what the pptx looks like:

    [Image: a series of images in a PowerPoint file]

    The code:

    <?php
    
    /*
    
    Question Author: Ali Hussain
    Question Answerer: Jacob Mulquin
    Question: Parsing PPTX in PHP, Need to find coordinates to attached images
    URL: https://stackoverflow.com/questions/76454701/parsing-pptx-in-php-need-to-find-coordinates-to-attached-images
    Tags: php, google-slides
    
    */
    
    require_once 'PHPPresentation-1.0.0/src/PhpPresentation/Autoloader.php';
    \PhpOffice\PhpPresentation\Autoloader::register();
    require_once 'Common-1.0.1/src/Common/Autoloader.php';
    \PhpOffice\Common\Autoloader::register();
    
    $file = 'images.pptx';
    
    $pptReader = PhpOffice\PhpPresentation\IOFactory::createReader('PowerPoint2007');
    $oPHPPresentation = $pptReader->load($file);
    
    function getShapeDetails($shape, $slide_number)
    {
        $width = $shape->getWidth();
        $height = $shape->getHeight();
    
        if ($width === 0 || $height === 0) {
            return [];
        }
    
        $name = '';
        $description = '';
    
        if ($shape instanceof PhpOffice\PhpPresentation\Shape\Drawing\Gd) {
            $name = $shape->getName();
            $description = $shape->getDescription();
        }
    
        return [
            'slide_number' => $slide_number+1,
            'hashcode' => $shape->getHashCode(),
            'offsetX' => $shape->getOffsetX(),
            'offsetY' => $shape->getOffsetY(),
            'width' => $width,
            'height' => $height,
            'name' => $name,
            'description' => $description
        ];
    }
    
    $images = [];
    foreach ($oPHPPresentation->getAllSlides() as $slide_number => $oSlide) {
        foreach ($oSlide->getShapeCollection() as $oShape) {
            if ($oShape instanceof PhpOffice\PhpPresentation\Shape\Group) {
                foreach ($oShape->getShapeCollection() as $oShapeChild) {
                    $images[] = getShapeDetails($oShapeChild, $slide_number);
                }
            } else {
                $images[] = getShapeDetails($oShape, $slide_number);
            }
        }
    }
    
    // drop the empty results returned for zero-sized shapes (these probably aren't images)
    $images = array_values(array_filter($images));
    
    var_dump($images);
    

    Yields:

    array(4) {
      [0]=>
      array(8) {
        ["slide_number"]=>
        int(1)
        ["hashcode"]=>
        string(32) "e2b4ed359604645d2e483f037ef81b55"
        ["offsetX"]=>
        int(53)
        ["offsetY"]=>
        int(19)
        ["width"]=>
        int(468)
        ["height"]=>
        int(236)
        ["name"]=>
        string(9) "Picture 4"
        ["description"]=>
        string(89) "A picture containing smile, yellow, smiley, emoticon
    
    Description automatically generated"
      }
      [1]=>
      array(8) {
        ["slide_number"]=>
        int(1)
        ["hashcode"]=>
        string(32) "6ea656b2a8426a6d6f2393391056842f"
        ["offsetX"]=>
        int(925)
        ["offsetY"]=>
        int(148)
        ["width"]=>
        int(225)
        ["height"]=>
        int(225)
        ["name"]=>
        string(9) "Picture 6"
        ["description"]=>
        string(83) "A picture containing design, font, logo, white
    
    Description automatically generated"
      }
      [2]=>
      array(8) {
        ["slide_number"]=>
        int(2)
        ["hashcode"]=>
        string(32) "3ddc522c60417b61b8a5f316b0f29dc2"
        ["offsetX"]=>
        int(363)
        ["offsetY"]=>
        int(235)
        ["width"]=>
        int(402)
        ["height"]=>
        int(269)
        ["name"]=>
        string(21) "Content Placeholder 6"
        ["description"]=>
        string(102) "A group of men playing baseball in a field
    
    Description automatically generated with medium confidence"
      }
      [3]=>
      array(8) {
        ["slide_number"]=>
        int(3)
        ["hashcode"]=>
        string(32) "cbefccdfc4db5e355f96bdc8d294296e"
        ["offsetX"]=>
        int(365)
        ["offsetY"]=>
        int(245)
        ["width"]=>
        int(870)
        ["height"]=>
        int(457)
        ["name"]=>
        string(21) "Content Placeholder 4"
        ["description"]=>
        string(90) "A picture containing text, font, screenshot, graphics
    
    Description automatically generated"
      }
    }
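
If adding a library is not an option at all, the same numbers can probably be read straight from the slide XML, since the question's code already opens the package with ZipArchive. The sketch below is not part of the answer above and makes several assumptions: that the DOM extension is available, that each picture is a `p:pic` element carrying its own `a:xfrm` (pictures that inherit their geometry from a layout placeholder are skipped), and that pixels are wanted at 96 DPI, which gives 9525 EMU per pixel. The function name `extractPicturePositions` and the constant names are hypothetical. Pictures inside a group (`p:grpSp`) are reported in the group's child coordinate space, which this sketch does not translate.

```php
<?php

// XML namespaces used by PresentationML slide parts.
const NS_PML = 'http://schemas.openxmlformats.org/presentationml/2006/main';
const NS_DML = 'http://schemas.openxmlformats.org/drawingml/2006/main';
const NS_REL = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships';

// 914400 EMU per inch divided by an assumed 96 pixels per inch.
const EMU_PER_PIXEL = 9525;

/**
 * Hypothetical helper: returns position and size (in pixels) of every
 * <p:pic> element in one slide's XML that has an explicit <a:xfrm>.
 */
function extractPicturePositions(string $slideXml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($slideXml)) {
        return []; // not well-formed XML
    }

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('p', NS_PML);
    $xpath->registerNamespace('a', NS_DML);
    $xpath->registerNamespace('r', NS_REL);

    $toPixels = function (string $emu): int {
        return (int) round(((int) $emu) / EMU_PER_PIXEL);
    };

    $pictures = [];
    foreach ($xpath->query('//p:pic') as $pic) {
        $off = $xpath->query('p:spPr/a:xfrm/a:off', $pic)->item(0);
        $ext = $xpath->query('p:spPr/a:xfrm/a:ext', $pic)->item(0);
        if ($off === null || $ext === null) {
            continue; // geometry is inherited, nothing to read here
        }

        $pictures[] = [
            'name'    => $xpath->evaluate('string(p:nvPicPr/p:cNvPr/@name)', $pic),
            // Relationship id; it maps to the image file in the slide's .rels part.
            'relId'   => $xpath->evaluate('string(p:blipFill/a:blip/@r:embed)', $pic),
            'offsetX' => $toPixels($off->getAttribute('x')),
            'offsetY' => $toPixels($off->getAttribute('y')),
            'width'   => $toPixels($ext->getAttribute('cx')),
            'height'  => $toPixels($ext->getAttribute('cy')),
        ];
    }

    return $pictures;
}
```

The slide XML itself can be fetched with `$package->getFromName(...)`, using the slide path that the loop in the question already resolves from `$slideRel["Target"]`. The `relId` value can then be matched against the `Id` attribute of the image relationships that loop iterates over, which ties each set of coordinates to an image file.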