Move img before parent paragraph using simple-html-dom


I'm really stuck on this one :/ So far I've tried SimpleHTMLDom (as mentioned in the title) and DOMDocument. The $html will come from CKEditor on my ProcessWire-driven page; I wrote a textformatter to post-process the output automatically.

So this is the test data

<?php
$html = <<<_DATA
    <p><img src="http://placehold.it/100x100"><img src="http://placehold.it/130x100">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam</p>
_DATA;

So here's my SimpleHTMLDom try

<?php
$dom = str_get_html($html);
$imgs = $dom->find('img');

foreach ($imgs as $img) {
    $i = $img->outertext;
    $img->outertext = '';
    $img->parent()->outertext = $i . $img->parent()->outertext;
}
echo $dom->save();
$dom->clear();

With only one img in the $html above, everything works as expected, but with two (or more) I get duplicates.

  1. The first issue: it changes the sort order, so the 130x100 image comes first. I know that's because I'm prepending, but I don't know how to change it. I tried collecting all images in a variable so they stay in order, but then I don't know how to prepend that to the paragraph.

  2. The second, and actually more important, issue is the duplicates. The strange thing is that it prepends all images properly but only deletes the first img within the paragraph. That holds for any additional image, so with 3 it would keep the last two (as I said, 1 works fine).

What am I doing wrong?

This would probably be better as a separate question, but I wanted to show that I also tried DOMDocument and couldn't get insertBefore to work at all. I tried different variations (commented out in the code below).

<?php
include_once "./classes/SmartDOMDocument.class.php";
$dom = new SmartDOMDocument();
$dom->loadHTML($html);

$imgs = $dom->getElementsByTagName('img');

foreach ($imgs as $img) {
    $i = $dom->createElement('img');
    $i->src = $img->getAttribute('src');
    $img->parentNode->insertBefore($i, $img->parentNode);
    // $img->insertBefore($i, $img->parentNode);
    // $dom->insertBefore($i, $img->parentNode);
    $img->parentNode->removeChild($img);
}

echo $dom->saveHTMLExact();
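
For comparison, here is a minimal sketch of the same idea with the built-in DOMDocument, not the SmartDOMDocument wrapper used above (that swap is an assumption made to keep the example self-contained). Three things differ from the attempt above. insertBefore() is called on the node that will contain the image, and the reference node has to be a child of that node, so it is called on the paragraph's parent with the paragraph as reference. The existing img node is moved instead of recreated, since assigning `$i->src` does not set an attribute (setAttribute() would be needed). And the live list returned by getElementsByTagName() is copied to an array before the tree is changed. The shortened sample text and the `$result` variable are made up for illustration, and the sketch only handles images that are direct children of a paragraph.

```php
<?php
// Sample input modelled on the question (ASCII only, so no encoding handling is needed here).
$html = '<p><img src="http://placehold.it/100x100"><img src="http://placehold.it/130x100">Lorem ipsum</p>';

$dom = new DOMDocument();
$dom->loadHTML($html); // libxml wraps the fragment in <html><body>

// getElementsByTagName() returns a live list; copy it before modifying the tree.
$imgs = iterator_to_array($dom->getElementsByTagName('img'));

foreach ($imgs as $img) {
    $p = $img->parentNode;
    if ($p->nodeName !== 'p') {
        continue; // this sketch only handles images directly inside a paragraph
    }
    // Called on the paragraph's parent, with the paragraph as the reference node.
    // Inserting an existing node moves it, so no removeChild() is needed.
    // Each image lands directly before <p>, which keeps the original order.
    $p->parentNode->insertBefore($img, $p);
}

// Serialize only the children of <body> to get the fragment back.
$body = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result;
```

Because every image is inserted directly before the paragraph, the images end up in document order in front of it, and nothing is left behind inside the paragraph.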

If something isn't documented or asked well enough, please feel free to comment and I'll try to explain better :)

Edit: The HTML (coming from the WYSIWYG editor, as mentioned above) will sometimes have images in the middle or at the end of a paragraph, may contain a single image or multiple images (an undefined number), and will contain more than one paragraph.

EDIT: I should've included how I want the output to look.

So this is the input

<p>
    <img src="http://placehold.it/100x100">
    <img src="http://placehold.it/130x100">
    <img src="http://placehold.it/160x100">
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
</p>

And this should be the result

<div class="inlineGallery">
    <figure><img src="http://placehold.it/100x100"></figure>
    <figure><img src="http://placehold.it/130x100"></figure>
    <figure><img src="http://placehold.it/160x100"></figure>
</div>
<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
</p>

Sorry for not mentioning that those images should be wrapped in figures and then in a container. A single image wouldn't need an extra container, but that's not actually important. I tested with the full code (wrapping images in figure, adding figcaption where applicable, and wrapping multiple figures in a div), and everything worked on an article with only single images. Then I came across some HTML similar to the test data above in another article, which resulted in the duplication mentioned. So I stripped down the code to see where the problem comes from, with no luck. That's why I only added this simplified code to the question: I thought once this works, the other one will work, too ;-)

Hope it's clearer now?!


Solution

  • So here is the basic code that does the job as asked

    // turn double linebreaks into paragraphs <br><br> to </p><p>
    $value = preg_replace('#(?:<br\s*/?>\s*?){2,}#', '</p><p>', $value);
    
    $dom = str_get_html($value);
    
    /* first getting all <p> */
    $paragraphs = $dom->find('p');
    
    foreach ($paragraphs as $p) {
        $imgs = $p->find('img');
    
        /* init gallery container */
        $gallery = "<div class='gallery'>";
        foreach  ($imgs as $img) {
            /* get the current image */
            $i = $img->outertext;
            /* wrap in link */
            $i = "<a href='Link'>$i</a>";
            /* append to gallery */
            $gallery .= $i;
            /* remove original image from paragraph */
            $img->outertext = '';
        }
        /* close new gallery */
        $gallery .= "</div>";
        /* remove unnecessary <br> */
        $newParagraph = trim(preg_replace( '#^\s*(?:<br\s?\/?>)*\s*|(?:<br\s?\/?>)*\s*$#', '', trim($p->innertext)));
        /* wrap tidied text into <p> */
        $newParagraph = "<p>$newParagraph</p>";
        /* replace old paragraph by gallery (only if it held images) and new paragraph */
        $p->outertext = ($imgs ? $gallery : '') . $newParagraph;
    }
    // save dom to $value
    $value = $dom->save();
    // clear dom
    $dom->clear();
    

    Anyone interested in the full plan I'm using this for should have a look at the ProcessWire forums: https://processwire.com/talk/topic/13471-better-ckeditor-image-insertion-at-least-for-me/
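
As an alternative, the figure/container structure shown in the question could probably also be built with the built-in DOMDocument. The following is only a sketch under a few assumptions: it uses plain DOMDocument instead of simple-html-dom, the sample paragraphs and the `$result` variable are made up for illustration, and it wraps every paragraph's images in a `div.inlineGallery` (even a single image) and skips the `<br>` tidying from the code above.

```php
<?php
// Hypothetical sample: images at the start, in the middle, and a paragraph without images.
$html = '<p><img src="http://placehold.it/100x100"><img src="http://placehold.it/130x100">Lorem ipsum</p>'
      . '<p>Dolor <img src="http://placehold.it/160x100"> sit amet</p>'
      . '<p>No images here</p>';

$dom = new DOMDocument();
$dom->loadHTML($html); // libxml wraps the fragment in <html><body>

// Copy the live node list before modifying the tree.
$paragraphs = iterator_to_array($dom->getElementsByTagName('p'));

foreach ($paragraphs as $p) {
    $imgs = iterator_to_array($p->getElementsByTagName('img'));
    if (count($imgs) === 0) {
        continue; // nothing to move, leave the paragraph untouched
    }

    // One container per paragraph that holds images.
    $gallery = $dom->createElement('div');
    $gallery->setAttribute('class', 'inlineGallery');

    foreach ($imgs as $img) {
        $figure = $dom->createElement('figure');
        // appendChild() moves the node, so the image leaves the paragraph here.
        $figure->appendChild($img);
        $gallery->appendChild($figure);
    }

    // Put the finished gallery directly before its paragraph.
    $p->parentNode->insertBefore($gallery, $p);
}

// Serialize only the children of <body> to get the fragment back.
$body = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result;
```

Building the gallery as nodes rather than strings means each image exists exactly once in the tree, so the ordering and duplication problems from the question cannot occur. Note that loadHTML() does not assume UTF-8 by default, so real editor output with non-ASCII characters would need an encoding hint.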