PHP str_replace() and preg_replace() not working with HTML


When I call str_replace() or preg_replace() inside my function, the content does not change.

Content in variable $sadrzaj:

$sadrzaj = '<p>asdasdasds</p><p><a href="http://www.example.com/wp-content/uploads/2018/11/image.jpg" itemprop="url" title="some title"><img alt="some alt title" class="alignnone size-full wp-image-243618" src="http://www.example.com/wp-content/uploads/2018/11/image.jpg" width="940" height="529"></a></p>asdasdasd<p>asdasd</p><h3>asdada</h3><p><a href="http://www.example.com/wp-content/uploads/2018/11/image_02.jpg" itemprop="url" title="some title 02"><img alt="some alt title 02" class="alignnone size-full wp-image-243653" src="http://www.example.com/wp-content/uploads/2018/11/image_02.jpg" width="940" height="529"></a></p><h3>asdasd</h3>';

My function to_je_to():

function to_je_to($content){
    preg_match_all('/<img (.*?)\/>/', $content, $images);
    //print_r($images);

    if(!is_null($images)){
        foreach($images[1] as $index => $value){
            if(strpos($images[1], 'size-full') !== false){
            //if(preg_match('/alt=""/', $value)){
                $new_img = preg_replace('<img', "<img data-example", $images[0][$index]);
                $content = preg_replace($images[0][$index], $new_img, $content);
            }
        }
    }
    echo $content; // return no difference
}

Calling to_je_to($sadrzaj); changes nothing.

For every image whose class contains "size-full", I want to replace the opening tag with <img data-example ...>.

Neither str_replace() nor preg_replace() works.

What am I doing wrong?

Thanks


Solution

  • The main thing you're doing wrong is parsing HTML with a regular expression. You should use a proper DOM parser and then you can use XPath queries to isolate your desired elements.

    <?php
    $sadrzaj = '<p>asdasdasds</p><p><a href="http://www.example.com/wp-content/uploads/2018/11/image.jpg" itemprop="url" title="some title"><img alt="some alt title" class="alignnone size-full wp-image-243618" src="http://www.example.com/wp-content/uploads/2018/11/image.jpg" width="940" height="529"></a></p>asdasdasd<p>asdasd</p><h3>asdada</h3><p><a href="http://www.example.com/wp-content/uploads/2018/11/image_02.jpg" itemprop="url" title="some title 02"><img alt="some alt title 02" class="alignnone size-full wp-image-243653" src="http://www.example.com/wp-content/uploads/2018/11/image_02.jpg" width="940" height="529"></a></p><h3>asdasd</h3>';
    
    function to_je_to($content) {
        $dom = new DOMDocument;
        // parse the fragment without adding <html>/<body> wrappers or a doctype
        $dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        $xp = new DOMXPath($dom);
        // this is complicated, but it matches "size-full" only as a whole class name;
        // a plain [contains(@class, 'size-full')] would also match e.g. "size-fuller"
        $nodes = $xp->query("//img[contains(concat(' ', normalize-space(@class), ' '), ' size-full ')]");
        foreach ($nodes as $img) {
            $img->setAttribute("data-example", "");
        }
        return $dom->saveHTML();
    }
    echo to_je_to($sadrzaj);
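
One caveat with the DOM approach above: when a fragment has several top-level elements, as $sadrzaj does, LIBXML_HTML_NOIMPLIED can cause libxml to nest the later siblings inside the first element on some versions. A minimal sketch of one common workaround follows. It assumes that wrapping the fragment in a temporary <div> and serializing only that wrapper's children is acceptable; the function name to_je_to_fragment() is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical variant of to_je_to(): wraps the fragment in a temporary <div>
// so the parser sees a single root, then serializes only the wrapper's children.
function to_je_to_fragment(string $content): string {
    $dom = new DOMDocument();
    $dom->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xp = new DOMXPath($dom);
    // Same whole-class-name test as in the answer above.
    $nodes = $xp->query("//img[contains(concat(' ', normalize-space(@class), ' '), ' size-full ')]");
    foreach ($nodes as $img) {
        $img->setAttribute('data-example', '');
    }

    // Emit each child of the temporary wrapper, leaving the wrapper itself out.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling echo to_je_to_fragment($sadrzaj); should print the original fragment with data-example="" added to each size-full image. Note that loadHTML() does not assume UTF-8 input by default, so non-ASCII content may need extra handling that this sketch leaves out.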
    

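    And a few comments on your original code:

    • the pattern /<img (.*?)\/>/ requires the tag to end in "/>", but the tags in $sadrzaj end in a plain ">", so preg_match_all() finds nothing to begin with
    • $images will never be null; it will always be an array, so the is_null() check does nothing
    • it's unclear why you loop over $images[1] and then replace values based on $images[0]
    • the captured group (.*?) is never really used: $value is ignored, and strpos() is given the whole $images[1] array instead of a string
    • neither of the preg_replace() calls in the loop puts delimiters around the pattern, so neither would have worked as intended (for a literal replacement, str_replace() is enough)
    • the function echoes the result instead of returning it, so the caller can never use the modified content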
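
For illustration only, here is a rough sketch of what the regex version might look like with the points above addressed: delimiters, a pattern that accepts both ">" and "/>", and a return value. The DOM approach is still the recommended one; this sketch assumes simple markup with double-quoted attributes and no ">" inside attribute values. The name to_je_to_regex() is hypothetical.

```php
<?php
// Hypothetical regex-based variant, shown only to illustrate the fixes listed above.
function to_je_to_regex(string $content): string {
    $result = preg_replace_callback(
        '~<img\b[^>]*>~i',            // "~" delimiters; matches <img ...> and <img ... />
        function (array $m): string {
            // $m[0] is the whole tag; skip it unless "size-full" is a complete class name.
            if (!preg_match('~\sclass="[^"]*(?<![\w-])size-full(?![\w-])[^"]*"~', $m[0])) {
                return $m[0];
            }
            // Insert the attribute right after the tag name.
            return preg_replace('~^<img\b~i', '<img data-example', $m[0], 1);
        },
        $content
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $content;
}
```

Because the function returns its result, the caller decides what to do with it, for example echo to_je_to_regex($sadrzaj);.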