Tags: php, replace, html-parsing, anchor, href

Replace the href value of qualifying <a> tags using each tag's visible text


I have a big string containing a lot of URLs. I need to replace the URLs that match:

<a href="../plugins/re_records/somefile.php?page=something&id=X">important_name</a>

(where X is any integer and important_name is any string) with:

<a href="/map/important_name">important_name</a>

I'm using preg_match_all() to match all URLs:

preg_match_all('/\/plugins\/re\_records\/somefile\.php\?page\=something\&id\=*(\d+)/', $bigString, $matches, PREG_OFFSET_CAPTURE);

The problem is that I don't understand how to get important_name from the hyperlink's visible text so that it becomes part of the new URL once the URL has matched.

Is it a good idea to use preg_match_all()?


Solution

  • Don't use regex. Use DOMDocument, which is specifically made to parse HTML/XML documents.

    Get all anchor elements, check the value of each href attribute, and change it accordingly using the setAttribute() method.

    Snippet:

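    <?php

    // Collect libxml warnings internally instead of emitting them when the HTML is not well formed.
    libxml_use_internal_errors(true);
    $o = new DOMDocument();
    $o->loadHTML('<a href="../plugins/re_records/somefile.php?page=something&id=45">important_name</a>');

    foreach ($o->getElementsByTagName('a') as $anchor_tag) {
        $href = $anchor_tag->getAttribute('href');
        // Only rewrite links that point at the old plugin URL.
        if (strpos($href, '/plugins/re_records/somefile.php?page=something&id=') !== false) {
            // nodeValue is the anchor's visible text.
            $anchor_tag->setAttribute('href', '/map/' . $anchor_tag->nodeValue);
        }
    }

    // saveHTML() serializes the whole document, including the doctype and the <html>/<body> wrappers that loadHTML() adds.
    echo $o->saveHTML();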
    

    Demo: https://3v4l.org/5GPXA
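
If the big string is an HTML fragment rather than a full page, the same DOMDocument approach can be wrapped in a function that returns only the fragment. The following is a minimal sketch, not a definitive implementation: the function name rewriteRecordLinks() and the wrapper <div> are hypothetical, the stricter "id must be digits" check and the rawurlencode() call are assumptions about what is wanted, and the input is assumed to be ASCII or entity-encoded, since loadHTML() does not treat input as UTF-8 by default.

```php
<?php
// Hypothetical helper; the name and the wrapper <div> are illustrative only.
function rewriteRecordLinks(string $html): string
{
    // Collect parse warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment in a container so its children can be serialized
    // later without the <html>/<body> elements that loadHTML() adds.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: the id must be one or more digits at the end of the href.
    $pattern = '~/plugins/re_records/somefile\.php\?page=something&id=\d+$~';

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if (preg_match($pattern, $anchor->getAttribute('href')) === 1) {
            // Assumption: percent-encode the visible text as a URL path segment.
            $anchor->setAttribute('href', '/map/' . rawurlencode($anchor->textContent));
        }
    }

    // The first <div> in document order is the wrapper; emit only its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo rewriteRecordLinks(
    '<a href="../plugins/re_records/somefile.php?page=something&id=45">important_name</a>'
);
```

The regex here is applied only to the already-extracted href value, not to the HTML itself, so the parsing is still left to DOMDocument. Drop rawurlencode() if the visible text is known to be URL-safe and should be used verbatim.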