
Replace all title attributes in an html document


I have HTML code in a variable. For example, $html equals:

<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>

I need to replace the value of every title attribute (title="Cool stuff", title="another stuff", and so on) with title="$newTitle".

Is there any non-regex way to do this?

And if I have to use regex, is there a better (performance-wise) and/or more elegant solution than what I came up with?

$html = '...';
$newTitle = 'My new title';

$matches = [];
preg_match_all(
    '/title=(\"|\')([^\"\']{1,})(\"|\')/',
    $html,
    $matches
);
$attributeTitleValues = $matches[2];

foreach ($attributeTitleValues as $title)
{
    $html = str_replace("title='{$title}'", "title='{$newTitle}'", $html);
    $html = str_replace("title=\"{$title}\"", "title=\"{$newTitle}\"", $html);
}

Solution

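  • Definitely don't use regex; it is a dirty rabbit hole.
    ...the hole is dirty, not the rabbit :)

    The pattern in the question, for instance, also matches data-title="..." (it contains title=), and from title="Don't panic" it captures only Don, because [^\"\']{1,} cannot cross the apostrophe.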
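Running the question's pattern and str_replace loop on a small input makes the problem concrete. The input below is made up for illustration and is not from the question; the pattern and loop are copied from it unchanged.

```php
<?php
// Made-up input (not from the question) with two awkward cases:
// a data-title attribute and a title value containing an apostrophe.
$html = '<p data-title="keep me">x</p><a title="Don\'t panic">y</a>';
$newTitle = 'My new title';

// Pattern and replacement loop copied from the question.
$matches = [];
preg_match_all('/title=(\"|\')([^\"\']{1,})(\"|\')/', $html, $matches);
$attributeTitleValues = $matches[2]; // captures 'keep me' and 'Don'

foreach ($attributeTitleValues as $title) {
    $html = str_replace("title='{$title}'", "title='{$newTitle}'", $html);
    $html = str_replace("title=\"{$title}\"", "title=\"{$newTitle}\"", $html);
}

// The data-title value gets overwritten, while the real title is left alone.
echo $html;
```

This prints <p data-title="My new title">x</p><a title="Don't panic">y</a>: the wrong attribute changed and the right one did not.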

    I prefer to use DOMDocument and DOMXPath to directly target every title attribute of every element in your HTML document.

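    • The LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags prevent your output from being garnished with a <!DOCTYPE> declaration and implied <html>/<body> wrapper tags.
    • In the XPath expression, // means "search at any depth", and @title selects the title attribute nodes themselves rather than the elements that carry them.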
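One way to see what those two flags suppress is to load the same fragment with and without them and compare what saveHTML() returns. This is only a sketch; the exact doctype string in the first result depends on the libxml2 version PHP is built against.

```php
<?php
// Load the same fragment twice, once without and once with the two flags,
// to compare what saveHTML() produces.
$fragment = '<div title="Cool stuff">text</div>';

$withoutFlagsDom = new DOMDocument();
$withoutFlagsDom->loadHTML($fragment);
$withoutFlags = $withoutFlagsDom->saveHTML();

$withFlagsDom = new DOMDocument();
$withFlagsDom->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$withFlags = $withFlagsDom->saveHTML();

// Without flags: a default doctype plus implied <html><body> wrappers.
echo $withoutFlags;
// With flags: only the original fragment.
echo $withFlags;
```

With a typical libxml2 build, the first echo prints a DOCTYPE line followed by <html><body>...</body></html>, while the second prints just the <div>.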

    Code:

    $html = <<<HTML
    <div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>
    HTML;
    $newTitle = 'My new title';
    
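    $dom = new DOMDocument();
    // Parse the fragment without adding implied <html>/<body> wrappers or a default doctype
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);
    // Select every title attribute node, at any depth, and overwrite its value
    foreach ($xpath->query('//@title') as $attr) {
        $attr->value = $newTitle;
    }
    echo $dom->saveHTML();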
    

    Output:

    <div title="My new title" alt="Cool stuff"><a title="My new title">......</a></div>
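If you need this in more than one place, the same DOMDocument/DOMXPath approach can be wrapped in a small function. The name replaceAllTitles is made up for this sketch, and the libxml_use_internal_errors() calls are an addition (not part of the answer) that keeps libxml from printing warnings for markup it does not recognize, such as HTML5 tags. Running it on a fragment that also carries a data-title attribute shows that //@title only touches attributes named exactly title.

```php
<?php
// Hypothetical helper name; wraps the DOMDocument/DOMXPath approach above.
function replaceAllTitles(string $html, string $newTitle): string
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // //@title matches only attributes named exactly "title",
    // so data-title or subtitle attributes are left untouched.
    foreach ($xpath->query('//@title') as $attr) {
        $attr->value = $newTitle;
    }
    return $dom->saveHTML();
}

$out = replaceAllTitles(
    '<div data-title="keep me"><a title="Don\'t panic">y</a></div>',
    'My new title'
);
echo $out;
```

Unlike the regex loop, the apostrophe in the old value is irrelevant here, because the parser already knows where each attribute value starts and ends.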