Tags: php, html, preg-match, domdocument

Get DOM element string using PHP


I have a set of HTML strings that can look like this:

<div id="myelementID" class="hello" data-foo="bar"> ... </div>

or

<div id="myelementID" class="world" data-this="that"> ... </div>

etc., you get the idea. Apart from id="myelementID", none of the other attributes are fixed.

What I need is to extract the exact string of the opening <div> tag, e.g. <div id="myelementID" class="hello" data-foo="bar">, if an element with the ID "myelementID" exists.

So far, I'm able to use DOMDocument to check whether the element exists:

        $dom = new DomDocument;
        $dom->validateOnParse = true;
        $internalErrors = libxml_use_internal_errors(true);
        $dom->loadHTML($html_string);
        libxml_use_internal_errors($internalErrors);
        $el = $dom->getElementById("myelementID");

From here, how can I get the element's HTML string? I'm open to using preg_match as well, which may be an even better solution.
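One DOM-only way to get that string, as a minimal sketch: DOMDocument::saveHTML() accepts a node argument and returns that node's outer HTML, so cutting it at the first > leaves only the opening tag. libxml keeps the source attribute order but writes values with double quotes. The sketch assumes no attribute value contains a literal >, and the helper name openingTagViaSaveHtml is hypothetical.

```php
<?php
// Hypothetical helper: returns the opening tag of the element with the given ID,
// or null if no such element exists.
// Assumption: no attribute value contains a literal '>' character.
function openingTagViaSaveHtml(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $internalErrors = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($internalErrors);

    $el = $dom->getElementById($id);
    if ($el === null) {
        return null; // no element with that ID
    }

    // Passing a node to saveHTML() serializes just that node (its outer HTML).
    $outer = $dom->saveHTML($el);
    // Keep everything up to and including the first '>' (the opening tag).
    return substr($outer, 0, strpos($outer, '>') + 1);
}

var_dump(openingTagViaSaveHtml('<div id="myelementID" class="hello" data-foo="bar"> ... </div>', 'myelementID'));
```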

Edit: To be clearer, I'm not looking for the content of the element. I'm looking for the opening-tag string <div id="myelementID" etc="etc" this="that">. The problem is that, apart from its ID being "myelementID", I can't know in advance which attributes the element has.
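Because preg_match was mentioned, here is a regex sketch that returns the opening tag exactly as written in the source, with the original attribute order and quotes. It assumes that the id value is quoted with " or ' and that no attribute value contains >. A regex is not a general HTML parser, so the DOM route is safer for arbitrary input. The pattern requires whitespace before id so that an attribute such as data-id does not match. The helper name openingTagViaRegex is hypothetical.

```php
<?php
// Hypothetical helper: matches the opening <div ...> tag whose id equals $id,
// whatever other attributes surround it and in whatever order.
// Assumptions: the id value is quoted, and no attribute value contains '>'.
function openingTagViaRegex(string $html, string $id): ?string
{
    // \s before "id" avoids matching attributes like data-id="...";
    // (["']) ... \1 requires the same quote character on both sides.
    $pattern = '/<div\b[^>]*\sid\s*=\s*(["\'])' . preg_quote($id, '/') . '\1[^>]*>/i';
    return preg_match($pattern, $html, $m) === 1 ? $m[0] : null;
}

var_dump(openingTagViaRegex('<div id="myelementID" class="world" data-this="that"> ... </div>', 'myelementID'));
```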


Solution

  • Use the DOMNode::C14N method to canonicalize the node to a string, then substr and strpos to cut out the needed fragment (everything up to and including the first >):

    // ... (DOMDocument setup from the question)
    $el = $dom->getElementById("myelementID");
    $elString = $el->C14N();
    
    var_dump(substr($elString, 0, strpos($elString, '>') + 1));
    

    The output (for your example):

    string(51) "<div class="hello" data-foo="bar" id="myelementID">"
    

    http://php.net/manual/ru/domnode.c14n.php
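For reference, here is a self-contained version of the C14N approach wrapped in a hypothetical helper, openingTagViaC14N. Canonicalization sorts attributes by name and always uses double quotes, which is why id moves to the end in the output above. The result is therefore an equivalent tag, not a byte-for-byte copy of the source. Like the answer, it assumes the first > ends the tag. That fails if an attribute value contains >, because C14N does not escape > inside attribute values.

```php
<?php
// Hypothetical helper wrapping the answer's C14N approach end to end.
function openingTagViaC14N(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $internalErrors = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($internalErrors);

    $el = $dom->getElementById($id);
    if ($el === null) {
        return null; // no element with that ID
    }

    // Canonical XML serialization of the element (attributes sorted by name).
    $elString = $el->C14N();
    // Keep everything up to and including the first '>' (the opening tag).
    return substr($elString, 0, strpos($elString, '>') + 1);
}

var_dump(openingTagViaC14N('<div id="myelementID" class="hello" data-foo="bar"> ... </div>', 'myelementID'));
```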