php parsing xpath domxpath

How to extract mixed content with PHP DOMXPath?


I have the following HTML, which I am parsing:

<ul class="man">
   <li>
      height
       <span>3.3"</span>
    </li>
    <li>
       weight
       <span>45kg</span>
    </li>
    <li>
       date born
       <span>1/12/1979</span>
    </li>

 </ul>

I am using the code below to parse the HTML above:

foreach ($xpath->query("//ul[@class='man']/li") as $element) {
    echo $element->nodeValue;
}

But this code returns the entire content of each <li> as a single string, such as height 3.3" and weight 45kg. I need the two parts separately: height as the label and 3.3" as the value, and likewise weight as the label and 45kg as the value.

I can get the second part, the value, using "//ul[@class='man']/li/span", but I can't get the label into a separate variable.

Any ideas on how to solve this?

P.S. I cannot change the labels, as they come from the server as part of the HTML page.


Solution

  • You can iterate over each <li>'s child nodes. The first non-empty child is a DOMText node containing the label (for example, height), and the <span> is a DOMElement containing the value. Whitespace-only text nodes are skipped:

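    $data = array();
    
    foreach ($xpath->query("//ul[@class='man']/li") as $element) {
        // Reset for every <li> so a label or value from a previous item is never reused.
        $key = null;
        $value = null;
    
        foreach ($element->childNodes as $child) {
            $content = trim($child->nodeValue);
    
            if ($child instanceof DOMText && $content !== '') {
                // Non-empty text node: the label (e.g. height).
                $key = $content;
            } elseif ($child instanceof DOMElement && $child->tagName === 'span') {
                // The <span> element holds the value (e.g. 3.3").
                $value = $content;
            }
        }
    
        if ($key !== null && $value !== null) {
            $data[$key] = $value;
        }
    }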
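
For comparison, here is a minimal, self-contained sketch of a different approach: instead of walking the child nodes, it evaluates relative XPath expressions with each <li> as the context node. It is not part of the answer above. It assumes the markup from the question is loaded with DOMDocument::loadHTML(), and that each <li> has its label in the first text node and its value in a single <span>; the variable names are illustrative.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<ul class="man">
   <li>
      height
       <span>3.3"</span>
    </li>
    <li>
       weight
       <span>45kg</span>
    </li>
    <li>
       date born
       <span>1/12/1979</span>
    </li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$data = array();

foreach ($xpath->query("//ul[@class='man']/li") as $li) {
    // Both expressions are relative, so they are evaluated against the current <li>.
    // normalize-space(text()) takes the first text-node child and collapses its whitespace.
    $key = $xpath->evaluate('normalize-space(text())', $li);
    // normalize-space(span) takes the string value of the first <span> child.
    $value = $xpath->evaluate('normalize-space(span)', $li);

    // Skip items where either part is missing.
    if ($key !== '' && $value !== '') {
        $data[$key] = $value;
    }
}

print_r($data);
```

With the sample markup, $data ends up with the keys height, weight and date born, mapped to 3.3", 45kg and 1/12/1979. The trade-off is that this version only looks at the first text node of each <li>, so a label placed after the <span> would be missed, whereas the child-node loop would still find it.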