
Cannot get the HTML inside an XML node as-is


I'm writing a script that loads an XML file and displays some of the text in it. A sample XML structure could look like this:

<documento fecha_actualizacion="20221027071750">
<metadatos>
[...]
</metadatos>
<analisis>
[...]
</analisis>
<texto>
<dl>
<dt>1. Poder adjudicador: </dt>
<dd>
[...]
</dd>
</dl>
</texto>
</documento>

I'm trying to get the HTML inside the 'texto' element as a string ('<dl><dt>1. Poder ad[...]</dt></dd>[...]'), but when I fetch it, it is shown as:

Array ( [0] => SimpleXMLElement Object ( [dl] => SimpleXMLElement Object ( [dt] => Array ( [0] => 1. Poder adjudicador: [1] => 2. Tip

grouped by element (dl, dt, dd, etc.). I've tried every possible way of querying that 'texto' element ('//texto/text()', innerhtml, node(), nodeValue(), etc.), but it always returns the same result.

How could I get something like '<dl><dt>1. Poder ad[...]</dt></dd>[...]'?

Thank you!!

I have tried with selectors:

$texto = $xml->xpath('//texto/text()');
$texto = $xml->xpath('//texto/innerXml()');
$texto = $xml->xpath('//texto/node()');
$texto = $xml->xpath('//texto/nodevalue()');

Solution

  • You need to fetch the parent nodes (texto), then iterate over their children and save each child node as XML:

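    // getXMLString() is assumed to return the XML document as a string.
    $documento = new SimpleXMLElement(getXMLString());

    foreach ($documento->xpath('//texto') as $texto) {
      $result = '';
      // children() yields only the element children of <texto>.
      foreach ($texto->children() as $content) {
        $result .= $content->asXML();
      }

      var_dump($result);
    }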
    

    Output:

    string(59) "<dl>
    <dt>1. Poder adjudicador: </dt>
    <dd>
    [...]
    </dd>
    </dl>"
    

    SimpleXML is an abstraction focused on element nodes, and that has limits: children() returns only elements, so if the texto element has non-element child nodes (text, comments, CDATA sections), they will not be included in the result. In that case you need to use DOM.

    // getXMLString() is assumed to return the XML document as a string.
    $document = new DOMDocument();
    $document->loadXML(getXMLString());
    $xpath = new DOMXPath($document);

    foreach ($xpath->evaluate('//texto') as $texto) {
      $result = '';
      // childNodes includes text and other non-element nodes as well.
      foreach ($texto->childNodes as $content) {
        $result .= $document->saveXML($content);
      }

      var_dump($result);
    }
    

    Additionally, DOMXPath::evaluate() supports the full XPath 1.0 language, including expressions that return scalar values.
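
To make the difference between the two approaches concrete, here is a minimal sketch that runs both against the same input. The sample XML is hypothetical (it is not the asker's real document): it puts plain text next to the elements inside texto so that the SimpleXML limitation shows up. The last two lines illustrate evaluate() returning scalars; the count() and string() expressions are just examples, not something the original question asked for.

```php
<?php
// Hypothetical sample: <texto> mixes plain text with element children.
$xml = '<documento><texto>Intro <b>bold</b> tail'
     . '<dl><dt>1. Poder adjudicador: </dt></dl></texto></documento>';

// SimpleXML: children() yields element nodes only, so the text is dropped.
$texto = (new SimpleXMLElement($xml))->xpath('//texto')[0];
$simple = '';
foreach ($texto->children() as $child) {
    $simple .= $child->asXML();
}

// DOM: childNodes also contains the text nodes, so nothing is lost.
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$dom = '';
foreach ($xpath->evaluate('//texto')->item(0)->childNodes as $node) {
    $dom .= $document->saveXML($node);
}

// evaluate() can return scalar values directly (a float and a string here).
$count = $xpath->evaluate('count(//texto//dt)');
$first = $xpath->evaluate('string(//texto//dt[1])');

var_dump($simple, $dom, $count, $first);
```

With this input, $simple contains only the b and dl elements, while $dom also keeps the surrounding "Intro " and " tail" text. If texto is known to contain elements only, as in the sample from the question, both approaches should give the same string.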