
Read XML using XMLReader in PHP without knowing the nodes


I have to read and parse an XML file using XMLReader in PHP without knowing the node names in advance.

I have this file:

<Invoices>
  <Company>
    <Name>Tuttobimbi Srl</Name>
  </Company>
  <Documents>
    <Document>
      <CustomerCode>0055</CustomerCode>
      <CustomerWebLogin></CustomerWebLogin>
      <CustomerName>Il Puffetto</CustomerName>
    </Document>
  </Documents>
</Invoices>

I would like to parse it into a list of element paths like this:

Invoices
Invoices, Company
Invoices, Company, Name
Invoices, Documents
Invoices, Documents, Document
etc...

I wrote this code:

    while ($xml->read()) {
        if ($xml->nodeType == XMLReader::ELEMENT)
            array_push($a, $xml->name);

        if ($xml->nodeType == XMLReader::END_ELEMENT)
            array_pop($a);

        if ($xml->nodeType == XMLReader::TEXT) {
            if (!in_array(implode(",", $a), $result)) {
                $result[] = implode(",", $a);
            }
        }
    }

It seems to work, but it doesn't print the nodes that have subnodes, such as:

Invoices
Invoices, Company
Invoices, Documents
Invoices, Documents, Document

Solution

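  • Many of the nodes you'd think would be XMLReader::TEXT nodes are actually XMLReader::SIGNIFICANT_WHITESPACE: the indentation and line breaks between tags are reported that way. An element that contains only child elements therefore never triggers your XMLReader::TEXT check, so its path is never recorded.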

    Fortunately you can drop that $xml->nodeType == XMLReader::TEXT check altogether and build your result as you encounter elements.

    Example:

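    $a = [];      // stack of element names from the root to the current node
    $result = []; // one path per element, in document order

    while ($xml->read()) {
        if ($xml->nodeType == XMLReader::ELEMENT) {
            // Record the path as soon as the element opens.
            array_push($a, $xml->name);
            $result[] = implode(",", $a);
        }

        if ($xml->nodeType == XMLReader::END_ELEMENT) {
            array_pop($a);
        }
    }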
    

    This'll give you:

    Array
    (
        [0] => Invoices
        [1] => Invoices,Company
        [2] => Invoices,Company,Name
        [3] => Invoices,Documents
        [4] => Invoices,Documents,Document
        [5] => Invoices,Documents,Document,CustomerCode
        [6] => Invoices,Documents,Document,CustomerWebLogin
        [7] => Invoices,Documents,Document,CustomerName
    )
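
    If you want something you can run end to end, here is a minimal self-contained sketch of the same stack approach. The function name collectElementPaths is hypothetical, and two things in it go beyond the loop above and are my own assumptions about what you might need: it pops the stack straight away for self-closing elements such as <Foo/> (XMLReader reports no END_ELEMENT for those, so the stack would otherwise keep growing), and it records each path only once, like the in_array check in your original code.

```php
<?php
// Hypothetical helper (not part of XMLReader): returns the unique element
// paths of an XML string in document order.
function collectElementPaths(string $source): array
{
    $xml = new XMLReader();
    $xml->XML($source); // parse from a string; use $xml->open($file) for a file

    $stack = []; // element names from the root down to the current node
    $paths = []; // path string => true, so repeated paths are stored once

    while ($xml->read()) {
        if ($xml->nodeType === XMLReader::ELEMENT) {
            $stack[] = $xml->name;
            $paths[implode(',', $stack)] = true;

            // Assumption beyond the original loop: a self-closing element
            // (<Foo/>) never produces END_ELEMENT, so pop it right here.
            if ($xml->isEmptyElement) {
                array_pop($stack);
            }
        } elseif ($xml->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }

    $xml->close();

    return array_keys($paths);
}

// Usage with the document from the question.
$source = <<<XML
<Invoices>
  <Company>
    <Name>Tuttobimbi Srl</Name>
  </Company>
  <Documents>
    <Document>
      <CustomerCode>0055</CustomerCode>
      <CustomerWebLogin></CustomerWebLogin>
      <CustomerName>Il Puffetto</CustomerName>
    </Document>
  </Documents>
</Invoices>
XML;

print_r(collectElementPaths($source));
```

    For the sample file this should list the same eight paths as above. With several <Document> entries, the loop above repeats the Document paths once per entry, while this sketch keeps each path once.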