How to get the names of all elements and their counts from an XML file?


I have a link to the XML file.

I load it this way:

$xml = simplexml_load_file('http://example.com');

Dumping $xml gives:

SimpleXMLElement {#1631 ▼
  +"@attributes": array:1 [▶]
  +"shop": SimpleXMLElement {#1605 ▼
    +"name": "Site name"
    +"company": "Thomas Munz"
    +"url": "https://example.com"
    +"currencies": SimpleXMLElement {#1606 ▶}
    +"categories": SimpleXMLElement {#1607 ▶}
    +"offers": SimpleXMLElement {#1608 ▼
      +"offer": array:24403 [▶]
    }
  }
}

These XML files will change, as will the values in them. I cannot know the names of all elements in advance. So I want to get an array of all the elements in this XML file with their count. That is:

$array = [
   'attributes' => '1',
   'shop' => '1',
   'name' => '1',
   ...
   'offer' => '24403',
]

I really have no idea how to do this :c


Solution

  • Answering the question directly as it is currently stated:

    So I want to get an array of all the elements in this XML file with their count.

    To count all elements regardless of their nesting level, fetch all elements using the XPath expression //*, then count them using a map, e.g.:

    $xml = simplexml_load_file('file.xml');
    
    $elements = [];
    foreach ($xml->xpath('//*') as $element) {
        $name = $element->getName();
        $elements[$name] = ($elements[$name] ?? 0) + 1;
    }
    
    print_r($elements);
    

    For the following XML:

    <?xml version="1.0"?>
    <shop>
        <name>Shop name</name>
        <company>Company name</company>
        <url>https://example.com/shop</url>
        <url>https://example.com/shop2</url>
        <url>https://example.com/shop3</url>
        <currencies>
            <currency>
                <name>XXX</name>
            </currency>
            <currency>
                <name>YYY</name>
            </currency>
            <currency>
                <name>ZZZ</name>
            </currency>
        </currencies>
    </shop>
    

    The PHP code above prints:

    Array
    (
        [shop] => 1
        [name] => 4
        [company] => 1
        [url] => 3
        [currencies] => 1
        [currency] => 3
    )
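
    For large feeds such as the one in the question (24403 offer elements), loading the whole document with SimpleXML may use a lot of memory. A minimal sketch of the same counting done with XMLReader, which streams the document node by node, could look like the following. The function name countElementsStreaming and the inline sample XML are assumptions made for illustration:

    ```php
    /**
     * Hypothetical helper: counts element names while streaming the document.
     *
     * @return array<string, int>
     */
    function countElementsStreaming(XMLReader $reader): array
    {
        $counts = [];
        while ($reader->read()) {
            // Opening tags (including self-closing ones) are reported as ELEMENT nodes.
            if ($reader->nodeType === XMLReader::ELEMENT) {
                $name = $reader->name;
                $counts[$name] = ($counts[$name] ?? 0) + 1;
            }
        }

        return $counts;
    }

    $reader = new XMLReader();
    // Inline sample data (an assumption); a real feed would be opened instead.
    $reader->XML('<shop><name>Shop name</name><url>a</url><url>b</url>'
        . '<currencies><currency><name>XXX</name></currency></currencies></shop>');
    $counts = countElementsStreaming($reader);
    $reader->close();

    print_r($counts);
    ```

    For a real file or URL, $reader->open('file.xml') would take the place of the $reader->XML(...) call.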
    

    However, if nesting matters (e.g. the same tag name may occur at different levels) or more sophisticated logic is needed, I'd recommend a recursive function similar to the following:

    /**
     * @param array<string, int> $elements
     */
    function countElements(SimpleXMLElement $root, array &$elements): void
    {
        $elements[$root->getName()] = ($elements[$root->getName()] ?? 0) + 1;
    
        foreach ($root as $element) {
            $name = $element->getName();
    
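            switch ($name) {
                // Containers: descend; the recursive call counts the container itself.
                case 'shop': // no break
                case 'currencies':
                    countElements($element, $elements);
                    break;
                // Counted without descending, so a <name> nested in <currency> is skipped.
                case 'currency': // no break
                case 'company': // no break
                case 'name': // no break
                case 'url':
                    $elements[$name] = ($elements[$name] ?? 0) + 1;
                    break;
            }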
        }
    }
    

    Another example of a recursive function, this one recording counters and paths in a nested array:

    /**
     * Builds an array of element counters and paths.
     *
     * @example
     * <pre>
     *  Array
     *  (
     *      [shop] => Array
     *          (
     *              [count] => 1
     *              [path] => /shop
     *              [name] => Array
     *                  (
     *                      [count] => 1
     *                      [path] => /shop/name
     *                  )
     *          )
     *  )
     * </pre>
     *
     * @param array<string, array{count: int, path: string}> $elements
     */
    function countElements(SimpleXMLElement $root, array &$elements, string $path = '/'): void
    {
        $children = $root->children();
        $name = $root->getName();
        $newPath = rtrim($path, '/') . "/$name";
    
        // Update the counter and path in place so that nested entries
        // collected for earlier siblings with the same name are preserved.
        $elements[$name]['count'] = ($elements[$name]['count'] ?? 0) + 1;
        $elements[$name]['path'] = $newPath;
    
        if (count($children) === 0) {
            return;
        }
    
        foreach ($children as $child) {
            countElements($child, $elements[$name], $newPath);
        }
    }
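
    If the nested structure itself is not needed, a flat map keyed by the full path may be enough to tell equal tag names at different levels apart. The following is only a sketch under that assumption; the function name countElementsByPath and the inline sample XML are made up for illustration:

    ```php
    /**
     * Hypothetical helper: counts elements per absolute path.
     *
     * @param array<string, int> $counts
     */
    function countElementsByPath(SimpleXMLElement $element, array &$counts, string $parentPath = ''): void
    {
        $path = $parentPath . '/' . $element->getName();
        $counts[$path] = ($counts[$path] ?? 0) + 1;

        // children() without arguments yields the child elements that are not in a namespace.
        foreach ($element->children() as $child) {
            countElementsByPath($child, $counts, $path);
        }
    }

    // Inline sample data (an assumption), shortened from the XML above.
    $xml = simplexml_load_string(
        '<shop><name>Shop name</name><currencies>'
        . '<currency><name>XXX</name></currency>'
        . '<currency><name>YYY</name></currency>'
        . '</currencies></shop>'
    );

    $counts = [];
    countElementsByPath($xml, $counts);

    print_r($counts);
    ```

    Here /shop/name and /shop/currencies/currency/name get separate counters, so the shop name is no longer mixed up with the currency names.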