php, xml, parsing, simplexml, xml-namespaces

PHP Parse XML response with many namespaces


Is there a way to parse through an XML response in PHP, taking into account all namespaced nodes and convert it to an object or array without knowing all the node names?

For example, converting this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service"
    xmlns:com="http://www.webex.com/schemas/2002/06/common"
    xmlns:att="http://www.webex.com/schemas/2002/06/service/attendee">
    <serv:header>
        <serv:response>
            <serv:result>SUCCESS</serv:result>
            <serv:gsbStatus>PRIMARY</serv:gsbStatus>
        </serv:response>
    </serv:header>
    <serv:body>
        <serv:bodyContent xsi:type="att:lstMeetingAttendeeResponse"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <att:attendee>
                <att:person>
                    <com:name>James Kirk</com:name>
                    <com:firstName>James</com:firstName>
                    <com:lastName>Kirk</com:lastName>
                    <com:address>
                        <com:addressType>PERSONAL</com:addressType>
                    </com:address>
                    <com:phones />
                    <com:email>[email protected]</com:email>
                    <com:type>VISITOR</com:type>
                </att:person>
                <att:contactID>28410622</att:contactID>
                <att:joinStatus>INVITE</att:joinStatus>
                <att:meetingKey>803754412</att:meetingKey>
            </att:attendee>
        </serv:bodyContent>
    </serv:body>
</serv:message>

to something like:

['message' => [
    'header' => [
        'response' => [
            'result' => 'SUCCESS',
            'gsbStatus' => 'PRIMARY'
        ]
    ],
    'body' => [
        'bodyContent' => [
            'attendee' => [
                'person' => [
                    'name' => 'James Kirk',
                    'firstName' => 'James',
                    ...
                ],
                'contactID' => 28410622,
                ...
            ]
        ]
    ]
]]

I know it's easy with non-namespaced nodes, but I don't know where to begin on something like this.


Solution

  • (Read @ThW's answer about why an array is actually not that important to aim for)

    I know it's easy with non-namespaced nodes, but I don't know where to begin on something like this.

    It's just as easy with namespaced nodes, because technically they are the same kind of node. As a quick example, the following script loops over all elements in the document regardless of namespace:

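    // $xml is assumed to be a SimpleXMLElement, e.g. loaded from the
    // response with simplexml_load_string().
    $result = $xml->xpath('//*');
    foreach ($result as $element) {
        // The number of ancestor elements gives the nesting depth.
        $depth = count($element->xpath('./ancestor::*'));
        $indent = str_repeat('  ', $depth);
        printf("%s %s\n", $indent, $element->getName());
    }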
    

    The output in your case is:

     message
       header
         response
           result
           gsbStatus
       body
         bodyContent
           attendee
             person
               name
               firstName
               lastName
               address
                 addressType
               phones
               email
               type
             contactID
             joinStatus
             meetingKey
    

    As you can see, you can iterate over all elements as if they had no namespace at all.
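
    If an array really is what you want, the same idea can be sketched recursively. This is only a minimal sketch and not part of the original answer: `elementToArray` is a hypothetical helper name, the sample XML uses made-up `urn:example:*` namespaces, and attributes and mixed content are ignored. It relies on `xpath('./*')` selecting child elements regardless of their namespace.

    ```php
    <?php
    // Hypothetical helper: turn an element into a nested array keyed by
    // local element names, dropping the namespaces.
    function elementToArray(SimpleXMLElement $element)
    {
        // './*' selects all child elements regardless of namespace.
        $children = $element->xpath('./*');
        if (!$children) {
            // Leaf element: keep the text content as a string.
            return trim((string) $element);
        }

        $result = [];
        foreach ($children as $child) {
            $name  = $child->getName();
            $value = elementToArray($child);

            if (!array_key_exists($name, $result)) {
                $result[$name] = $value;
                continue;
            }
            // Repeated sibling names are collected into a list.
            if (!is_array($result[$name]) || !array_key_exists(0, $result[$name])) {
                $result[$name] = [$result[$name]];
            }
            $result[$name][] = $value;
        }
        return $result;
    }

    // Small stand-in document with made-up namespace URIs.
    $xml = simplexml_load_string(
        '<s:message xmlns:s="urn:example:service" xmlns:c="urn:example:common">'
        . '<s:body><c:name>James Kirk</c:name><c:phones/></s:body>'
        . '</s:message>'
    );

    $array = [$xml->getName() => elementToArray($xml)];
    print_r($array);
    ```

    Note that every leaf comes back as a string (an empty element becomes `''`), and that two elements with the same local name in different namespaces become indistinguishable in the result.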

    But as has been outlined, when you ignore the namespace you also lose important information. For example, in the document you have, you're actually interested in the attendee and common elements; the service elements deal with the transport:

    // Register a prefix for each namespace we want to query.
    $uriAtt = 'http://www.webex.com/schemas/2002/06/service/attendee';
    $xml->registerXPathNamespace('att', $uriAtt);
    
    $uriCom = 'http://www.webex.com/schemas/2002/06/common';
    $xml->registerXPathNamespace('com', $uriCom);
    
    $result = $xml->xpath('//att:*|//com:*');
    foreach ($result as $element) {
        // Count only ancestors in the two namespaces of interest.
        $depth  = count($element->xpath("./ancestor::*[namespace-uri(.) = '$uriAtt' or namespace-uri(.) = '$uriCom']"));
        $indent = str_repeat('  ', $depth);
        printf("%s %s\n", $indent, $element->getName());
    }
    

    The output this time:

     attendee
       person
         name
         firstName
         lastName
         address
           addressType
         phones
         email
         type
       contactID
       joinStatus
       meetingKey
    

    So why drop all the namespaces? They help you to obtain the elements you're interested in. You can also do it dynamically
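
    A minimal sketch of one dynamic approach, assuming "dynamically" means discovering the namespaces at runtime instead of hard-coding the URIs. The sample XML, the `urn:example:*` URIs and the `$skip` list are made up for illustration; `getDocNamespaces(true)` returns every namespace declared in the document as prefix => URI.

    ```php
    <?php
    // Small stand-in document with made-up namespace URIs.
    $xml = simplexml_load_string(
        '<s:message xmlns:s="urn:example:service" xmlns:a="urn:example:attendee">'
        . '<s:body><a:attendee><a:contactID>1</a:contactID></a:attendee></s:body>'
        . '</s:message>'
    );

    // Every namespace declared anywhere in the document (prefix => URI).
    $namespaces = $xml->getDocNamespaces(true);

    // Assumption: the transport namespace is the one to leave out.
    $skip = ['urn:example:service'];

    $paths = [];
    foreach ($namespaces as $prefix => $uri) {
        // A default namespace has an empty prefix and would need a
        // prefix of your own choosing; this sketch simply skips it.
        if ($prefix === '' || in_array($uri, $skip, true)) {
            continue;
        }
        $xml->registerXPathNamespace($prefix, $uri);
        $paths[] = "//$prefix:*";
    }

    // Query all elements in the remaining namespaces, in document order.
    $result = $paths ? $xml->xpath(implode('|', $paths)) : [];

    $names = [];
    foreach ($result as $element) {
        $names[] = $element->getName();
    }
    print_r($names);
    ```

    The prefixes are taken from the document here, which is convenient but not guaranteed to be stable across responses; the namespace URIs are the part you can rely on.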