Tags: php, xml, parsing, soap, simplexml

Parsing XML in PHP: String could not be parsed as XML


I want to parse the XML returned from a SOAP service into a PHP array. Here is a sample response:

https://gist.github.com/anonymous/0c83d7d8789f844575e3fd78434a970d

The content at the URL above is the response that the SOAP call returns in the code below:

    ...
    $client = new \SoapClient($wsUrl);
    $result = $client->__soapCall(
        "GetList",
        [],
        Null,
        $header
    );
    if (is_soap_fault($result)) {
        trigger_error("SOAP Fault: (faultcode: {$result->faultcode}, faultstring: {$result->faultstring})", E_USER_ERROR);
    } else {
        return $result;
    }
    $sxe = new \SimpleXMLElement($result);
    $sxe->registerXPathNamespace('d', 'urn:schemas-microsoft-com:xml-msdata');
    $result = $sxe->xpath("//NewDataSet");
    ...

I get the following error:

String could not be parsed as XML. SimpleXMLElement::__construct(): Entity: line 1: parser error : Extra content at the end of the document

What am I doing wrong?


Solution

  • Here is a re-formatted sample of the code linked in the question (note: it's best to include such an example directly, in case the external link becomes inaccessible).

    <xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="NewDataSet">
        <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:MainDataTable="rows" msdata:UseCurrentLocale="true">
            <xs:complexType>
                <xs:choice minOccurs="0" maxOccurs="unbounded">
                    <xs:element name="rows">
                        <xs:complexType>
                            <xs:sequence>
                                <xs:element name="id" type="xs:int" minOccurs="0"/>
                                <xs:element name="semt" type="xs:string" minOccurs="0"/>
                            </xs:sequence>
                        </xs:complexType>
                    </xs:element>
                </xs:choice>
            </xs:complexType>
        </xs:element>
    </xs:schema>
    <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
        <DocumentElement xmlns="">
            <rows diffgr:id="rows1" msdata:rowOrder="0">
                <id>1</id>
                <semt>_</semt>
            </rows>
            <!-- many more "rows" blocks similar to the above -->
        </DocumentElement>
    </diffgr:diffgram>
    

    Formatted like this, it's clear that there are two different root elements, <xs:schema>...</xs:schema> and <diffgr:diffgram>...</diffgr:diffgram>. A well-formed XML document must have exactly one root element, and that is the error the parser is reporting. (As far as the parser is concerned, the "end of the document" is "</xs:schema>", so the "extra content" is the entire block starting at "<diffgr:diffgram".)

    Looking at the two blocks, it's clear that they are actually intended as two different XML documents: one is a schema (a description of the expected format) and the other is the document itself.
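To see the failure in isolation, the sketch below parses a string with two top-level elements and collects the libxml messages instead of letting them surface as warnings. The $response value is a hypothetical, heavily shortened stand-in for the real SOAP response, not the actual data from the question.

```php
<?php
// Hypothetical, shortened stand-in for the SOAP response: two top-level elements.
$response = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" id="NewDataSet"/>'
    . '<diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">'
    . '<DocumentElement><rows><id>1</id><semt>_</semt></rows></DocumentElement>'
    . '</diffgr:diffgram>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// simplexml_load_string() returns false when the string is not well-formed.
$sxe = simplexml_load_string($response);

$messages = [];
if ($sxe === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError with message, line and column properties.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

print_r($messages);
```

Inspecting the collected messages this way can make it easier to tell a structural problem such as a second root element apart from, say, an encoding issue.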

    You could handle this in one of two ways:

    • split the string into two, e.g. by finding the first occurrence of "<diffgr" (this might break if the format of the XML changes slightly).
    • wrap the string in a fake extra element, e.g. $xml = "<dummy>$response</dummy>", so that the result is a well-formed XML document.
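Both options can be sketched roughly as follows. This is a minimal illustration, not a drop-in fix: $response is a hypothetical, shortened stand-in for the real SOAP response, the variable names are made up for the example, and it assumes the response does not start with an XML declaration (which would make the wrapped string invalid).

```php
<?php
// Hypothetical, shortened stand-in for the SOAP response (two root elements).
$response = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" id="NewDataSet"/>'
    . '<diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"'
    . ' xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">'
    . '<DocumentElement xmlns="">'
    . '<rows diffgr:id="rows1" msdata:rowOrder="0"><id>1</id><semt>_</semt></rows>'
    . '<rows diffgr:id="rows2" msdata:rowOrder="1"><id>2</id><semt>A</semt></rows>'
    . '</DocumentElement></diffgr:diffgram>';

// Option 1: wrap both fragments in a dummy root so the string is one document.
$wrapped = new SimpleXMLElement("<dummy>$response</dummy>");

// The row elements are in no namespace, so plain element names match in XPath.
$rows = [];
foreach ($wrapped->xpath('//DocumentElement/rows') as $row) {
    $rows[] = ['id' => (int) $row->id, 'semt' => (string) $row->semt];
}

// Option 2: split at the first "<diffgr" and parse only the data part.
$pos = strpos($response, '<diffgr');
if ($pos === false) {
    throw new RuntimeException('No diffgram element found in the response');
}
$diffgram = new SimpleXMLElement(substr($response, $pos));

$rowsFromSplit = [];
foreach ($diffgram->xpath('//DocumentElement/rows') as $row) {
    $rowsFromSplit[] = ['id' => (int) $row->id, 'semt' => (string) $row->semt];
}

print_r($rows);
```

Note that the sample contains no element named NewDataSet (it appears only as an attribute value in the schema), so the //NewDataSet query from the question would probably come back empty even once the string parses; the rows sit under DocumentElement instead.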