Parsing XML in PHP: String could not be parsed as XML

I want to parse the XML returned from a SOAP service into a PHP array. Here is the sample response:

The content at the URL above is the response handled in the code below:

    $client = new \SoapClient($wsUrl);
    $result = $client->__soapCall(/* ... arguments omitted in the original ... */);
    if (is_soap_fault($result)) {
        trigger_error("SOAP Fault: (faultcode: {$result->faultcode}, faultstring: {$result->faultstring})", E_USER_ERROR);
    } else {
        return $result;
    }

    // The returned string is then parsed by the caller:
    $sxe = new \SimpleXMLElement($result);
    $sxe->registerXPathNamespace('d', 'urn:schemas-microsoft-com:xml-msdata');
    $result = $sxe->xpath("//NewDataSet");

I get the following error:

String could not be parsed as XML. SimpleXMLElement::__construct(): Entity: line 1: parser error : Extra content at the end of the document

What am I doing wrong?


  • Here is a re-formatted sample of the code linked in the question (note: it's best to include such an example directly, in case the external link becomes inaccessible).

    <xs:schema xmlns="" xmlns:xs="" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="NewDataSet">
        <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:MainDataTable="rows" msdata:UseCurrentLocale="true">
                <xs:choice minOccurs="0" maxOccurs="unbounded">
                    <xs:element name="rows">
                                <xs:element name="id" type="xs:int" minOccurs="0"/>
                                <xs:element name="semt" type="xs:string" minOccurs="0"/>
    <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
        <DocumentElement xmlns="">
            <rows diffgr:id="rows1" msdata:rowOrder="0">
            <!-- many more "rows" blocks similar to the above -->

    Formatted like this, it's clear that there are two different root elements, <xs:schema>...</xs:schema> and <diffgr:diffgram>...</diffgr:diffgram>. A valid XML document must have a single root node, so this is the error the parser is detecting. (The "end of document" as far as it is concerned is "</xs:schema>", so the "extra content" is the entire block starting at "<diffgr:diffgram".)
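
    The failure described above can be reproduced with a much smaller string. The following is a minimal sketch, not the code from the question: `xmlParseErrors` is a hypothetical helper name, and the two test strings are made up for illustration. It uses `libxml_use_internal_errors()` so that the parser messages can be collected instead of being emitted as warnings.

```php
<?php
// Hypothetical helper: returns the libxml error messages produced while
// parsing $xml (an empty array means the string parsed cleanly).
function xmlParseErrors(string $xml): array
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $messages = [];
    try {
        new SimpleXMLElement($xml);
    } catch (Exception $e) {
        // SimpleXMLElement throws "String could not be parsed as XML";
        // the detailed reasons are available from libxml.
        foreach (libxml_get_errors() as $error) {
            $messages[] = trim($error->message);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Made-up inputs: two sibling roots (like the SOAP response) versus one root.
$twoRoots = '<first/><second/>';
$oneRoot  = '<dummy><first/><second/></dummy>';

print_r(xmlParseErrors($twoRoots)); // lists the parser's complaints
var_dump(xmlParseErrors($oneRoot) === []);
```

    The second call should report no errors, because the same two elements now sit under a single root.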

    Looking at the two blocks, it's clear that they are actually intended as two different XML documents: one is a schema (a description of the expected format) and the other is the document itself.

    You could handle this in one of two ways:

    • split the string into two, e.g. by finding the first occurrence of "<diffgr" (this might break if the format of the XML changes slightly).
    • wrap the string in a fake extra element, e.g. $xml = "<dummy>$response</dummy>";, so that the result is a valid XML document.
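
    Both options can be sketched as follows. This is an illustration under stated assumptions rather than a drop-in solution: `$response` is a made-up stand-in for the real SOAP result, the row values are invented, the `id` and `semt` child elements are taken from the schema in the sample, the `xs` namespace URI is assumed to be the standard XML Schema one (it is blank in the sample), and the function names `rowsToArray`, `rowsFromSplit` and `rowsFromWrapped` are hypothetical.

```php
<?php
// Hypothetical stand-in for the SOAP response: a schema followed by a diffgram,
// i.e. two sibling root elements. Row values are made up for illustration.
$response =
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" id="NewDataSet"/>'
    . '<diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"'
    . ' xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">'
    . '<DocumentElement xmlns="">'
    . '<rows diffgr:id="rows1" msdata:rowOrder="0"><id>1</id><semt>Example A</semt></rows>'
    . '<rows diffgr:id="rows2" msdata:rowOrder="1"><id>2</id><semt>Example B</semt></rows>'
    . '</DocumentElement>'
    . '</diffgr:diffgram>';

// Convert each <rows> element into a plain PHP array.
function rowsToArray(array $rowElements): array
{
    $rows = [];
    foreach ($rowElements as $row) {
        $rows[] = ['id' => (int) $row->id, 'semt' => (string) $row->semt];
    }
    return $rows;
}

// Option 1: split at the start of the diffgram and parse only that part.
function rowsFromSplit(string $response): array
{
    $pos = strpos($response, '<diffgr:diffgram');
    if ($pos === false) {
        throw new RuntimeException('No <diffgr:diffgram> element found');
    }
    $data = new SimpleXMLElement(substr($response, $pos));
    return rowsToArray($data->xpath('//DocumentElement/rows'));
}

// Option 2: wrap both fragments in a single fake root element.
function rowsFromWrapped(string $response): array
{
    $wrapped = new SimpleXMLElement("<dummy>$response</dummy>");
    return rowsToArray($wrapped->xpath('//DocumentElement/rows'));
}

print_r(rowsFromSplit($response));
print_r(rowsFromWrapped($response));
```

    Note that the wrapping approach presumably only works if the response does not begin with an XML declaration (`<?xml ... ?>`), since a declaration is only allowed at the very start of a document; in that case it would have to be stripped first.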