
MSXML's loadXML fails to load even well-formed XML


I have written a wrapper on top of MSXML in C++. Its load method is shown below. The problem is that it sometimes fails to load well-formed XML.

Before passing the XML in as a string, I search it for xmlns and replace every occurrence of xmlns with xmlns:dns. The code below strips the byte order mark (BOM) and then tries to load the string with MSXML's loadXML method. If the load succeeds, it sets the selection namespace as shown.
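The search-and-replace step described above can be reproduced without MSXML. The sketch below uses a hypothetical ReplaceAll helper (not the author's actual code) to perform a blind replacement of xmlns with xmlns:dns. Besides turning a default declaration xmlns="..." into xmlns:dns="...", it also rewrites any prefixed declaration such as xmlns:media="..." into xmlns:dns:media="...". A name with two colons is not namespace-well-formed, so a namespace-aware parser like MSXML 6 would be expected to reject it. Whether this applies depends on whether the real input contains prefixed declarations.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper reproducing the described pre-processing:
// replace every occurrence of `from` in `text` with `to`.
std::string ReplaceAll(std::string text, const std::string& from, const std::string& to)
{
    assert(!from.empty()); // the search pattern must be non-empty
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        // Skip past the inserted text; "xmlns:dns" itself contains "xmlns".
        pos += to.size();
    }
    return text;
}
```

For example, applying ReplaceAll(doc, "xmlns", "xmlns:dns") to <feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="urn:example:media"/> yields <feed xmlns:dns="http://www.w3.org/2005/Atom" xmlns:dns:media="urn:example:media"/>, where the second attribute name is no longer a valid qualified name.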

    class XmlDocument {
        MSXML2::IXMLDOMDocument2Ptr spXMLDOM;
        // ....
    };

    // XmlDocument methods

    void XmlDocument::Initialize()
    {
        CoInitialize(NULL);
        HRESULT hr = spXMLDOM.CreateInstance(__uuidof(MSXML2::DOMDocument60));
        if (FAILED(hr))
        {
            throw "Unable to create MSXML::DOMDocument object";
        }
    }

    bool XmlDocument::LoadXml(const char* xmltext)
    {
        if (spXMLDOM != NULL)
        {
            // Detect the UTF-8 encoded byte order mark (EF BB BF) and skip it.
            // A string literal avoids the narrowing error that {0xEF, 0xBB, 0xBF}
            // causes for a plain (signed) char array.
            const char BOM[] = "\xEF\xBB\xBF";
            if (strncmp(xmltext, BOM, 3) == 0)
            {
                xmltext += 3;
            }

            VARIANT_BOOL bSuccess = spXMLDOM->loadXML(A2BSTR(xmltext));
            if (bSuccess == VARIANT_TRUE)
            {
                spXMLDOM->setProperty("SelectionNamespaces", "xmlns:dns=\"http://www.w3.org/2005/Atom\"");
                return true;
            }
        }
        return false;
    }

I tried debugging but still could not figure out why loadXML() sometimes fails on well-formed XML. What am I doing wrong in this code? Any help is greatly appreciated.

Thanks JeeZ


Solution

  • For this specific issue, please refer to the article "Strings Passed to loadXML must be UTF-16 Encoded BSTRs".

    More generally, an XML parser is not designed for parsing in-memory strings: loadXML, for example, does not recognize a BOM and places restrictions on the encoding. An XML parser is designed to work on a byte array and detect the encoding itself, which is essential for a standards-conforming parser. To get the most out of MSXML, consider loading from an IStream or a Win32 file instead.
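To make the UTF-16 requirement concrete: ATL's A2BSTR converts the char* input using the ANSI code page rather than UTF-8, so non-ASCII UTF-8 sequences in the document are mis-decoded before MSXML ever sees them (the BSTR it allocates is also never freed in the code above). On Windows the usual route is MultiByteToWideChar(CP_UTF8, ...) followed by a _bstr_t. The portable sketch below shows what that conversion has to do: skip a leading UTF-8 BOM, decode each UTF-8 sequence, and emit UTF-16 code units, using surrogate pairs above U+FFFF. Utf8ToUtf16 is a hypothetical name, and the decoder is deliberately simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units, skipping a
// leading UTF-8 BOM. Simplified: overlong forms and encoded surrogates are not
// rejected. On Windows, MultiByteToWideChar(CP_UTF8, ...) does this job.
std::u16string Utf8ToUtf16(const std::string& utf8)
{
    std::size_t i = 0;
    if (utf8.size() >= 3 && utf8.compare(0, 3, "\xEF\xBB\xBF") == 0)
        i = 3; // drop the BOM; loadXML does not expect one in its string argument

    std::u16string out;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > utf8.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F); // append 6 payload bits
        }
        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        i += len;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above U+FFFF become a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is a 16-bit UTF-16 unit, the result could be passed as something like spXMLDOM->loadXML(_bstr_t(reinterpret_cast<const wchar_t*>(text.c_str()))) in place of A2BSTR(xmltext). Loading the raw bytes from an IStream or file, as suggested above, sidesteps the conversion entirely and lets MSXML detect the encoding itself.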