c++ com visual-studio-2013 msxml

#import directive created COM wrapper classes replaces wchar_t with unsigned short


We have some legacy code that uses MSXML and the wrapper classes generated using Visual Studio's C++ #import directive like so:

#import <msxml6.dll> named_guids

We are upgrading the project to use wchar_t as a built-in type (previously, the /Zc:wchar_t- flag was set, so wchar_t was unsigned short). This seems to cause problems as the type library headers generated using #import replace const wchar_t* input parameters with unsigned short*.

For example the ISAXXMLReader::putProperty method has the following signature:

HRESULT putProperty(
    [in] const wchar_t * pwchName,
    [in] VARIANT varValue);

but the generated type library header uses the following signature:

HRESULT ISAXXMLReader::putProperty ( 
    unsigned short * pwchName, 
    const _variant_t & varValue )

so not only is wchar_t converted to unsigned short, but the const is stripped. So the code fails to compile without an unsightly cast:

MSXML2::ISAXXMLReaderPtr saxReader(__uuidof(MSXML2::SAXXMLReader60));
MSXML2::IMXWriterPtr xmlWriter(__uuidof(MSXML2::MXXMLWriter60));

//Set properties on the XML writer.
// Omitted for brevity

saxReader->putProperty(L"http://xml.org/sax/properties/lexical-handler", // Can't convert to unsigned short*
            (_variant_t)xmlWriter.GetInterfacePtr());
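
The compile failure can be reproduced without MSXML. As a minimal sketch (the constant names below are illustrative and not part of any library), these checks show that once wchar_t is a built-in type, it is distinct from unsigned short and a wide string literal no longer converts implicitly to the unsigned short* parameter:

```cpp
#include <type_traits>

// With wchar_t as a built-in type (/Zc:wchar_t on MSVC, and the rule in
// standard C++), wchar_t and unsigned short are distinct types.
constexpr bool kSameType = std::is_same<wchar_t, unsigned short>::value;

// A wide string literal decays to const wchar_t*, which does not convert
// implicitly to the unsigned short* declared by the generated wrapper.
constexpr bool kLiteralConverts =
    std::is_convertible<const wchar_t*, unsigned short*>::value;

static_assert(!kSameType, "wchar_t is expected to be a distinct built-in type");
static_assert(!kLiteralConverts, "no implicit conversion is expected");
```

Under /Zc:wchar_t- the first assertion would fire instead, since wchar_t is then a typedef for unsigned short.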

Is there any way to get the import directive to generate the proper function signatures in the wrapper classes?

Edit: To add to the muddle, the msxml6.h header declares a C++ class ISAXXMLReader with the expected signature:

    virtual HRESULT STDMETHODCALLTYPE putProperty( 
        /* [in] */ const wchar_t *pwchName,
        /* [in] */ VARIANT varValue) = 0;

After reading the answer provided, I guess it's just hiding the gory details, but at least it's consistent with the documentation (which uses this header in its samples).


Solution

  • Chris' comment has a good link which describes the problem pretty cleanly. To summarize:

    The problem is that the signature of that argument really is unsigned short * and not const wchar_t*, despite MSDN's wishful thinking to the contrary.

    In a way, the signature in MSDN describes the moral intent of the parameter, not its actual signature.

    The ultimate authority on the signature is the MSXML6 type library itself. As the link in Chris' comment describes, there is no way to indicate in a type library that an argument is a "pointer to a wide character", because automation doesn't support such a thing. So they use the closest thing that is ABI-compatible, and that's an unsigned short *.

    The #import compiler extension can only reflect what's in the type library. There is no way to tell it to selectively "lie" in the output.

    Here's the actual signature of that method, taken straight from the type library (via oleview.exe):

    HRESULT _stdcall putProperty(
                      [in] unsigned short* pwchName, 
                      [in] VARIANT varValue);
    

    (There is some sleight of hand in my using oleview here. After all, you're looking at the output of a code generator, just as with #import, so it doesn't quite prove anything new. However, this is the best we can do without using the type library API to inspect the type library ourselves.)

    This kind of thing is just the price you have to pay for making your COM object available to automation clients.

    ADDENDUM:

    If you look at the interface, you have to wonder how the heck you can possibly call that from VB6 or VBScript. Well, you can't.

    The SAXXMLReader coclass implements two nearly-twin interfaces with the same semantics: ISAXXMLReader is the interface we're looking at, and it's a non-remotable, non-automation, C++-optimized version of the interface. What you get when you use a SAXXMLReader object from VB6 is its [default] interface IVBSAXXMLReader. This is an IDispatch-inheriting automation-compatible interface, but it has the same semantics as ISAXXMLReader. To wit: IVBSAXXMLReader's putProperty takes a BSTR instead of an unsigned short *.

    The MSDN documentation for many classes tends to muddle the distinction between how an object is called from C++ and from VB/VBScript. It makes it look like you are calling the same thing when often that's not the case, and it sweeps the interface details under the rug. I would prefer it to be a bit more explicit. I guess the authors have to balance documenting the semantics of a library against catering to both native and scripting developers, who might have vastly different levels of expertise in COM's plumbing.
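
Since the generated signature cannot be changed, one way to keep call sites tidy is to confine the cast to a single helper. The following is a minimal sketch under stated assumptions, not part of MSXML or the #import output: as_tlb_string, fake_putProperty and check_as_tlb_string are hypothetical names, and the check uses char16_t so that it also compiles on platforms where wchar_t is not 16 bits wide.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: converts a pointer to a 16-bit character string into
// the unsigned short* that the generated wrapper declares. The parameter is
// [in], so the const_cast only works around the const that the type library
// drops; the callee is not expected to write through the pointer.
template <typename Ch>
unsigned short* as_tlb_string(const Ch* text) {
    static_assert(std::is_integral<Ch>::value, "expected a character type");
    static_assert(sizeof(Ch) == sizeof(unsigned short),
                  "character type must be 16 bits wide, as wchar_t is on Windows");
    return reinterpret_cast<unsigned short*>(const_cast<Ch*>(text));
}

// Stand-in for the generated wrapper method, present only so this sketch
// compiles without msxml6.dll. It returns the pointer it was given.
inline const void* fake_putProperty(unsigned short* pwchName) {
    return pwchName;
}

// Checks that the helper passes the original address through unchanged.
inline void check_as_tlb_string() {
    static const char16_t name[] =
        u"http://xml.org/sax/properties/lexical-handler";
    const void* seen = fake_putProperty(as_tlb_string(name));
    assert(seen == static_cast<const void*>(name));
    (void)seen;  // avoids an unused-variable warning when NDEBUG is defined
}
```

On Windows with /Zc:wchar_t, the call in the question would then read saxReader->putProperty(as_tlb_string(L"http://xml.org/sax/properties/lexical-handler"), ...), with the cast kept in one place.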