gSOAP mangles UTF-8 in std::string


I have a C++ server using gSOAP. One of its APIs accepts a string.

<message name="concatRequest">
  <part name="a" type="ns:password"/><!-- ns__concat::a -->
  <part name="b" type="xsd:string"/><!-- ns__concat::b -->
</message>

int billon__concat( struct soap *soap, std::string a, std::string b, std::string &result )
{
//    std::cout <<"PACZPAN A:"<<a<<" B:"<<b <<std::endl;
    std::cout <<"PACZPAN B[0..3]: " << (int)b[0] << " " << (int)b[1] << " " << (int)b[2] << " " <<(int)b[3] << std::endl;
    std::cout <<"PACZPAN B[0..3]: " << (char)b[0] << " " << (char)b[1] << " " << (char)b[2] << " " <<(char)b[3] << std::endl;
    result = a + b;
  //  std::cout <<"PACZPAN res:"<<result <<std::endl;
    return SOAP_OK;
}

ns::password is just a string as well.

Now I send a request with argument b='PŁOCK' in two different ways. In Wireshark the value shows up either as 'PŁOCK' or as P&#x141;OCK, so I think both requests are correct. gSOAP's own logging also prints:

POST / HTTP/1.1
Accept-Encoding: gzip,deflate
Content-Type: text/xml;charset=UTF-8
SOAPAction: ""
Content-Length: 471
Host: localhost:8080
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1.1 (java 1.5)

<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:calc">
   <soapenv:Header/>
   <soapenv:Body>
      <urn:concat soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
         <a xsi:type="urn:password">      </a>
         <b xsi:type="xsd:string">PŁOCK</b>
      </urn:concat>
   </soapenv:Body>
</soapenv:Envelope>

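When the server receives it, the string becomes PAOCK. There are no bytes outside of ASCII, just a different letter: Ł is U+0141, and its low byte 0x41 is 'A', so the output is consistent with the code point being truncated to 8 bits.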

PACZPAN B[0..3]: 80 65 79 67
PACZPAN B[0..3]: P A O C

I don't care that std::string does not handle Unicode well. I want it to hold the bytes exactly as they were sent.

I could add a mapping in typemap.dat: xsd__string = | std::wstring, but I don't want to use std::wstring, and it is not UTF-8 anyway.


Solution

  • By default, gSOAP does not handle characters outside the Latin set (see the gSOAP documentation). This can be changed when the soap context is initialized, by passing a flag:

    struct soap *soap = soap_new1( SOAP_C_UTFSTRING );