I've been stuck on a libxml2 issue for two days.
Basically, I send an XML buffer over the network through a socket. When I receive it on the server and try to parse it, the parser reports:
parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xFF 0xFF 0xFF 0xFF
One element of the XML tree I'm trying to send (biometricData) contains strange characters because it holds a raw buffer of encrypted data.
<biometricData>������������������������
</biometricHeader><biometricData>
^
Client side:
xmlDocDumpMemoryEnc(doc, &(*out), &buffersize, "UTF-8");
Server side:
int verify(unsigned char *data, int len) //The routine to check the data and authenticate user.
xmlParserCtxtPtr ctx_ptr = xmlNewParserCtxt();
doc = xmlCtxtReadMemory(ctx_ptr, (const char*)data, len, "data.xml", "UTF-8", 0);
The server throws this error:
Entity: line 2: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xFF 0xFF 0xFF 0xFF
d>2.23.42.9.10.4.2</oid></formatOwner></format></biometricHeader><biometricData>
^
I tried changing the encoding to ISO-8859-1 on the server side, and parsing works! But when I then call nodeGetContent(biometricData), the data is no longer in its original encoding, so the buffer is completely useless.
Thanks for taking the time. I've looked into xmlCharEncodingHandler, but without success...
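Encode all your binary data with base64. XML parsers usually cannot (and will not) handle raw binary data: XML is a text format, so element content must be valid text in the declared encoding, and a byte such as 0xFF never occurs in valid UTF-8. Declaring ISO-8859-1 only hides the problem, because libxml2 converts the input to UTF-8 internally, so the bytes you read back are no longer the ones you sent.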
I don't know how much binary data you process, so if base64 encoding/decoding becomes too inefficient, take a look at http://www.xml.com/pub/a/98/07/binary/binary.html
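
To make the base64 suggestion concrete, here is a minimal sketch in plain C. The function names b64_encode and b64_decode are hypothetical helpers written for this answer, not part of libxml2, and the decoder is deliberately lenient (it skips whitespace and stops at the first '=' without validating the padding). Treat it as an illustration under those assumptions rather than a hardened implementation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode len raw bytes as a NUL-terminated base64
 * string. The caller frees the result. Returns NULL if malloc fails. */
char *b64_encode(const unsigned char *src, size_t len)
{
    size_t out_len = 4 * ((len + 2) / 3);
    char *out = malloc(out_len + 1);
    size_t i = 0, o = 0;

    if (out == NULL)
        return NULL;

    while (i + 2 < len) {               /* full 3-byte groups */
        unsigned long v = ((unsigned long)src[i] << 16) |
                          ((unsigned long)src[i + 1] << 8) |
                          (unsigned long)src[i + 2];
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 6) & 0x3F];
        out[o++] = B64_ALPHABET[v & 0x3F];
        i += 3;
    }
    if (i < len) {                      /* 1 or 2 trailing bytes, padded with '=' */
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)src[i + 1] << 8;
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? B64_ALPHABET[(v >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return out;
}

/* Map one base64 character to its 6-bit value, or -1 if it is not one. */
static int b64_value(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical helper: decode base64 text into a malloc'ed byte buffer and
 * store its size in *out_len. Whitespace is skipped, decoding stops at the
 * first '='. Returns NULL on an invalid character or if malloc fails. */
unsigned char *b64_decode(const char *src, size_t *out_len)
{
    size_t n = strlen(src);
    unsigned char *out = malloc(n / 4 * 3 + 3);
    unsigned long acc = 0;
    int bits = 0;
    size_t o = 0, i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        int c = (unsigned char)src[i];
        int v;

        if (c == '=')
            break;
        if (c == ' ' || c == '\n' || c == '\r' || c == '\t')
            continue;
        v = b64_value(c);
        if (v < 0) {
            free(out);
            return NULL;
        }
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {                /* a full byte is available */
            bits -= 8;
            out[o++] = (unsigned char)((acc >> bits) & 0xFF);
            acc &= (1UL << bits) - 1;   /* keep only the unused low bits */
        }
    }
    *out_len = o;
    return out;
}
```

On the client, the string returned by b64_encode would become the text content of biometricData (for example via xmlNewTextChild) before calling xmlDocDumpMemoryEnc. On the server, the string obtained from xmlNodeGetContent would be passed to b64_decode to recover the original bytes. Since the encoded text is plain ASCII, the document stays valid UTF-8 and no encoding override is needed when parsing.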