
Delphi XML parser: why are some characters illegal, but many are not?


I've got a pretty basic XML document, for which I generated an interface with the automatic generator in Delphi 7. This worked fine until I ran into some odd characters being sent my way. For example:

<AfasGetConnector>
  <Medewerker>
    <Afstortnummer>0032123</Afstortnummer>
    <Naam>Wiaëröóíïúáäâtè</Naam>
  </Medewerker>
</AfasGetConnector>

Pulling this into Firefox or IE quickly tells you that there are illegal characters in it. To be exact: ë, é and ö are not accepted. The rest, however, are perfectly fine (even the capital versions Ë, É and Ö are fine).

This confuses me. Why would those 3 be illegal, but "ä" and most others be fine? Are there any others I should worry about?

The whole block is given to me in a CDATA section, so the initial transfer goes fine. After that, however, I need to pick through the individual "Medewerker" elements from the XML, which are not encapsulated in the CDATA. Hence the issue.


Solution

  • Pulling this into Firefox or IE quickly tells you that there are illegal characters in it.

    Works fine for me. Neither Firefox nor IE complains about the characters at all.

    This confuses me. Why would those 3 be illegal, but "ä" and most others be fine?

    They are not illegal at all. The XML specification allows most Unicode code points to be used; the exceptions are the C0 control characters other than tab, line feed and carriage return, the UTF-16 surrogate range, and the noncharacters U+FFFE and U+FFFF. All of the characters you have shown are legal.

    The whole block is given to me in a CDATA section, so the initial transfer goes fine. After that, however, I need to pick through the individual "Medewerker" elements from the XML, which are not encapsulated in the CDATA. Hence the issue.

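    You are likely encountering a mismatch between the encoding the XML parser assumes and the encoding the XML data actually uses. For instance, a document with no byte order mark and no encoding declaration is assumed to be UTF-8 unless the transport says otherwise, so Latin-1 or Windows-1252 bytes would be rejected as malformed. But since you have not provided the original raw bytes of the XML that was transferred, or the code that is trying to load and parse it, there is no way to know for sure what is actually happening.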
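To illustrate both points, here is a minimal sketch in Python rather than Delphi, since the Delphi loading code was not posted. It assumes the data arrived as ISO-8859-1 (Latin-1) bytes; `is_xml_char` and `try_parse` are hypothetical helper names, and Python's standard `xml.etree.ElementTree` merely stands in for whatever parser the Delphi interface uses. It checks the sample name against the XML 1.0 `Char` production, then parses the same document as UTF-8, as undeclared Latin-1, and as Latin-1 with an encoding declaration.

```python
import xml.etree.ElementTree as ET

NAME = "Wiaëröóíïúáäâtè"  # the <Naam> value from the question

# Rebuild the question's sample document as a Unicode string.
SAMPLE = (
    "<AfasGetConnector><Medewerker>"
    "<Afstortnummer>0032123</Afstortnummer>"
    f"<Naam>{NAME}</Naam>"
    "</Medewerker></AfasGetConnector>"
)


def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def try_parse(raw: bytes) -> str:
    """Parse raw XML bytes; return the <Naam> text or the parser error."""
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as err:
        return f"ParseError: {err}"
    return root.find("Medewerker/Naam").text


if __name__ == "__main__":
    # 1. Every character in the name is a legal XML character.
    print("all legal:", all(is_xml_char(c) for c in NAME))

    # 2. UTF-8 bytes, no declaration: matches the parser's default, so it parses.
    print("utf-8:", try_parse(SAMPLE.encode("utf-8")))

    # 3. Assumption: the real data is Latin-1. Without a declaration the
    #    parser assumes UTF-8, and the byte for "ë" is not valid UTF-8 there.
    print("latin-1, undeclared:", try_parse(SAMPLE.encode("iso-8859-1")))

    # 4. Same Latin-1 bytes, but the prolog declares the real encoding.
    declared = '<?xml version="1.0" encoding="ISO-8859-1"?>' + SAMPLE
    print("latin-1, declared:", try_parse(declared.encode("iso-8859-1")))
```

If the real bytes turn out to be Latin-1 or Windows-1252, either declaring that encoding in the prolog or converting the bytes to UTF-8 before handing them to the parser should resolve the error; which fix fits best depends on the loading code, which was not shown.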