Java StAX - error when parsing - Illegal character entity: expansion character code 0x19


I am reading/parsing an XML file with javax.xml.stream.XMLStreamReader.
The file contains this piece of XML data as shown below.

<Row>
  <AccountName value="Paving 101" />
  <AccountNumber value="20205" />
  <AccountId value="15012" />
  <TimePeriod value="2019-08-20" />
  <CampaignName value="CMP Paving 101" />
  <CampaignId value="34283" />
  <AdGroupName value="residential paving" />
  <AdGroupId value="1001035" />
  <AdId value="790008" />
  <AdType value="Expanded text ad" />
  <DestinationUrl value="" />
  <BidMatchType value="Broad" />
  <Impressions value="1" />
  <Clicks value="1" />
  <Ctr value="100.00%" />
  <AverageCpc value="1.05" />
  <Spend value="1.05" />
  <AveragePosition value="2.00" />
  <SearchQuery value="concrete&#x19;driveway&#x19;repair&#x19;methods" />
</Row>

Unfortunately I am getting this error and I am not sure how to resolve it.

    Error in downloadXML: 
    com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x19
     at [row,col {unknown-source}]: [674,40]
        at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:606)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:479)
        at com.ctc.wstx.sr.StreamScanner.reportIllegalChar(StreamScanner.java:2448)
        at com.ctc.wstx.sr.StreamScanner.validateChar(StreamScanner.java:2395)
        at com.ctc.wstx.sr.StreamScanner.resolveSimpleEntity(StreamScanner.java:1218)
        at com.ctc.wstx.sr.BasicStreamReader.parseAttrValue(BasicStreamReader.java:1929)
        at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3063)
        at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2961)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2837)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1072)

The problem seems to be the character reference &#x19;.
Of course, I could first read the file as plain text, replace this bad character, and only then parse it with XMLStreamReader, but:
1) that approach seems really clumsy to me;
2) it would be a bit difficult to do, as the code there is quite involved,
so I am not sure I want to change it just for this one character.

Why is the XMLStreamReader unable to handle this character?
Is the XML invalid, or does the parser have a bug?


Solution

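  • The characters & and < (as well as the enclosing " or ' inside an attribute value) cannot appear literally in XML content.

    They must be escaped with entity references, for example &amp; for &.

    That is not the problem here, though. &#x19; is syntactically a character reference, but it refers to U+0019, a control character that XML 1.0 does not allow anywhere in a document, not even when written as a reference.

    So the XML is not well-formed and every conforming parser will reject it; the parser has no bug. The proper fix is to correct the producer of this XML content.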

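    Edit: from https://www.w3.org/TR/xml/#NT-Char

    A character reference must expand to a character that matches the Char production:

    Reference ::= EntityRef | CharRef
    EntityRef ::= '&' Name ';'
    CharRef   ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
    Char      ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

    0x19 is below #x20 and is not one of #x9, #xA, or #xD, so &#x19; is rejected.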
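
If the producer cannot be fixed, the pre-processing step mentioned in the question could look roughly like the sketch below. It is only an illustration under some assumptions: the class name `XmlCharRefSanitizer` and its methods are hypothetical (they are not part of StAX or Woodstox), the input is assumed to fit in memory as a `String`, and replacing each illegal reference with a space is an arbitrary choice. The range check follows the allowed character ranges listed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of StAX or Woodstox.
final class XmlCharRefSanitizer {
    // Matches hexadecimal (&#x19;) and decimal (&#25;) character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    private XmlCharRefSanitizer() {}

    /** Returns true if the code point is in the ranges XML 1.0 allows. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every character reference to a disallowed code point. */
    static String sanitize(String xml, String replacement) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuilder out = new StringBuilder(xml.length());
        int last = 0;
        while (m.find()) {
            out.append(xml, last, m.start());
            // Keep legal references untouched, swap out illegal ones.
            out.append(isLegal(m) ? m.group() : replacement);
            last = m.end();
        }
        out.append(xml, last, xml.length());
        return out.toString();
    }

    private static boolean isLegal(Matcher m) {
        String hex = m.group(1);
        try {
            int cp = (hex != null)
                    ? Integer.parseInt(hex, 16)
                    : Integer.parseInt(m.group(2));
            return isXmlChar(cp);
        } catch (NumberFormatException e) {
            return false; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        String row = "<SearchQuery value=\"concrete&#x19;driveway&#x19;repair&#x19;methods\" />";
        // prints: <SearchQuery value="concrete driveway repair methods" />
        System.out.println(sanitize(row, " "));
    }
}
```

The sanitized string can then be handed to `XMLInputFactory.createXMLStreamReader(new StringReader(...))`. Note that this plain regex also rewrites matching text inside CDATA sections and comments, where `&#x19;` would be literal text rather than a reference, so it is a workaround rather than a substitute for fixing the producer.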