I am parsing an XML file with javax.xml.stream.XMLStreamReader.
The file contains the XML data shown below.
<Row>
<AccountName value="Paving 101" />
<AccountNumber value="20205" />
<AccountId value="15012" />
<TimePeriod value="2019-08-20" />
<CampaignName value="CMP Paving 101" />
<CampaignId value="34283" />
<AdGroupName value="residential paving" />
<AdGroupId value="1001035" />
<AdId value="790008" />
<AdType value="Expanded text ad" />
<DestinationUrl value="" />
<BidMatchType value="Broad" />
<Impressions value="1" />
<Clicks value="1" />
<Ctr value="100.00%" />
<AverageCpc value="1.05" />
<Spend value="1.05" />
<AveragePosition value="2.00" />
<SearchQuery value="concretedrivewayrepairmethods" />
</Row>
Unfortunately I am getting this error and I am not sure how to resolve it.
Error in downloadXML:
com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x19
at [row,col {unknown-source}]: [674,40]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:606)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:479)
at com.ctc.wstx.sr.StreamScanner.reportIllegalChar(StreamScanner.java:2448)
at com.ctc.wstx.sr.StreamScanner.validateChar(StreamScanner.java:2395)
at com.ctc.wstx.sr.StreamScanner.resolveSimpleEntity(StreamScanner.java:1218)
at com.ctc.wstx.sr.BasicStreamReader.parseAttrValue(BasicStreamReader.java:1929)
at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3063)
at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2961)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2837)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1072)
The problem seems to be with the character reference named in the error (code 0x19).
Of course, I could first read the file as plain text, replace this bad character, and only then parse it with XMLStreamReader, but:
1) that approach seems really clumsy to me;
2) it would be a bit difficult to do, as the code there is quite involved, so I am not sure I want to change it just for this character.
Why is XMLStreamReader unable to handle this character?
Is the XML invalid, or does the parser have a bug?
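The characters `&` and `<` cannot appear literally in XML text or attribute values, and `>` is normally escaped as well (inside an attribute value, the same goes for whichever of `"` or `'` delimits it).
They're escaped using XML entities; for example, you write `&amp;` for `&`.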
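If the producer of the XML is under your control, the escaping described above could look like the following minimal sketch. The class name `XmlEscape` and the method `escape` are hypothetical names chosen for illustration, not part of any library; in practice a proper XML writer such as javax.xml.stream.XMLStreamWriter does this for you.

```java
/** Illustrative helper (hypothetical name): escapes the five characters with predefined XML entities. */
final class XmlEscape {
    private XmlEscape() {}

    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else is copied unchanged
            }
        }
        return out.toString();
    }
}
```

Note that this only handles the markup characters. It does not make a control character such as 0x19 legal, because XML 1.0 has no way to represent that character at all.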
Your XML is invalid, and every conforming parser will reject it. You may need to fix the producer of this XML content.
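**Edit**: from https://www.w3.org/TR/xml/#NT-Char
A character reference may only refer to a character that matches the Char production:
Reference ::= EntityRef | CharRef
EntityRef ::= '&' Name ';'
CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
The code 0x19 from your error falls outside these ranges, so a reference to it is not well-formed XML 1.0 and the parser is right to reject it.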
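If fixing the producer is not possible, the ranges quoted above can be checked in code before parsing. The following is only a sketch under stated assumptions: `XmlCharRefs`, `isXmlChar` and `replaceIllegalRefs` are hypothetical names, the whole document is assumed to fit in memory as a String, and the regex does not distinguish CDATA sections or comments, where such text would be literal rather than a reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (hypothetical name) for numeric character references. */
final class XmlCharRefs {
    // Matches hexadecimal (&#x19;) and decimal (&#25;) character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    private XmlCharRefs() {}

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every character reference to a non-Char code point with the given text. */
    static String replaceIllegalRefs(String xml, String replacement) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean hex = m.group(1) != null;
            String digits = hex ? m.group(1) : m.group(2);
            boolean legal;
            try {
                legal = isXmlChar(Integer.parseInt(digits, hex ? 16 : 10));
            } catch (NumberFormatException e) {
                legal = false; // value too large to be any Unicode code point
            }
            // Keep legal references untouched; substitute the illegal ones.
            String text = legal ? m.group() : replacement;
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The cleaned text can then be handed to the parser through `XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(cleaned))`. This is a workaround rather than a fix: the replacement text (a space, say) is a guess at what the producer meant.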