So far, what I'm doing is:
try
{
    XmlDocument xmldoc = loadXml(orderFilePath);
}
catch (XmlException exception)
{
    //... blah blah - there was an error, let the user know
}
But I would really like to be able to attempt to parse the file anyway. When I say "malformed" I don't necessarily mean that there will be an unclosed tag or element, but that an element's value might contain one of the following characters: '<', '>', '&'.
I've seen it mentioned that I would probably have to use XmlReader, but would that still throw an exception on that element, or would it allow me to fix the problem in some way?
I know fixing the XML at the source is the best solution, but I do not control where the XML is coming from.
Thanks!
EDIT:
Super simple example of the XML:
<Order>
    <Customer_ID>555-555-5555</Customer_ID>
    <ShipToAddress>
        <Customer_Name>Some Guy</Customer_Name>
        <Street>123 Fake Dr.</Street>
        <Street2></Street2>
        <City>West Palm Beach</City>
        <State>FL</State>
        <ZipCode>33417</ZipCode>
        <Country>United States</Country>
    </ShipToAddress>
    <BillToAddress>
        <Customer_Name>Some Guy</Customer_Name>
        <Street>123 Fake Dr.</Street>
        <Street2></Street2>
        <City>West Palm Beach</City>
        <State>FL</State>
        <ZipCode>33417</ZipCode>
        <Country>United States</Country>
    </BillToAddress>
    <items>
        <item>
            <Product_ID>25101</Product_ID>
            <Product_Name></Product_Name>
            <Quantity>1</Quantity>
            <USPrice>26.95000</USPrice>
        </item>
    </items>
    <!-- bad stuff here -->
    <How_did_you_hear_about_us>Coffee & Tea magazine</How_did_you_hear_about_us>
    <!-- bad stuff here -->
</Order>
The thing is - I don't necessarily know if it will always be in the same place.
One approach could be to validate a few things before parsing. You could use a regex to validate the XML tags, but it might be easier to use a Stack onto which you push every < and > symbol. Afterwards, just loop through it and assert that you never get the same symbol twice in a row.
This raises the question: how do you distinguish between <MyElement>> and <MyEl>ement>?
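The alternation check described above could be sketched roughly like this. It is only a sketch: the class and method names are made up for illustration, and instead of an actual Stack it just remembers the last bracket seen, which is enough for the "same symbol twice in a row" test. It assumes the input has no comments or CDATA sections containing '<' or '>', and it does nothing about '&'.

```csharp
using System;

public static class BracketCheck
{
    // Hypothetical helper: returns true when '<' and '>' strictly alternate,
    // starting with '<' and ending with '>'.
    public static bool BracketsAlternate(string xml)
    {
        char last = '>'; // pretend we start right after a closed tag
        foreach (char c in xml)
        {
            if (c != '<' && c != '>')
            {
                continue; // only the two bracket characters matter here
            }
            if (c == last)
            {
                return false; // same symbol twice in a row
            }
            last = c;
        }
        return last == '>'; // a trailing '<' was never closed
    }
}
```

Note that both <MyElement>> and <MyEl>ement> fail this check in exactly the same way, so it can tell you that something is wrong but not which bracket is the stray one.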
This is all pretty vague though: what do you want to happen when the XML turns out to be invalid? How far do you want to take this pre-processing validation?
I believe that the best option here is to not proceed. You can't fix every issue with malformed XML thrown at you, and it might just be better to inform the user and leave it at that.
If the source is consistently sending you malformed XML, you'll have to contact the maintainers or look for alternatives.
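If you decide to pre-process anyway, one narrow repair that is at least unambiguous is the bare '&' from the sample in the question. The following is a minimal sketch, not a general fix: LenientXml, EscapeBareAmpersands and LoadLeniently are hypothetical names, and it assumes the document uses only the five predefined XML entities plus numeric character references, with no CDATA sections whose ampersands must stay untouched. Stray '<' and '>' are not handled.

```csharp
using System.Text.RegularExpressions;
using System.Xml;

public static class LenientXml
{
    // Matches an '&' that does NOT start a predefined entity or a numeric
    // character reference (assumption: no custom DTD entities are in use).
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)",
        RegexOptions.Compiled);

    // Replaces each bare '&' with "&amp;" and leaves valid references alone.
    public static string EscapeBareAmpersands(string rawXml)
    {
        return BareAmpersand.Replace(rawXml, "&amp;");
    }

    // Escapes bare ampersands, then parses. Still throws XmlException
    // for any other kind of malformed input.
    public static XmlDocument LoadLeniently(string rawXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(EscapeBareAmpersands(rawXml));
        return doc;
    }
}
```

You would read the file into a string first (for example with File.ReadAllText) and keep the existing catch (XmlException) block around the call, since everything other than a bare '&' still ends in an exception.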