I have a while loop going through an XML file, and one of the nodes, "url", sometimes contains invalid values. I put a try-catch statement around the read to catch any invalid values. The problem is that whenever an invalid value is read, the while loop is killed and the program continues on outside of that loop. I need the while loop to continue reading through the rest of the XML file after an invalid value is found.
Here is my code:
    XmlTextReader reader = new XmlTextReader(fileName);
    int tempInt;
    while (reader.Read())
    {
        switch (reader.Name)
        {
            case "url":
                try
                {
                    reader.Read();
                    if (!reader.Value.Equals("\r\n"))
                    {
                        urlList.Add(reader.Value);
                    }
                }
                catch
                {
                    invalidUrls.Add(urlList.Count);
                }
                break;
        }
    }
I chose not to include the rest of the switch statement as it is not relevant. Here is a sample of my XML:
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <visited_links_list>
        <item>
            <url>http://www.grcc.edu/error.cfm</url>
            <title>Grand Rapids Community College</title>
            <hits>20</hits>
            <modified_date>10/16/2012 12:22:37 PM</modified_date>
            <expiration_date>11/11/2012 12:22:38 PM</expiration_date>
            <user_name>testuser</user_name>
            <subfolder></subfolder>
            <low_folder>No</low_folder>
            <file_position>834816</file_position>
        </item>
    </visited_links_list>
The exception I get throughout the code is similar to the following:
"' ', hexadecimal value 0x05, is an invalid character. Line 3887, position 13."
Observation:
You're calling reader.Read() twice for each entry: once in the while() condition, and once within the case. Do you really mean to skip nodes? Each call to reader.Read() advances the reader to the next node in the XML stream, so the two calls can get out of step with the document. Any exception thrown by the call in the while() condition will also go uncaught, because it happens outside of your try...catch.
Beyond that:
    reader.Read(); // might return false without throwing, so execution keeps going...
    if (!reader.Value.Equals("\r\n")) // uses Value even if the previous Read() returned false, which you ignored
    {
        urlList.Add(reader.Value);
    }
    // reader is now in an unpredictable state
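To make the point about the double Read() concrete, here is a minimal sketch of a loop that calls Read() exactly once per iteration and tracks whether it is inside a "url" element. The class name UrlReaderSketch and the method TryReadUrls are made up for illustration, and the method takes a TextReader rather than a file name so it is easy to try out. As far as I know, XmlTextReader cannot resume after it throws, so this sketch stops at the first malformed spot and reports failure instead of pretending to continue.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class UrlReaderSketch
{
    // Hypothetical helper: collects the text of every <url> element.
    // Returns false if the reader hit malformed XML before the end of input;
    // urlList then holds whatever was read up to that point.
    public static bool TryReadUrls(TextReader input, List<string> urlList)
    {
        using (var reader = new XmlTextReader(input))
        {
            var insideUrl = false;
            try
            {
                // The only Read() call: one node per iteration.
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            insideUrl = reader.Name == "url" && !reader.IsEmptyElement;
                            break;
                        case XmlNodeType.Text:
                            if (insideUrl)
                                urlList.Add(reader.Value);
                            break;
                        case XmlNodeType.EndElement:
                            insideUrl = false;
                            break;
                    }
                }
                return true;
            }
            catch (XmlException)
            {
                // Malformed XML (for example, an invalid character).
                return false;
            }
        }
    }
}
```

Calling UrlReaderSketch.TryReadUrls(new StreamReader(fileName), urlList) and checking the return value tells you whether the whole file was read or the reader gave up partway through.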
Edit
At the risk of writing a novel-length answer...
The error you're receiving
"' ', hexadecimal value 0x05, is an invalid character. Line 3887, position 13."
indicates that your source XML is malformed, and somehow wound up with a ^E (ASCII 0x05) at the specified position. I'd have a look at that line. If you're getting this file from a vendor or a service, you should have them fix their code. Correcting that, and any other malformed content within your XML, should correct the issue that you're seeing.
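If you want to find every offending character in one pass rather than one exception at a time, something like the following sketch may help. It is not part of any library: InvalidXmlCharScanner, IsAllowed and Scan are names I made up, and the check is a simplified reading of the XML 1.0 character rules (it accepts surrogate halves without verifying that they are paired). The positions it reports are counted by this code, so they may not match the reader's error message exactly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class InvalidXmlCharScanner
{
    // True for characters XML 1.0 permits: tab, LF, CR, and U+0020 through U+FFFD.
    // Simplification: surrogate halves are accepted without checking that they pair up.
    public static bool IsAllowed(char c)
    {
        return c == '\t' || c == '\n' || c == '\r' || (c >= '\u0020' && c <= '\uFFFD');
    }

    // Returns one entry per disallowed character, formatted as
    // "line N, position M: 0xHH". Lines and positions are 1-based.
    public static List<string> Scan(TextReader input)
    {
        var findings = new List<string>();
        int line = 1, position = 0, next;
        while ((next = input.Read()) != -1)
        {
            var c = (char)next;
            if (c == '\n')
            {
                line++;
                position = 0;
                continue;
            }
            position++;
            if (!IsAllowed(c))
                findings.Add(string.Format("line {0}, position {1}: 0x{2:X2}", line, position, (int)c));
        }
        return findings;
    }
}
```

Running InvalidXmlCharScanner.Scan over a StreamReader for the file (opened with the file's encoding) gives you a list you can hand to whoever produces the XML.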
Once that is fixed, your original code should work. However, using XmlTextReader for this isn't the most robust of solutions, and involves building some code that Visual Studio will happily generate for you:
In VS2012 (I don't have VS2010 installed any more, but it should be the same process):
1. Add a sample of the XML to your solution.
2. In the properties for that file, set the CustomTool to "MSDataSetGenerator" (without the quotes).
3. The IDE should generate a .designer.cs file, containing a serializable class with a field for each item in the XML. (If not, right-click on the XML file in the solution explorer and select "Run Custom Tool".)
4. Use code like the following to load XML with the same schema as your sample at runtime:
    using System;
    using System.Collections.Generic;
    using System.Xml;
    using System.Xml.Serialization;

    // Make sure the XML doesn't have errors, such as non-printable characters.
    private static bool IsXmlMalformed(string fileName)
    {
        using (var reader = new XmlTextReader(fileName))
        {
            try
            {
                // Walk every node; the reader throws on the first malformed construct.
                while (reader.Read()) { }
                return false;
            }
            catch (XmlException)
            {
                return true;
            }
        }
    }

    // Process the XML using the deserializer and the VS-generated XML proxy classes.
    private static void ParseVisitedLinksListXml(string fileName, List<string> urlList, List<int> invalidUrls)
    {
        if (IsXmlMalformed(fileName))
            throw new Exception("XML is not well-formed.");

        using (var textReader = new XmlTextReader(fileName))
        {
            // visited_links_list is the class generated from the sample XML.
            var serializer = new XmlSerializer(typeof(visited_links_list));
            if (!serializer.CanDeserialize(textReader))
                throw new Exception("Can't deserialize this XML. Make sure the XML schema is up to date.");

            var list = (visited_links_list)serializer.Deserialize(textReader);
            foreach (var item in list.item)
            {
                // Treat empty URLs and URLs containing a line break as invalid.
                if (!string.IsNullOrEmpty(item.url) && !item.url.Contains(Environment.NewLine))
                    urlList.Add(item.url);
                else
                    invalidUrls.Add(urlList.Count);
            }
        }
    }
You can also do this with the XSD.exe tool included with the Windows SDK.