Search code examples
c#.netxmlmalformedxmlexception

using C#'s XmlReader on slightly malformed XML


I'm trying to use C#'s XmlReader on a large series of XML files, they are all properly formatted except for a few select ones (unfortunately I'm not in a position to have them changed, because it would break a lot of other code).

The errors only come from one specific part of the these affronting XML files and it's ok to just skip them but I don't want to stop reading the rest of the XML file.

The bad parts look like this:

 <InterestingStuff>
  ...
    <ErrorsHere OptionA|Something = "false" OptionB|SomethingElse = "false"/>
    <OtherInterestingStuff>
    ...
    </OtherInterestingStuff>
</InterestingStuff>

So really if I could just ignore invalid tags, or ignore the pipe symbol then I would be ok.

Trying to use XmlReader.Skip() when I see the name "ErrorsHere" doesn't work, apparently it already reads a bit ahead and throws the exception.

TLDR: How do I skip so I can read in the XML file above, using the XmlReader?

Edit:

Some people suggested just replacing the '|'-symbol, but the idea of XmlReader is to not load the entire file but only traverse parts you want, since I'm reading directly from files I can not afford the read in entire files, replace all instances of '|' and then read parts again :).


Solution

  • I've experimented a bit with this in the past.

    In general the input simply has to be well-formed. An XmlReader will go into an unrecoverable error-state when the basic XML rules are broken. It is easy to avoid schema-validation but that's not relevant here.

    Your only option is to clean the input, that can be done in a streaming manner (custom Stream or TextReader) but that will require a light form of parsing. If you don't have pipe-symbols in valid positions it's easy.