Tags: c#, .net, linq-to-xml, xmldocument, xml-validation

How to get the XPath (or Node) for the location of an XML schema validation failure?


I am using XDocument.Validate (it seems to function the same as XmlDocument.Validate) to validate an XML document against an XSD - this works well and I am informed of validation errors.

However, only some information seems to be exposed [reliably] in the ValidationEventHandler (and XmlSchemaException), e.g.:

  • the error message (e.g. "The 'X' attribute is invalid - The value 'Y' is invalid according to its datatype 'Z' - The Pattern constraint failed"),
  • the severity

What I would like is to get the "failing XPath" for the validation failure (where it makes sense): that is, I would like to get the failure in relation to the XML document (as opposed to the XML text).

Is there a way to obtain the "failing XPath" information from XDocument.Validate? If not, can the "failing XPath" be obtained through another XML validation method such as an XmlValidatingReader?


Background:

The XML will be sent as data to my Web Service with an automatic conversion (via JSON.NET) from JSON to XML. Because of this I begin processing with XDocument data rather than text, and that data has no guaranteed order because of the original JSON. The REST client is, for reasons I care not to get into, basically a wrapper for HTML form fields over an XML document, and validation on the server occurs in two parts: XML schema validation and Business Rule validation.

In the Business Rule validation it's easy to return the "XPath" for the fields which fail conformance that can be used to indicate the failing field(s) on the client. I would like to extend this to the XSD schema validation which takes care of the basic structure validation and, more importantly, the basic "data type" and "existence" of attributes. However, due to the desired automatic process (i.e. highlight the appropriate failing field) and source conversions, the raw text message and the source line/column numbers are not very useful by themselves.


Here is a snippet of the validation code:

// Start with an XDocument object - created from JSON.NET conversion
XDocument doc = GetDocumentFromWebServiceRequest();

// Load XSD    
var reader = new StringReader(EmbeddedResourceAccess.ReadResource(xsdName));
var xsd = XmlReader.Create(reader, new XmlReaderSettings());
var schemas = new XmlSchemaSet();
schemas.Add("", xsd);

// Validate
doc.Validate(schemas, (sender, args) => {
  // Process validation (not parsing!) error,
  // but how to get the "failing XPath"?
});

Update: I found "Capture Schema Information when validating XDocument", which links to "Accessing XML Schema Information During Document Validation", from which I determined two things:

  1. XmlSchemaException can be specialized into XmlSchemaValidationException which has a SourceObject property - however, this always returns null during validation: "When an XmlSchemaValidationException is thrown during validation by a validating XmlReader object, the value of the SourceObject property is null".

  2. I can read through the document (via XmlReader.Read) and "remember" the path prior to the validation callback. While this "seems like it works" in initial tests (without a ValidationCallback), it feels quite inelegant to me - but I've been able to find little else.


Solution

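  • The sender of the validation event is the source of the event, i.e. the XObject (element or attribute) that failed validation. GetXPath is not part of the framework, so you need to supply your own extension method that builds an XPath for a node (search for e.g. "Generating an XPath expression"), then call it on the sender: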

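    doc.Validate(schemas, (sender, args) => {
      // With XDocument.Validate, the sender is the XObject that failed validation
      if (sender is XObject node)
      {
         // GetXPath is not a framework method; it is an extension method you supply
         string xpath = node.GetXPath();
      }
    });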
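
The GetXPath call in the snippet above is not built into LINQ to XML, so here is a minimal sketch of what such an extension method might look like. The class name XObjectXPathExtensions and the positional output format are assumptions made for illustration, not the method the answer had in mind. It also assumes the elements are in no namespace (as with the question's schemas.Add("", xsd)), so only local names are emitted.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of the framework): builds a positional
// XPath such as /order[1]/item[2]/@id for an element or attribute.
public static class XObjectXPathExtensions
{
    public static string GetXPath(this XObject node)
    {
        switch (node)
        {
            case XAttribute attr:
                // An attribute's path is its owner element's path plus /@name
                return (attr.Parent?.GetXPath() ?? "") + "/@" + attr.Name.LocalName;

            case XElement el:
                // 1-based position among preceding siblings with the same name
                int index = el.ElementsBeforeSelf(el.Name).Count() + 1;
                string parentPath = el.Parent?.GetXPath() ?? "";
                return parentPath + "/" + el.Name.LocalName + "[" + index + "]";

            default:
                // Text, comments, the document itself: fall back to the
                // parent element's path, or the root path if there is none
                return node.Parent?.GetXPath() ?? "/";
        }
    }
}
```

For the document `<order><item id='a'/><item id='b'/></order>`, calling GetXPath on the second item element gives /order[1]/item[2], and on its id attribute gives /order[1]/item[2]/@id. Always emitting the [n] index keeps the path unambiguous when the same field name repeats, which is what a client needs in order to highlight one specific field. If the schema uses a target namespace, the local names would need prefixes or a local-name() predicate instead.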