
Get an XElement's position and length in the original document


I am parsing an XML document for specific nodes and would later like to show the XML document in a UI, highlighting those specific portions. To do this, I need to know each element's position in the document's text, and its length.

So far I have found that, when loading the XDocument, I should specify LoadOptions.SetLineInfo so that I can at least get the line number and column of each element in the original XML string. The reported column is the character at which the element's name starts, so I subtract one to get the actual start of the tag. I have not, however, found a way to get the position where the element ends.
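For highlighting in a UI, the line/column pair usually has to be turned into a single character index. A minimal sketch of that conversion follows; the class and method names (LineInfoOffsets, GetLineStarts, GetTagStart) are hypothetical, and it assumes that lines end in "\n" or "\r\n" and that the document was loaded with LoadOptions.SetLineInfo from the same string:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the question's code.
static class LineInfoOffsets
{
    // Returns the 0-based index at which each line of `text` starts.
    // Assumption: lines end in "\n" or "\r\n"; a lone "\r" is not handled.
    public static List<int> GetLineStarts(string text)
    {
        var starts = new List<int> { 0 };
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\n') starts.Add(i + 1);
        }
        return starts;
    }

    // Converts an element's 1-based line info into the 0-based index of its '<'.
    public static int GetTagStart(XElement el, List<int> lineStarts)
    {
        var li = (IXmlLineInfo)el;
        if (!li.HasLineInfo())
            throw new InvalidOperationException("Load the document with LoadOptions.SetLineInfo.");

        // LinePosition points at the first character of the element name:
        // subtract 1 for the 1-based column and 1 more to step back onto '<'.
        return lineStarts[li.LineNumber - 1] + li.LinePosition - 2;
    }
}
```

With this, xml[LineInfoOffsets.GetTagStart(el, LineInfoOffsets.GetLineStarts(xml))] should be the opening '<' of el. It only covers the start of the element; the end position remains the open problem.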

Here is what I've tried so far (LINQPad code using .Dump(); substitute Console.WriteLine if necessary). The basic test code:

var xml = @"<xml>
  <myElement>
    <someProperty attribu=""attrVal1"" />
    <someOtherProp />
  </myElement>
</xml>";
// xml.Length => 105 (Note, there should be a TAB instead of four spaces before `<someOtherProp />`,
//                    to demonstrate problems)

var doc = XDocument.Parse(xml, LoadOptions.SetLineInfo);

var li = (IXmlLineInfo)doc;
$"{li.LineNumber - 1}:{li.LinePosition - 1}~{GetLen(doc.Root)}".Dump();

foreach (var el in doc.XPathSelectElements("//myElement/*"))
{
    li = (IXmlLineInfo) el;
    $"{li.LineNumber - 1}:{li.LinePosition - 1}~{GetLen(el)}".Dump();
}

Now, my implementations of GetLen:

First attempt: using .ToString()

int GetLen(XElement el)
{
    return el.ToString().Length;
}

This reformats the XML, so the TAB mentioned in the comment above is expanded to four spaces. The doc is now 108 chars instead of 105, so this is not an option.

Second attempt: using an XmlReader

int GetLen(XElement el)
{
    using (var r = el.CreateReader())
    {
        r.MoveToContent();
        var ox = r.ReadOuterXml();
        return ox.Length;
    }
}

This throws out any unnecessary whitespace, leading to much shorter lengths (86 for doc). So this is also not an option.

I have not been able to find any other meaningful way to accomplish what I need, other than parsing the XML manually, which I would like to avoid. Does anyone have an idea what else I could try?

I could, of course, read in the XML, reformat it, and then use one of the options above. But since the XML is delivered by an external party and we want to tell them where we found mistakes, it would be best to know the indexes in their original text, not the indexes after reformatting.

Thanks for your help!


Solution

  • It seems this is currently not possible. We have instead opted to generate an XPath expression pointing to the exact element. This way, we can leave the formatting to whatever the UI wishes to do, but always identify the correct element.
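
The answer does not show how the XPath expression is generated. One possible sketch, using a hypothetical helper named XPathBuilder.GetAbsoluteXPath and assuming the elements have no XML namespace (namespaced documents would need prefixes and an XmlNamespaceManager, which this does not handle):

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not taken from the original answer.
static class XPathBuilder
{
    // Builds an absolute path such as /xml[1]/myElement[1]/someOtherProp[1].
    // Assumption: element names are not namespace-qualified.
    public static string GetAbsoluteXPath(XElement el)
    {
        var steps = el.AncestorsAndSelf()
            .Reverse() // AncestorsAndSelf starts at el; XPath starts at the root
            .Select(e =>
            {
                // XPath positions are 1-based and count only same-named siblings.
                int index = e.ElementsBeforeSelf(e.Name).Count() + 1;
                return $"/{e.Name.LocalName}[{index}]";
            });
        return string.Concat(steps);
    }
}
```

Passing the resulting string to doc.XPathSelectElement (from System.Xml.XPath) should return the same element, regardless of how the UI later formats the document.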