
Read a 'fake' XML document (XML fragment) into an XmlTextReader object


[Case] I have received a bunch of 'xml files' containing metadata about a large number of documents. At least, that was what I requested. What I actually received were 'xml files' without a root element, structured something like this (I left out a bunch of elements):

<folder name = "abc"></folder>
<folder name = "abc/def">
<document name = "ghi1">
</document>
<document name = "ghi2">
</document>
</folder>

[Problem] When I try to read the file with an XmlTextReader object, it fails, telling me that there is no root element.

[Current workaround] Of course I can read the file as a stream, wrap the content in <xmlroot> and </xmlroot>, write the stream to a new file, and read that one with XmlTextReader. That is exactly what I am doing now, but I would prefer not to 'tamper' with the original data.

[Requested solution] I understand that I should use XmlTextReader for this, with the DocumentFragment option. However, this throws the following exception at run time:

An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll

Additional information: XmlNodeType DocumentFragment is not supported for partial content parsing. Line 1, position 1.

[Faulty code]

using System.Diagnostics;
using System.Xml;

namespace XmlExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string file = @"C:\test.txt";
            XmlTextReader tr = new XmlTextReader(file, XmlNodeType.DocumentFragment, null);
            while(tr.Read())
                Debug.WriteLine("NodeType: {0} NodeName: {1}", tr.NodeType, tr.Name);
        }
    }
}

Solution

  • Even though the XmlReader can be made to read the data using the ConformanceLevel.Fragment option as demonstrated by Martijn, it seems that XmlDataDocument does not like the idea of having multiple root elements.
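
    For reference, the ConformanceLevel.Fragment route can be sketched roughly as follows. This is a minimal illustration and not necessarily the code from the other answer; the names FragmentReaderSketch and ReadElements are made up for this example, and it assumes every element carries a name attribute, as in the sample from the question.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;

    public static class FragmentReaderSketch
    {
        // Returns "elementName:nameAttribute" for every element start tag
        // found in a rootless XML fragment.
        public static List<string> ReadElements(TextReader input)
        {
            var settings = new XmlReaderSettings
            {
                // Accept several top-level elements instead of exactly one root.
                ConformanceLevel = ConformanceLevel.Fragment,
                IgnoreWhitespace = true
            };

            var result = new List<string>();
            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                while (reader.Read())
                {
                    // Skip end tags, whitespace and other node types.
                    if (reader.NodeType == XmlNodeType.Element)
                        result.Add(reader.Name + ":" + reader.GetAttribute("name"));
                }
            }
            return result;
        }
    }
    ```

    Passing a StreamReader over the original file to ReadElements walks every folder and document element without a wrapper element and without touching the file on disk.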

    I thought I'd try a different approach, much like the one you're currently using, but without the intermediate file. Most XML document classes (XmlDocument, XDocument, XmlDataDocument) can load from a TextReader, so I've implemented one of my own. It's used like so:

    var dataDocument = new XmlDataDocument();
    dataDocument.Load(new FakeRootStreamReader(File.OpenRead("test.xml")));
    

    The code of the actual class:

    using System;
    using System.IO;

    public class FakeRootStreamReader : TextReader
    {
        private static readonly char[] _rootStart;
        private static readonly char[] _rootEnd;
    
        private readonly TextReader _innerReader;
        private int _charsRead;
        private bool _eof;
    
        static FakeRootStreamReader()
        {
            _rootStart = "<root>".ToCharArray();
            _rootEnd = "</root>".ToCharArray();
        }
    
        public FakeRootStreamReader(Stream stream)
        {
            _innerReader = new StreamReader(stream);
        }
    
        public FakeRootStreamReader(TextReader innerReader)
        {
            _innerReader = innerReader;
        }
    
        public override int Read(char[] buffer, int index, int count)
        {
            if (!_eof && _charsRead < _rootStart.Length)
            {
                // Prepend root element
                return ReadFake(_rootStart, buffer, index, count);
            }
    
            if (!_eof)
            {
                // Normal reading operation
                int charsRead = _innerReader.Read(buffer, index, count);
                if (charsRead > 0) return charsRead;
    
                // We've reached the end of the Stream
                _eof = true;
                _charsRead = 0;
            }
    
            // Append root element end tag at the end of the Stream
            return ReadFake(_rootEnd, buffer, index, count);
        }
    
        private int ReadFake(char[] source, char[] buffer, int offset, int count)
        {
            int length = Math.Min(source.Length - _charsRead, count);
            Array.Copy(source, _charsRead, buffer, offset, length);
            _charsRead += length;
            return length;
        }
    }
    

    The first call to Read(...) returns only the <root> start tag. Subsequent calls read the stream as normal until its end is reached, at which point the </root> end tag is emitted.

    The code is a bit... meh... mostly because I wanted to handle some never-gonna-happen cases where someone reads the stream fewer than 6 characters at a time.
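
    To make that small-buffer case concrete, here is a sketch that drains such a reader through a deliberately tiny buffer. So that it compiles on its own, it restates the wrapper in condensed form under the hypothetical name RootWrappingReader; SmallBufferDemo and Drain are likewise names invented for this illustration.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    // Condensed restatement of the wrapper idea (hypothetical name).
    public class RootWrappingReader : TextReader
    {
        private static readonly char[] Start = "<root>".ToCharArray();
        private static readonly char[] End = "</root>".ToCharArray();
        private readonly TextReader _inner;
        private int _tagCharsSent;   // progress through the tag currently being emitted
        private bool _innerDone;

        public RootWrappingReader(TextReader inner) { _inner = inner; }

        public override int Read(char[] buffer, int index, int count)
        {
            // Emit the start tag first, possibly over several calls.
            if (!_innerDone && _tagCharsSent < Start.Length)
                return CopyTag(Start, buffer, index, count);

            if (!_innerDone)
            {
                int n = _inner.Read(buffer, index, count);
                if (n > 0) return n;
                _innerDone = true;
                _tagCharsSent = 0;   // restart the counter for the end tag
            }

            // Emit the end tag; returns 0 once it has been sent completely.
            return CopyTag(End, buffer, index, count);
        }

        private int CopyTag(char[] tag, char[] buffer, int index, int count)
        {
            int length = Math.Min(tag.Length - _tagCharsSent, count);
            Array.Copy(tag, _tagCharsSent, buffer, index, length);
            _tagCharsSent += length;
            return length;
        }
    }

    public static class SmallBufferDemo
    {
        // Reads everything the reader produces, bufferSize characters at a time.
        public static string Drain(TextReader reader, int bufferSize)
        {
            var buffer = new char[bufferSize];
            var output = new StringBuilder();
            int n;
            while ((n = reader.Read(buffer, 0, buffer.Length)) > 0)
                output.Append(buffer, 0, n);
            return output.ToString();
        }
    }
    ```

    With a 4-character buffer, the start tag arrives as "<roo" followed by "t>", and the end tag is split the same way, so the caller still sees one continuous, well-formed document.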