C# XmlSerializer - output multiple xml fragments controlling new lines


I want to be able to write fragments of indented XML with no namespaces, no XML preambles, and \n for line endings, using XmlSerializer for each fragment and using a single XmlWriter instance for all the fragments. How can I do that?

XmlSerializer.Serialize() produces indented output when serializing to a generic output Stream, but it uses "\r\n" for line endings and I can't find a way to configure that.

I can serialize to an XmlWriter, which can be configured in detail, but the configuration only seems to work when you output a single complete document rather than multiple document fragments, because XmlSerializer will throw an exception with ConformanceLevel.Fragment:

WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment

And if, as a workaround, I call XmlWriter.WriteWhitespace("");, indentation gets disabled. Specifically, if I create an XmlWriter like this:

XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
{
    ConformanceLevel = ConformanceLevel.Fragment,
    NamespaceHandling = NamespaceHandling.Default,
    NewLineChars = "\n",
    Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
    Indent = true,
    NewLineHandling = NewLineHandling.Replace,
    OmitXmlDeclaration = true,
    WriteEndDocumentOnClose = false,
    CloseOutput = false,
    CheckCharacters = false,
    NewLineOnAttributes = false,
});

then it does allow me to serialize multiple objects without creating a new XmlWriter for each one, but there's no indentation (MVE: https://dotnetfiddle.net/qGLIlL):

<flyingMonkey name="Koko"><limbs><limb name="leg" /><limb name="arm" /><limb name="tail" /><limb name="wing" /></limbs></flyingMonkey>

<flyingMonkey name="New Name"><limbs><limb name="leg" /><limb name="arm" /><limb name="tail" /><limb name="wing" /></limbs></flyingMonkey>

using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Program {
    public static void Main()
    {
        XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            NamespaceHandling = NamespaceHandling.Default,
            NewLineChars = "\n",
            Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
            Indent = true,
            NewLineHandling = NewLineHandling.Replace,
            OmitXmlDeclaration = true,
            WriteEndDocumentOnClose = false,
            CloseOutput = false,
            CheckCharacters = false,
            NewLineOnAttributes = false,
        });
        var noNamespace = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });

        // without this line I get an error:
        //   "WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment."
        xw.WriteWhitespace("");

        FlyingMonkey monkey = FlyingMonkey.Create();
        XmlSerializer ser = new XmlSerializer(typeof(FlyingMonkey), defaultNamespace: null);
        ser.Serialize(xw, monkey, noNamespace);
        xw.WriteWhitespace("\n\n");
        monkey.name = "New Name";
        ser.Serialize(xw, monkey, noNamespace);
    }
}

[System.Xml.Serialization.XmlTypeAttribute(TypeName = "flyingMonkey", Namespace=null)]
public class FlyingMonkey
{
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string name;

    public Limb[] limbs;

    public static FlyingMonkey Create() =>
        new FlyingMonkey()
        {
            name = "Koko",
            limbs = new Limb[]
            {
                new Limb() { name = "leg" }, new Limb() { name = "arm" },
                new Limb() { name = "tail" }, new Limb() { name = "wing" },
            }
        };
}

[System.Xml.Serialization.XmlTypeAttribute(TypeName = "limb", Namespace=null)]
public class Limb
{
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string name;
}

What kinda works:

XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
{
    ConformanceLevel = ConformanceLevel.Auto,
    NamespaceHandling = NamespaceHandling.Default,
    NewLineChars = "\n",
    Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
    Indent = true,
    NewLineHandling = NewLineHandling.Replace,
    OmitXmlDeclaration = true,
    WriteEndDocumentOnClose = false,
    CloseOutput = false,
    CheckCharacters = false,
    NewLineOnAttributes = false,
});
var noNamespace = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });

// This is not needed anymore. If I invoke that, it will kill indentation for some reason.
// xw.WriteWhitespace("");

FlyingMonkey monkey = FlyingMonkey.Create();
XmlSerializer ser = new XmlSerializer(typeof(FlyingMonkey), defaultNamespace: null);
ser.Serialize(xw, monkey, noNamespace);
// xw.WriteWhitespace("\n\n");
// monkey.name = "New Name";
// ser.Serialize(xw, monkey, noNamespace); // this second serialization throws InvalidOperationException

It does print with the right line endings, but it won't let you write another object to the same XmlWriter instance:

<flyingMonkey name="Koko">
  <limbs>
    <limb name="leg" />
    <limb name="arm" />
    <limb name="tail" />
    <limb name="wing" />
  </limbs>
</flyingMonkey>

I want to reuse a single XmlWriter instance because I need to write up to 100k elements, and creating an XmlWriter for each one adds a lot of overhead.


Solution

  • Your basic problem is that XmlSerializer is designed to serialize to a single well-formed XML document -- not to a sequence of fragments. If you attempt to serialize to an XmlWriter created with ConformanceLevel.Fragment, you will get an exception:

    WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment

    In this answer to .net XmlSerialize throws "WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment", Wim Reymen identifies two workarounds:

    • Calling XmlWriter.WriteWhitespace("") before the first call to serialization.

      Unfortunately, as you have noticed, this disables indentation. The reason this happens is that XmlWriter disables indentation for mixed content and the call to write whitespace triggers the mixed content detection of XmlEncodedRawTextWriterIndent (demo here).

    • Calling XmlWriter.WriteComment("") before the first serialization.

      While this does not disable indentation, it does of course write a comment that you don't want.
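
The difference between the two workarounds can be reproduced with a bare XmlWriter, without XmlSerializer. The following is a minimal sketch, not the linked demo: the Render helper and the element names a and b are hypothetical, and the settings are a reduced subset of the ones in the question. It prints the text produced after each kind of prologue so the two can be compared.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: runs `prologue` on an indenting, fragment-level XmlWriter,
// then writes <a><b /></a> and returns everything that was written.
static string Render(Action<XmlWriter> prologue)
{
    var settings = new XmlWriterSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment,
        Indent = true,
        NewLineChars = "\n",
        OmitXmlDeclaration = true,
    };
    var sw = new StringWriter();
    using (var xw = XmlWriter.Create(sw, settings))
    {
        prologue(xw);
        xw.WriteStartElement("a");
        xw.WriteStartElement("b");
        xw.WriteEndElement();
        xw.WriteEndElement();
    }
    return sw.ToString();
}

// Workaround 1: empty whitespace before the first element.
string afterWhitespace = Render(xw => xw.WriteWhitespace(""));
// Workaround 2: empty comment before the first element.
string afterComment = Render(xw => xw.WriteComment(""));

Console.WriteLine("After WriteWhitespace(\"\"):");
Console.WriteLine(afterWhitespace);
Console.WriteLine("After WriteComment(\"\"):");
Console.WriteLine(afterComment);
```

If the behaviour described above holds on your runtime, the first result stays on one line while the second is indented and starts with the empty comment.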

    So what are your options for a workaround?

    Firstly, as you noticed, you could create a separate XmlWriter for each item with CloseOutput = false. In comments you wrote that "doing so adds a lot of overhead, I need to be writing up to 100k elements, so was hoping to reuse the writer instance", but I recommend you profile to make sure, because this workaround is very, very simple compared to the alternatives.

    Assuming you are writing to a Stream, you could create an extension method like this:

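    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    using System.Xml;
    using System.Xml.Serialization;

    public static partial class XmlExtensions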
    {
        static Encoding Utf8EncodingNoBom { get; } = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        
        public static void SerializeFragmentsToXml<T>(this IEnumerable<T> enumerable, Stream stream, XmlSerializer? serializer = null, XmlSerializerNamespaces? ns = null)
        {
            var newLine = "\n";
            var newLineBytes = Utf8EncodingNoBom.GetBytes(newLine+newLine);
            
            var settings = new XmlWriterSettings()
            {
                NamespaceHandling = NamespaceHandling.Default,
                NewLineChars = newLine,
                Encoding = Utf8EncodingNoBom, // suppress BOM
                Indent = true,
                NewLineHandling = NewLineHandling.Replace,
                OmitXmlDeclaration = true,
                WriteEndDocumentOnClose = false,
                CloseOutput = false, // Required to prevent the stream from being closed between items
                CheckCharacters = false,
                NewLineOnAttributes = false,
            };
            
            serializer ??= new XmlSerializer(typeof(T));
            ns ??= new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
    
            bool first = true;
            foreach (var item in enumerable)
            {
                if (!first)
                    stream.Write(newLineBytes);
                using (var xmlWriter = XmlWriter.Create(stream, settings))
                    serializer.Serialize(xmlWriter, item, ns);
                first = false;
            }
        }
    }
    

    And use it e.g. as follows:

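    // Requires using System.Linq. Assumes a FlyingMonkey.Create(string name) overload that sets the name;
    // the Create() shown in the question takes no arguments.
    var items = new [] { "Koko", "POCO", "Loco" }.Select(n => FlyingMonkey.Create(n));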
    
    using var stream = new MemoryStream(); // Replace with some FileStream when serializing to disk
    
    var serializer = new XmlSerializer(typeof(FlyingMonkey), defaultNamespace: null);
    items.SerializeFragmentsToXml(stream, serializer : serializer);
    

    Demo fiddle #1 here.
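
To check whether per-item writer creation really is a bottleneck before reaching for anything more complicated, a rough timing comparison along these lines may help. This is only a sketch under stated assumptions: WriteItem is a hypothetical stand-in for serializer.Serialize(...), so it measures writer overhead rather than XmlSerializer itself, and a single Stopwatch run is not a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;

const int Count = 100_000; // item count mentioned in the question

// Settings modelled on the ones above; only the conformance level varies.
static XmlWriterSettings MakeSettings(ConformanceLevel level) => new XmlWriterSettings
{
    ConformanceLevel = level,
    Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false),
    Indent = true,
    NewLineChars = "\n",
    OmitXmlDeclaration = true,
    CloseOutput = false,
};

// Hypothetical stand-in for serializer.Serialize(xmlWriter, item, ns).
static void WriteItem(XmlWriter xw, int i)
{
    xw.WriteStartElement("item");
    xw.WriteAttributeString("id", XmlConvert.ToString(i));
    xw.WriteEndElement();
}

// Variant A: one XmlWriter per item, all sharing a single stream.
long perItemBytes;
var perItemTimer = Stopwatch.StartNew();
using (var stream = new MemoryStream())
{
    var settings = MakeSettings(ConformanceLevel.Auto);
    for (int i = 0; i < Count; i++)
    {
        using (var xw = XmlWriter.Create(stream, settings))
            WriteItem(xw, i);
    }
    perItemBytes = stream.Length;
}
perItemTimer.Stop();

// Variant B: a single fragment-level XmlWriter reused for every item.
long sharedBytes;
var sharedTimer = Stopwatch.StartNew();
using (var stream = new MemoryStream())
{
    using (var xw = XmlWriter.Create(stream, MakeSettings(ConformanceLevel.Fragment)))
    {
        for (int i = 0; i < Count; i++)
            WriteItem(xw, i);
    }
    sharedBytes = stream.Length;
}
sharedTimer.Stop();

Console.WriteLine($"writer per item: {perItemTimer.ElapsedMilliseconds} ms, {perItemBytes} bytes");
Console.WriteLine($"shared writer:   {sharedTimer.ElapsedMilliseconds} ms, {sharedBytes} bytes");
```

If the gap is small next to the cost of serializing your real objects, the per-item approach is probably good enough.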

    Alternatively, if you really need to reuse the XmlWriter for performance reasons, you will need to call XmlWriter.WriteComment() to prevent the exception from XmlSerializer and edit out the unwanted comments afterwards, e.g. via some TextWriter decorator that removes them as they are being written on the fly.

    The following extension method seems to do this:

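    using System;
    using System.Collections.Generic;
    using System.Diagnostics.CodeAnalysis;
    using System.IO;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Xml;
    using System.Xml.Serialization;

    public static partial class XmlExtensions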
    {
        const string FirstCommentText = "first";
        const string FirstComment = $"<!--{FirstCommentText}-->";
        const string SubsequentCommentText = "subsequent";
        const string SubsequentComment = $"<!--{SubsequentCommentText}-->";
        
        static Encoding Utf8EncodingNoBom { get; } = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        
        public static void SerializeFragmentsToXml<T>(this IEnumerable<T> enumerable, Stream stream, XmlSerializer? serializer = null, XmlSerializerNamespaces? ns = null)
        {
            string newLine = "\n";
            
            var settings = new XmlWriterSettings()
            {
                ConformanceLevel = ConformanceLevel.Fragment,
                NamespaceHandling = NamespaceHandling.Default,
                NewLineChars = newLine,
                Encoding = Utf8EncodingNoBom, // suppress BOM
                Indent = true,
                NewLineHandling = NewLineHandling.Replace,
                OmitXmlDeclaration = true,
                WriteEndDocumentOnClose = false,
                CloseOutput = false, 
                CheckCharacters = false,
                NewLineOnAttributes = false,
            };
            
            serializer ??= new XmlSerializer(typeof(T));
            ns ??= new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
    
            using var innerTextWriter = new StreamWriter(stream, encoding : Utf8EncodingNoBom, leaveOpen  : true) { NewLine = newLine };
            using var textWriter = new FakeCommentRemovingTextWriter(innerTextWriter, new(FirstComment, ""), new(SubsequentComment, newLine)) { NewLine = newLine };
            using var xmlWriter = XmlWriter.Create(textWriter, settings);
    
            bool first = true;
            foreach (var item in enumerable)
            {
                xmlWriter.WriteComment(first ? FirstCommentText : SubsequentCommentText);
                serializer.Serialize(xmlWriter, item, ns);
                // XmlWriter buffers its output, so Flush() is required to ensure that the fake comments are not split across calls to Write().
                xmlWriter.Flush(); 
                first = false;
            }
        }
        
        private class FakeCommentRemovingTextWriter : TextWriterDecorator
        {
            readonly KeyValuePair<string, string> [] replacements;
            
            public FakeCommentRemovingTextWriter(TextWriter baseWriter, params KeyValuePair<string, string> [] replacements) : base(baseWriter, true) => this.replacements = replacements;
            
            public override void Write(ReadOnlySpan<char> buffer)
            {
                foreach (var replacement in replacements)
                {
                    int index;
                    if ((index = StartsWithIgnoringWhitespace(buffer, replacement.Key)) >= 0)
                    {
                        if (index > 0)
                            base.Write(buffer.Slice(0, index));
                        buffer = buffer.Slice(index).Slice(replacement.Key.Length);
                        if (buffer.StartsWith(NewLine))
                            buffer = buffer.Slice(NewLine.Length);
                        if (!string.IsNullOrEmpty(replacement.Value))
                            base.Write(replacement.Value);
                    }
                }
                base.Write(buffer);
            }
            
            static int StartsWithIgnoringWhitespace(ReadOnlySpan<char> buffer, ReadOnlySpan<char> value)
            {
                for (int index = 0; index < buffer.Length; index++)
                {
                    if (buffer.Slice(index).StartsWith(value))
                        return index;
                    if (!XmlConvert.IsWhitespaceChar(buffer[index]) || index >= buffer.Length - value.Length)
                        break;
                }
                return -1;
            }
        }
    }
    
    public class TextWriterDecorator : TextWriter
    {
        // Override the same methods that are overridden in https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/StringWriter.cs.
        TextWriter? baseWriter; // null when disposed
        readonly bool disposeBase;
        readonly Encoding baseEncoding;
    
        public TextWriterDecorator(TextWriter baseWriter, bool disposeBase = true) => 
            (this.baseWriter, this.disposeBase, this.baseEncoding) = (baseWriter ?? throw new ArgumentNullException(nameof(baseWriter)), disposeBase, baseWriter.Encoding);
    
        protected TextWriter BaseWriter => baseWriter == null ? throw new ObjectDisposedException(GetType().Name) : baseWriter;
        public override Encoding Encoding => baseEncoding;
        public override IFormatProvider FormatProvider => baseWriter?.FormatProvider ?? base.FormatProvider;
        [AllowNull] public override string NewLine 
        { 
            get => baseWriter?.NewLine ?? base.NewLine; 
            set
            {   
                if (baseWriter != null)
                    baseWriter.NewLine = value;
                base.NewLine = value;
            }
        }
    
        public override void Flush() => BaseWriter.Flush();
        public sealed override void Close() => Dispose(true);
        public override void Write(char value) => BaseWriter.Write(value);
        public sealed override void Write(char[] buffer, int index, int count) => this.Write(buffer.AsSpan(index, count));
        public override void Write(ReadOnlySpan<char> buffer) => BaseWriter.Write(buffer);
        public sealed override void Write(string? value) => Write(value.AsSpan());
    
        public override Task WriteAsync(char value) => BaseWriter.WriteAsync(value);
        public sealed override Task WriteAsync(string? value) => WriteAsync(value.AsMemory());
        public sealed override Task WriteAsync(char[] buffer, int index, int count) => WriteAsync(buffer.AsMemory(index, count));
        public override Task WriteAsync(ReadOnlyMemory<char> buffer, CancellationToken cancellationToken = default) => BaseWriter.WriteAsync(buffer, cancellationToken);
        //public virtual Task WriteAsync(StringBuilder? value, CancellationToken cancellationToken = default) - no need to override
    
        public override Task WriteLineAsync(char value) => BaseWriter.WriteLineAsync(value);
        public sealed override Task WriteLineAsync(string? value) => WriteLineAsync(value.AsMemory());
        public override Task WriteLineAsync(StringBuilder? value, CancellationToken cancellationToken = default) => BaseWriter.WriteLineAsync(value, cancellationToken);
        public sealed override Task WriteLineAsync(char[] buffer, int index, int count) => WriteLineAsync(buffer.AsMemory(index, count));
        public override Task WriteLineAsync(ReadOnlyMemory<char> buffer, CancellationToken cancellationToken = default) => BaseWriter.WriteLineAsync(buffer, cancellationToken);
        
        public override Task FlushAsync() => BaseWriter.FlushAsync();
        public override Task FlushAsync(CancellationToken cancellationToken) => BaseWriter.FlushAsync(cancellationToken);
    
        protected override void Dispose(bool disposing)
        {
            try
            {
                if (disposing)
                {
                    if (Interlocked.Exchange(ref this.baseWriter, null) is {} writer)
                        if (disposeBase)
                            writer.Dispose();
                        else
                            writer.Flush();
                }
            }
            finally
            {
                base.Dispose(disposing);
            }
        }
    
        public override async ValueTask DisposeAsync()
        {
            try
            {
                if (Interlocked.Exchange(ref this.baseWriter, null) is {} writer)
                    if (disposeBase)
                        await writer.DisposeAsync().ConfigureAwait(false);
                    else
                        await writer.FlushAsync().ConfigureAwait(false);
            }
            finally
            {
                await base.DisposeAsync().ConfigureAwait(false);
            }
        }
        
        public override string ToString() => string.Format("{0}: {1}", GetType().Name, baseWriter?.ToString() ?? "disposed");
    }
    

    But honestly I doubt it's worth the trouble. Demo fiddle #2 here.

    With either approach, the output looks like:

    <flyingMonkey name="Koko">
      <limbs>
        <limb name="leg" />
        <limb name="arm" />
        <limb name="tail" />
        <limb name="wing" />
      </limbs>
    </flyingMonkey>
    
    <flyingMonkey name="POCO">
      <limbs>
        <limb name="leg" />
        <limb name="arm" />
        <limb name="tail" />
        <limb name="wing" />
      </limbs>
    </flyingMonkey>
    
    <flyingMonkey name="Loco">
      <limbs>
        <limb name="leg" />
        <limb name="arm" />
        <limb name="tail" />
        <limb name="wing" />
      </limbs>
    </flyingMonkey>