I want to write fragments of indented XML with no namespaces, no XML preambles, and \n for line endings, using XmlSerializer for each fragment and a single XmlWriter instance for all the fragments. How can I do that?
XmlSerializer.Serialize() produces indented output when serializing to a generic output Stream, but it uses "\r\n" for line endings and I can't find how to configure that.
I can serialize to an XmlWriter, which can be configured in detail, but the configuration only seems to work when you output a single complete document rather than multiple document fragments, because XmlSerializer will throw an exception with ConformanceLevel.Fragment:
WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment
And if, as a workaround, I call XmlWriter.WriteWhitespace("");, indentation gets disabled. Specifically, if I create an XmlWriter like this:
XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
{
    ConformanceLevel = ConformanceLevel.Fragment,
    NamespaceHandling = NamespaceHandling.Default,
    NewLineChars = "\n",
    Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
    Indent = true,
    NewLineHandling = NewLineHandling.Replace,
    OmitXmlDeclaration = true,
    WriteEndDocumentOnClose = false,
    CloseOutput = false,
    CheckCharacters = false,
    NewLineOnAttributes = false,
});
https://dotnetfiddle.net/qGLIlL
It does allow me to serialize multiple objects without creating a new XmlWriter for each one. But there's no indentation.
<flyingMonkey name="Koko"><limbs><limb name="leg" /><limb name="arm" /><limb name="tail" /><limb name="wing" /></limbs></flyingMonkey>
<flyingMonkey name="New Name"><limbs><limb name="leg" /><limb name="arm" /><limb name="tail" /><limb name="wing" /></limbs></flyingMonkey>
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Program
{
    public static void Main()
    {
        XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            NamespaceHandling = NamespaceHandling.Default,
            NewLineChars = "\n",
            Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
            Indent = true,
            NewLineHandling = NewLineHandling.Replace,
            OmitXmlDeclaration = true,
            WriteEndDocumentOnClose = false,
            CloseOutput = false,
            CheckCharacters = false,
            NewLineOnAttributes = false,
        });
        var noNamespace = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
        // without this line I get an error:
        // "WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment."
        xw.WriteWhitespace("");
        FlyingMonkey monkey = FlyingMonkey.Create();
        XmlSerializer ser = new XmlSerializer(typeof(FlyingMonkey), defaultNamespace: null);
        ser.Serialize(xw, monkey, noNamespace);
        xw.WriteWhitespace("\n\n");
        monkey.name = "New Name";
        ser.Serialize(xw, monkey, noNamespace);
    }
}
[System.Xml.Serialization.XmlTypeAttribute(TypeName = "flyingMonkey", Namespace=null)]
public class FlyingMonkey
{
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string name;
    public Limb[] limbs;
    public static FlyingMonkey Create() =>
        new FlyingMonkey()
        {
            name = "Koko",
            limbs = new Limb[]
            {
                new Limb() { name = "leg" }, new Limb() { name = "arm" },
                new Limb() { name = "tail" }, new Limb() { name = "wing" },
            }
        };
}
[System.Xml.Serialization.XmlTypeAttribute(TypeName = "limb", Namespace=null)]
public class Limb
{
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string name;
}
If I instead set ConformanceLevel to ConformanceLevel.Auto and drop the WriteWhitespace("") call:
XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
{
    ConformanceLevel = ConformanceLevel.Auto,
    NamespaceHandling = NamespaceHandling.Default,
    NewLineChars = "\n",
    Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
    Indent = true,
    NewLineHandling = NewLineHandling.Replace,
    OmitXmlDeclaration = true,
    WriteEndDocumentOnClose = false,
    CloseOutput = false,
    CheckCharacters = false,
    NewLineOnAttributes = false,
});
var noNamespace = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
// This is not needed anymore. If I invoke that, it will kill indentation for some reason.
// xw.WriteWhitespace("");
FlyingMonkey monkey = FlyingMonkey.Create();
XmlSerializer ser = new XmlSerializer(typeof(FlyingMonkey), defaultNamespace: null);
ser.Serialize(xw, monkey, noNamespace);
// xw.WriteWhitespace("\n\n");
// monkey.name = "New Name";
// ser.Serialize(xw, monkey, noNamespace); // this second serialization throws InvalidOperationException
It does print with the right line endings and indentation, but it won't let me write another object to the same XmlWriter instance.
<flyingMonkey name="Koko">
  <limbs>
    <limb name="leg" />
    <limb name="arm" />
    <limb name="tail" />
    <limb name="wing" />
  </limbs>
</flyingMonkey>
I want to reuse my single instance of XmlWriter because I need to write up to 100k elements, and creating an XmlWriter for each one adds a lot of overhead.
Your basic problem is that XmlSerializer is designed to serialize to a single well-formed XML document, not to a sequence of fragments. If you attempt to serialize to an XmlWriter created with ConformanceLevel.Fragment, you will get the exception:
WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment
In this answer to .net XmlSerialize throws "WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment", Wim Reymen identifies two workarounds:
Calling XmlWriter.WriteWhitespace("") before the first call to serialization.
Unfortunately, as you have noticed, this disables indentation. The reason this happens is that XmlWriter disables indentation for mixed content, and the call to write whitespace triggers the mixed content detection of XmlEncodedRawTextWriterIndent (demo here).
Calling XmlWriter.WriteComment("") before the first serialization.
While this does not disable indentation, it does of course write a comment that you don't want.
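The difference between the two workarounds can be observed without XmlSerializer at all. The following is a minimal sketch, assuming a StringWriter as the target; the class MixedContentDemo and the helper WriteFragment are hypothetical names introduced only for this illustration. It writes the same small element tree after either priming call so the two outputs can be compared side by side:

```csharp
using System;
using System.IO;
using System.Xml;

public static class MixedContentDemo
{
    // Hypothetical helper: runs the given "priming" call on a fragment-level writer,
    // then writes <root><child /></root> and returns the text that was produced.
    static string WriteFragment(Action<XmlWriter> prime)
    {
        var settings = new XmlWriterSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            Indent = true,
            NewLineChars = "\n",
            OmitXmlDeclaration = true,
        };
        using var textWriter = new StringWriter();
        using (var xmlWriter = XmlWriter.Create(textWriter, settings))
        {
            prime(xmlWriter);
            xmlWriter.WriteStartElement("root");
            xmlWriter.WriteStartElement("child");
            xmlWriter.WriteEndElement();
            xmlWriter.WriteEndElement();
        } // Disposing the XmlWriter flushes its buffer into the StringWriter.
        return textWriter.ToString();
    }

    public static void Main()
    {
        Console.WriteLine("After WriteWhitespace(\"\"):");
        Console.WriteLine(WriteFragment(w => w.WriteWhitespace("")));
        Console.WriteLine("After WriteComment(\"\"):");
        Console.WriteLine(WriteFragment(w => w.WriteComment("")));
    }
}
```

If the mixed-content explanation above is right, the first result should come out on a single line, while the second should be indented but start with the empty comment.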
So what are your options for a workaround?
Firstly, as you noticed, you could create a separate XmlWriter for each item with CloseOutput = false. In comments you wrote that doing so "adds a lot of overhead, I need to be writing up to 100k elements, so was hoping to reuse the writer instance", but I recommend you profile to make sure, because this workaround is very, very simple compared to the alternatives.
Assuming you are writing to a Stream, you could create an extension method like this:
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static partial class XmlExtensions
{
    static Encoding Utf8EncodingNoBom { get; } = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
    public static void SerializeFragmentsToXml<T>(this IEnumerable<T> enumerable, Stream stream, XmlSerializer? serializer = null, XmlSerializerNamespaces? ns = null)
    {
        var newLine = "\n";
        // Two newlines, so that consecutive fragments are separated by a blank line.
        var newLineBytes = Utf8EncodingNoBom.GetBytes(newLine+newLine);
        var settings = new XmlWriterSettings()
        {
            NamespaceHandling = NamespaceHandling.Default,
            NewLineChars = newLine,
            Encoding = Utf8EncodingNoBom, // suppress BOM
            Indent = true,
            NewLineHandling = NewLineHandling.Replace,
            OmitXmlDeclaration = true,
            WriteEndDocumentOnClose = false,
            CloseOutput = false, // Required to prevent the stream from being closed between items
            CheckCharacters = false,
            NewLineOnAttributes = false,
        };
        serializer ??= new XmlSerializer(typeof(T));
        ns ??= new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
        bool first = true;
        foreach (var item in enumerable)
        {
            if (!first)
                stream.Write(newLineBytes);
            // A fresh XmlWriter per item: each one writes a single complete document.
            using (var xmlWriter = XmlWriter.Create(stream, settings))
                serializer.Serialize(xmlWriter, item, ns);
            first = false;
        }
    }
}
And use it e.g. as follows:
// FlyingMonkey.Create() from the question takes no arguments, so the name is set afterwards.
var items = new [] { "Koko", "POCO", "Loco" }.Select(n => { var monkey = FlyingMonkey.Create(); monkey.name = n; return monkey; });
using var stream = new MemoryStream(); // Replace with some FileStream when serializing to disk
var serializer = new XmlSerializer(typeof(FlyingMonkey), defaultNamespace: null);
items.SerializeFragmentsToXml(stream, serializer: serializer);
Demo fiddle #1 here.
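To check whether creating a writer per item really is a bottleneck, a rough timing run is usually enough before reaching for anything more complicated. The following is only a sketch: the Item class is a hypothetical stand-in for your real element type, the 100,000 count is taken from your comment, and Stopwatch timings like this are indicative rather than a rigorous benchmark:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the real element type.
public class Item
{
    [XmlAttribute] public string name = "";
}

public static class WriterCreationTiming
{
    public static void Main()
    {
        const int count = 100_000;
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false),
            Indent = true,
            NewLineChars = "\n",
            OmitXmlDeclaration = true,
            CloseOutput = false, // keep the stream open between items
        };
        var serializer = new XmlSerializer(typeof(Item));
        var ns = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
        var item = new Item { name = "Koko" };

        using var stream = new MemoryStream();

        // Warm up once so that one-time serializer initialization is not measured.
        using (var warmup = XmlWriter.Create(stream, settings))
            serializer.Serialize(warmup, item, ns);
        stream.SetLength(0);

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            // One XmlWriter per item, as in the extension method above.
            using (var xmlWriter = XmlWriter.Create(stream, settings))
                serializer.Serialize(xmlWriter, item, ns);
        }
        stopwatch.Stop();

        Console.WriteLine($"{count} items, one XmlWriter each: {stopwatch.ElapsedMilliseconds} ms, {stream.Length} bytes");
    }
}
```

If the measured time is small compared to the rest of your pipeline (disk I/O, building the objects), the simple per-item writer is probably good enough.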
Alternatively, if you really need to reuse the XmlWriter for performance reasons, you will need to call XmlWriter.WriteComment() to prevent the exception from XmlSerializer and edit out the unwanted comments afterwards, e.g. via some TextWriter decorator that removes them on the fly as they are being written.
The following extension method seems to do this:
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Serialization;

public static partial class XmlExtensions
{
    const string FirstCommentText = "first";
    const string FirstComment = $"<!--{FirstCommentText}-->";
    const string SubsequentCommentText = "subsequent";
    const string SubsequentComment = $"<!--{SubsequentCommentText}-->";
    static Encoding Utf8EncodingNoBom { get; } = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
    public static void SerializeFragmentsToXml<T>(this IEnumerable<T> enumerable, Stream stream, XmlSerializer? serializer = null, XmlSerializerNamespaces? ns = null)
    {
        string newLine = "\n";
        var settings = new XmlWriterSettings()
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            NamespaceHandling = NamespaceHandling.Default,
            NewLineChars = newLine,
            Encoding = Utf8EncodingNoBom, // suppress BOM
            Indent = true,
            NewLineHandling = NewLineHandling.Replace,
            OmitXmlDeclaration = true,
            WriteEndDocumentOnClose = false,
            CloseOutput = false,
            CheckCharacters = false,
            NewLineOnAttributes = false,
        };
        serializer ??= new XmlSerializer(typeof(T));
        ns ??= new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
        using var innerTextWriter = new StreamWriter(stream, encoding : Utf8EncodingNoBom, leaveOpen : true) { NewLine = newLine };
        // The first fake comment is removed entirely; each subsequent one is replaced by a newline.
        using var textWriter = new FakeCommentRemovingTextWriter(innerTextWriter, new(FirstComment, ""), new(SubsequentComment, newLine)) { NewLine = newLine };
        using var xmlWriter = XmlWriter.Create(textWriter, settings);
        bool first = true;
        foreach (var item in enumerable)
        {
            xmlWriter.WriteComment(first ? FirstCommentText : SubsequentCommentText);
            serializer.Serialize(xmlWriter, item, ns);
            // XmlWriter buffers its output, so Flush() is required to ensure that the fake comments are not split across calls to Write().
            xmlWriter.Flush();
            first = false;
        }
    }
    private class FakeCommentRemovingTextWriter : TextWriterDecorator
    {
        readonly KeyValuePair<string, string> [] replacements;
        public FakeCommentRemovingTextWriter(TextWriter baseWriter, params KeyValuePair<string, string> [] replacements) : base(baseWriter, true) => this.replacements = replacements;
        public override void Write(ReadOnlySpan<char> buffer)
        {
            foreach (var replacement in replacements)
            {
                int index;
                if ((index = StartsWithIgnoringWhitespace(buffer, replacement.Key)) >= 0)
                {
                    // Pass through any leading whitespace, then skip the fake comment and the newline after it.
                    if (index > 0)
                        base.Write(buffer.Slice(0, index));
                    buffer = buffer.Slice(index).Slice(replacement.Key.Length);
                    if (buffer.StartsWith(NewLine))
                        buffer = buffer.Slice(NewLine.Length);
                    if (!string.IsNullOrEmpty(replacement.Value))
                        base.Write(replacement.Value);
                }
            }
            base.Write(buffer);
        }
        // Returns the index at which value starts if only whitespace precedes it in buffer, or -1 otherwise.
        static int StartsWithIgnoringWhitespace(ReadOnlySpan<char> buffer, ReadOnlySpan<char> value)
        {
            for (int index = 0; index < buffer.Length; index++)
            {
                if (buffer.Slice(index).StartsWith(value))
                    return index;
                if (!XmlConvert.IsWhitespaceChar(buffer[index]) || index >= buffer.Length - value.Length)
                    break;
            }
            return -1;
        }
    }
}
public class TextWriterDecorator : TextWriter
{
    // Override the same methods that are overridden in https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/StringWriter.cs.
    TextWriter? baseWriter; // null when disposed
    readonly bool disposeBase;
    readonly Encoding baseEncoding;
    public TextWriterDecorator(TextWriter baseWriter, bool disposeBase = true) =>
        (this.baseWriter, this.disposeBase, this.baseEncoding) = (baseWriter ?? throw new ArgumentNullException(nameof(baseWriter)), disposeBase, baseWriter.Encoding);
    protected TextWriter BaseWriter => baseWriter == null ? throw new ObjectDisposedException(GetType().Name) : baseWriter;
    public override Encoding Encoding => baseEncoding;
    public override IFormatProvider FormatProvider => baseWriter?.FormatProvider ?? base.FormatProvider;
    [AllowNull] public override string NewLine
    {
        get => baseWriter?.NewLine ?? base.NewLine;
        set
        {
            if (baseWriter != null)
                baseWriter.NewLine = value;
            base.NewLine = value;
        }
    }
    public override void Flush() => BaseWriter.Flush();
    public sealed override void Close() => Dispose(true);
    public override void Write(char value) => BaseWriter.Write(value);
    public sealed override void Write(char[] buffer, int index, int count) => this.Write(buffer.AsSpan(index, count));
    public override void Write(ReadOnlySpan<char> buffer) => BaseWriter.Write(buffer);
    public sealed override void Write(string? value) => Write(value.AsSpan());
    public override Task WriteAsync(char value) => BaseWriter.WriteAsync(value);
    public sealed override Task WriteAsync(string? value) => WriteAsync(value.AsMemory());
    public sealed override Task WriteAsync(char[] buffer, int index, int count) => WriteAsync(buffer.AsMemory(index, count));
    public override Task WriteAsync(ReadOnlyMemory<char> buffer, CancellationToken cancellationToken = default) => BaseWriter.WriteAsync(buffer, cancellationToken);
    //public virtual Task WriteAsync(StringBuilder? value, CancellationToken cancellationToken = default) - no need to override
    public override Task WriteLineAsync(char value) => BaseWriter.WriteLineAsync(value);
    public sealed override Task WriteLineAsync(string? value) => WriteLineAsync(value.AsMemory());
    public override Task WriteLineAsync(StringBuilder? value, CancellationToken cancellationToken = default) => BaseWriter.WriteLineAsync(value, cancellationToken);
    public sealed override Task WriteLineAsync(char[] buffer, int index, int count) => WriteLineAsync(buffer.AsMemory(index, count));
    public override Task WriteLineAsync(ReadOnlyMemory<char> buffer, CancellationToken cancellationToken = default) => BaseWriter.WriteLineAsync(buffer, cancellationToken);
    public override Task FlushAsync() => BaseWriter.FlushAsync();
    public override Task FlushAsync(CancellationToken cancellationToken) => BaseWriter.FlushAsync(cancellationToken);
    protected override void Dispose(bool disposing)
    {
        try
        {
            if (disposing)
            {
                if (Interlocked.Exchange(ref this.baseWriter, null) is {} writer)
                    if (disposeBase)
                        writer.Dispose();
                    else
                        writer.Flush();
            }
        }
        finally
        {
            base.Dispose(disposing);
        }
    }
    public override async ValueTask DisposeAsync()
    {
        try
        {
            if (Interlocked.Exchange(ref this.baseWriter, null) is {} writer)
                if (disposeBase)
                    await writer.DisposeAsync().ConfigureAwait(false);
                else
                    await writer.FlushAsync().ConfigureAwait(false);
        }
        finally
        {
            await base.DisposeAsync().ConfigureAwait(false);
        }
    }
    public override string ToString() => string.Format("{0}: {1}", GetType().Name, baseWriter?.ToString() ?? "disposed");
}
But honestly I doubt it's worth the trouble. Demo fiddle #2 here.
With either approach, the output looks like:
<flyingMonkey name="Koko">
  <limbs>
    <limb name="leg" />
    <limb name="arm" />
    <limb name="tail" />
    <limb name="wing" />
  </limbs>
</flyingMonkey>

<flyingMonkey name="POCO">
  <limbs>
    <limb name="leg" />
    <limb name="arm" />
    <limb name="tail" />
    <limb name="wing" />
  </limbs>
</flyingMonkey>

<flyingMonkey name="Loco">
  <limbs>
    <limb name="leg" />
    <limb name="arm" />
    <limb name="tail" />
    <limb name="wing" />
  </limbs>
</flyingMonkey>