XmlWriter Async operations fail with XmlWriterSettings.OutputMethod = Html


When an XmlWriter is created with XmlWriterSettings.OutputMethod set to XmlOutputMethod.Html, async operations fail. With XmlOutputMethod.AutoDetect (the default), the same async operations succeed.

Failing code:

var transform = new XslCompiledTransform();
using var reader = XmlReader.Create(new StringReader(@"
  <xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
    <xsl:output method=""html"" indent=""yes"" doctype-system=""html""/>
    <xsl:template match=""/"">
      <bar/>
    </xsl:template>
  </xsl:stylesheet>"));
transform.Load(reader);

var settings = transform.OutputSettings.Clone();
settings.CloseOutput = false;
settings.Async = true;

using var stream = new MemoryStream();
using (var writer = XmlWriter.Create(stream, settings))
{
    await writer.WriteStartDocumentAsync();
    await writer.WriteStartElementAsync(null, "foo", null);
    await writer.WriteEndElementAsync();
    await writer.WriteEndDocumentAsync();
}
stream.Position = 0;
var content = new StreamReader(stream).ReadToEnd();
Assert.Contains("foo", content);

with the stack trace:

Message: 
System.NotImplementedException : The method or operation is not implemented.

  Stack Trace: 
XmlWriter.WriteStartElementAsync(String prefix, String localName, String ns)
XmlWellFormedWriter.WriteStartElementAsync_NoAdvanceState(String prefix, String localName, String ns)
XmlWellFormedWriter.WriteStartElementAsync(String prefix, String localName, String ns)
XmlAsyncCheckWriter.WriteStartElementAsync(String prefix, String localName, String ns)

Working code:

var settings = new XmlWriterSettings();
settings.CloseOutput = false;
settings.Async = true;

using var stream = new MemoryStream();
using (var writer = XmlWriter.Create(stream, settings))
{
    await writer.WriteStartDocumentAsync();
    await writer.WriteStartElementAsync(null, "foo", null);
    await writer.WriteEndElementAsync();
    await writer.WriteEndDocumentAsync();
}
stream.Position = 0;
var content = new StreamReader(stream).ReadToEnd();
Assert.Contains("foo", content);

Inspecting both code paths in the debugger, each appears to use a System.Xml.XmlAsyncCheckWriter under the hood.


Solution

  • Interestingly, the OutputMethod alone is not what causes this. It is the combination of the HTML output method and the doctype-system attribute, as you'll see further down. Remove the attribute, and your async calls will work.

    I can show you what's happening, but can't tell you WHY they chose to do it this way.


    Firstly, the writer is created by XmlWriterSettings.CreateWriter(Stream). Cutting down all the fluff, it goes like this:

    internal XmlWriter CreateWriter(Stream output)
    {
        XmlWriter writer;
        if (Encoding.WebName == "utf-8") {
            switch (OutputMethod) {
                case XmlOutputMethod.Html:
                    writer = new HtmlUtf8RawTextWriter(output, this);
                    break;
            }
        }
    
        // Wrap with Xslt/XQuery specific writer if needed;
        // XmlOutputMethod.AutoDetect writer does this lazily when it creates the underlying Xml or Html writer.
        if (OutputMethod != XmlOutputMethod.AutoDetect) {
            if (IsQuerySpecific) {
                // Create QueryOutputWriter if CData sections or DocType need to be tracked
                writer = new QueryOutputWriter((XmlRawWriter)writer, this);
            }
        }
    
        // wrap with well-formed writer
        writer = new XmlWellFormedWriter(writer, this);
    
        if (_useAsync)
            writer = new XmlAsyncCheckWriter(writer);
    
        return writer;
    }
    

    So in the end, you get onion-like (or ogre-like) layers:

    XmlAsyncCheckWriter(
        XmlWellFormedWriter(
            QueryOutputWriter(
                HtmlUtf8RawTextWriter)))
    

    When you make your Write...Async() calls, you'd expect them to cascade from the outermost writer all the way down to the deepest level, HtmlUtf8RawTextWriter, which does have the async calls you want.

    UNFORTUNATELY, the QueryOutputWriter wrapper does NOT delegate the Async calls to the inner writer, and is actually the one throwing a NotImplementedException. Is it a bug? Or a deliberate choice? I don't know.
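
    The mechanism can be reproduced in isolation. The sketch below uses a hypothetical SyncOnlyWriter class (not part of .NET) that overrides only the synchronous abstract members of XmlWriter, on the assumption that this mirrors what the stack trace shows for QueryOutputWriter: with no override of WriteStartElementAsync, the call falls through to the base XmlWriter implementation, which throws.

    ```csharp
    #nullable disable
    using System;
    using System.Threading.Tasks;
    using System.Xml;

    // Hypothetical stand-in for a writer that supplies the synchronous
    // members but does not override WriteStartElementAsync.
    class SyncOnlyWriter : XmlWriter
    {
        public override WriteState WriteState => WriteState.Start;
        public override void Flush() { }
        public override string LookupPrefix(string ns) => null;
        public override void WriteStartDocument() { }
        public override void WriteStartDocument(bool standalone) { }
        public override void WriteEndDocument() { }
        public override void WriteDocType(string name, string pubid, string sysid, string subset) { }
        public override void WriteStartElement(string prefix, string localName, string ns) { }
        public override void WriteEndElement() { }
        public override void WriteFullEndElement() { }
        public override void WriteStartAttribute(string prefix, string localName, string ns) { }
        public override void WriteEndAttribute() { }
        public override void WriteCData(string text) { }
        public override void WriteComment(string text) { }
        public override void WriteProcessingInstruction(string name, string text) { }
        public override void WriteEntityRef(string name) { }
        public override void WriteCharEntity(char ch) { }
        public override void WriteWhitespace(string ws) { }
        public override void WriteString(string text) { }
        public override void WriteSurrogateCharEntity(char lowChar, char highChar) { }
        public override void WriteChars(char[] buffer, int index, int count) { }
        public override void WriteRaw(char[] buffer, int index, int count) { }
        public override void WriteRaw(string data) { }
        public override void WriteBase64(byte[] buffer, int index, int count) { }
    }

    static class Demo
    {
        static async Task Main()
        {
            XmlWriter writer = new SyncOnlyWriter();

            // The synchronous call reaches the override above and succeeds.
            writer.WriteStartElement(null, "foo", null);

            try
            {
                // No async override exists, so the base XmlWriter method runs.
                await writer.WriteStartElementAsync(null, "foo", null);
            }
            catch (NotImplementedException ex)
            {
                Console.WriteLine(ex.GetType().Name); // prints "NotImplementedException"
            }
        }
    }
    ```

    This matches the top frame of the stack trace in the question, where the exception originates in XmlWriter.WriteStartElementAsync rather than in any derived class.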

    If you don't need a DOCTYPE and don't use CDATA in your output (both are handled by the problematic QueryOutputWriter), simply removing doctype-system from your XSL will solve your problem. The IsQuerySpecific property below then evaluates to false, which prevents the undesirable wrapping.

    private bool IsQuerySpecific => 
        CDataSectionElements.Count != 0
        || _docTypePublic != null
        || _docTypeSystem != null
        || _standalone == XmlStandalone.Yes;
    
    ...
    
    if (IsQuerySpecific)
        xmlWriter = new QueryOutputWriter((XmlRawWriter)xmlWriter, this);
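
    As a sketch of that fix, here is the failing snippet from the question with only the doctype-system attribute removed from xsl:output. The NoDoctypeDemo class name and the Console output are additions for illustration. Based on the explanation above, the settings should no longer be query-specific, so the async calls are expected to complete. Treat this as something to try, not a guarantee for every runtime version.

    ```csharp
    using System;
    using System.IO;
    using System.Threading.Tasks;
    using System.Xml;
    using System.Xml.Xsl;

    static class NoDoctypeDemo
    {
        static async Task Main()
        {
            var transform = new XslCompiledTransform();
            // Same stylesheet as in the question, minus doctype-system.
            using var reader = XmlReader.Create(new StringReader(@"
      <xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
        <xsl:output method=""html"" indent=""yes""/>
        <xsl:template match=""/"">
          <bar/>
        </xsl:template>
      </xsl:stylesheet>"));
            transform.Load(reader);

            var settings = transform.OutputSettings.Clone();
            settings.CloseOutput = false;
            settings.Async = true;

            using var stream = new MemoryStream();
            using (var writer = XmlWriter.Create(stream, settings))
            {
                await writer.WriteStartDocumentAsync();
                await writer.WriteStartElementAsync(null, "foo", null);
                await writer.WriteEndElementAsync();
                await writer.WriteEndDocumentAsync();
            }

            stream.Position = 0;
            var content = new StreamReader(stream).ReadToEnd();
            Console.WriteLine(content); // inspect the output manually
        }
    }
    ```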
    

    If you do need DOCTYPE/CDATA, you are left with the fun exercise of reimplementing some of the layers and overriding the async methods yourself.
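
    A less invasive alternative, which is not part of the analysis above, is to avoid the async Write* methods altogether: write synchronously into an in-memory buffer, then copy that buffer to the real destination asynchronously. The sketch below assumes the whole document fits comfortably in memory. BufferedXmlOutput and WriteBufferedAsync are hypothetical names chosen for illustration.

    ```csharp
    using System;
    using System.IO;
    using System.Threading.Tasks;
    using System.Xml;

    static class BufferedXmlOutput
    {
        // Hypothetical helper: runs the caller's synchronous writes against a
        // MemoryStream, then copies the result to the destination asynchronously.
        public static async Task WriteBufferedAsync(
            XmlWriterSettings settings, Stream destination, Action<XmlWriter> write)
        {
            // Clone so the caller's settings object is left untouched.
            var syncSettings = settings.Clone();
            syncSettings.Async = false;       // only synchronous Write* calls are made
            syncSettings.CloseOutput = false; // keep the buffer open after the writer closes

            using var buffer = new MemoryStream();
            using (var writer = XmlWriter.Create(buffer, syncSettings))
            {
                write(writer);
            } // disposing the writer flushes everything into the buffer

            buffer.Position = 0;
            await buffer.CopyToAsync(destination);
        }
    }
    ```

    With the question's setup, this could be called as `await BufferedXmlOutput.WriteBufferedAsync(transform.OutputSettings, stream, w => { w.WriteStartElement("foo"); w.WriteEndElement(); });`. The trade-off is that only the final copy is asynchronous and the full output is held in memory, so this suits small and medium documents better than large streamed ones.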