Write XML directly to disk and append elements


I am trying to write an XML file that is too large to hold in memory, so I want to write it directly to disk. I have tried using XmlWriter, but it has no functionality for appending to the end of an existing file, so I am willing to resort to writing the raw XML with a regular file writer.

Does anyone know of a file-writing class that lets me write straight to disk and overwrite positions inside the file?

The reason is that I need to overwrite the closing tag of the root element so that I can append more data, while still being able to read the XML file when needed. For example, if I had the following XML:

<elements>
  <element>
  </element>
</elements>

If I wanted to read this, I could, but if I want to write to it I must first delete the </elements> tag, append another element, and then append the closing tag again.

Thanks for any help.


Solution

  • You can use an XmlTextWriter.

    Just open the file for writing, seek back to the start of the end element, and then append any new elements you want with the XmlTextWriter. To close the file, simply write the raw text for the end element to make the document complete and you're done.

    Here's a quick and dirty example.

    Starting with XML like this:

    <?xml version="1.0" encoding="utf-8"?>
    <DocumentElement>
        <FirstElem/>
    </DocumentElement>
    

    You can open it and append an element like this:

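    // Requires: using System.IO; using System.Text; using System.Xml;
    // The file must already exist and end with exactly this text.
    const string closing = "</DocumentElement>\r\n";

    using (FileStream f = new FileStream(@"D:\a.xml", FileMode.Open, FileAccess.Write))
    {
        // Seek in bytes, not characters, back to the start of the end element
        f.Seek(-Encoding.UTF8.GetByteCount(closing), SeekOrigin.End);
        using (XmlTextWriter x = new XmlTextWriter(f, Encoding.UTF8))
        {
            x.WriteStartElement("Another");
            x.WriteAttributeString("attr", "value");
            x.WriteEndElement();
    
            // Close the file with a new terminating end-element
            x.WriteRaw("\r\n" + closing);
        }
    }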
    

    And the result is:

    <?xml version="1.0" encoding="utf-8"?>
    <DocumentElement>
        <FirstElem/>
    <Another attr="value" />
    </DocumentElement>
    

    The indentation may not come out perfect, but the result is valid XML. This is exactly what you would do if you wrote the XML as raw text to the file, but you might as well let the XML writer handle the formatting and escaping for you.
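
    A fixed seek offset only works while the file ends with exactly the expected text. If the trailing whitespace or line endings can vary, one option is to search the tail of the file for the closing tag and truncate there before appending. The following is a minimal sketch of that idea, not a definitive implementation: the class XmlAppender, its methods, and the 4096-byte tail size are hypothetical names and values chosen for illustration, and it assumes a UTF-8 file whose closing tag is written without internal whitespace.

    ```csharp
    using System;
    using System.IO;
    using System.Text;
    using System.Xml;

    // Hypothetical helper for illustration; not part of the .NET libraries.
    static class XmlAppender
    {
        // Returns the byte offset of the last occurrence of closingTag within
        // the final tailSize bytes of the stream, or -1 if it is not found.
        public static long FindClosingTagOffset(Stream s, string closingTag, int tailSize = 4096)
        {
            byte[] needle = Encoding.UTF8.GetBytes(closingTag);
            int len = (int)Math.Min(tailSize, s.Length);
            long tailStart = s.Length - len;
            byte[] tail = new byte[len];

            s.Seek(tailStart, SeekOrigin.Begin);
            int read = 0;
            while (read < len)
            {
                int n = s.Read(tail, read, len - read);
                if (n == 0) break;
                read += n;
            }

            // Scan backwards so the last occurrence wins.
            for (int i = read - needle.Length; i >= 0; i--)
            {
                int j = 0;
                while (j < needle.Length && tail[i + j] == needle[j]) j++;
                if (j == needle.Length) return tailStart + i;
            }
            return -1;
        }

        // Removes the root's closing tag, lets the caller write one or more
        // elements through an XmlWriter, then writes the closing tag back.
        public static void AppendElement(string path, string rootName, Action<XmlWriter> writeElement)
        {
            string closingTag = "</" + rootName + ">";
            using (FileStream f = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
            {
                long offset = FindClosingTagOffset(f, closingTag);
                if (offset < 0)
                    throw new InvalidDataException("Closing tag not found: " + closingTag);

                f.SetLength(offset);          // drop the old closing tag and anything after it
                f.Seek(0, SeekOrigin.End);

                XmlWriterSettings settings = new XmlWriterSettings();
                settings.ConformanceLevel = ConformanceLevel.Fragment; // no document-level checks
                settings.OmitXmlDeclaration = true;
                settings.Encoding = new UTF8Encoding(false);           // never emit a BOM mid-file
                settings.CloseOutput = false;                          // keep the stream open

                using (XmlWriter w = XmlWriter.Create(f, settings))
                {
                    writeElement(w);
                }

                byte[] closing = Encoding.UTF8.GetBytes("\r\n" + closingTag + "\r\n");
                f.Write(closing, 0, closing.Length);
            }
        }
    }
    ```

    A call such as XmlAppender.AppendElement(@"D:\a.xml", "DocumentElement", w => { w.WriteStartElement("Another"); w.WriteAttributeString("attr", "value"); w.WriteEndElement(); }) can then be repeated any number of times, because the closing tag is located again on every call. Only the last few kilobytes are read, so the cost does not grow with the size of the file.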

    I'd also agree with some of the comments: it will be very beneficial to use a schema for your XML that minimises the size. Turn off indentation. Use the shortest element and attribute names you can. And if you are working on leaf elements, storing data as attributes rather than as element text will save room: <element>data</element> (23 characters) is more expensive than <element val="data"/> (21), and this can be compressed further to <e v="data"/> (13), a little over half the original size.
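
    To check the size difference between those three forms, a small sketch like the following counts the UTF-8 bytes of each one. The class name XmlSizeDemo and the helper Utf8Size are hypothetical, chosen only for this illustration.

    ```csharp
    using System;
    using System.Text;

    // Hypothetical demo class; the three strings are the variants from the text above.
    static class XmlSizeDemo
    {
        // Number of bytes the fragment occupies on disk when saved as UTF-8.
        public static int Utf8Size(string xml)
        {
            return Encoding.UTF8.GetByteCount(xml);
        }

        public static void Main()
        {
            string[] variants =
            {
                "<element>data</element>",
                "<element val=\"data\"/>",
                "<e v=\"data\"/>"
            };

            foreach (string v in variants)
            {
                Console.WriteLine("{0,2} bytes  {1}", Utf8Size(v), v);
            }
        }
    }
    ```

    This reports 23, 21 and 13 bytes respectively. The saving per element is small, but it is multiplied by the number of elements, which matters for a file too large to fit in memory.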