How to marshal a large amount of data to XML


I need to send a large amount of data via XML and my Docker container runs out of memory when performing the task. Is there a way using Go to incrementally marshal a large XML document and also incrementally write it to a file so as to minimize memory usage?


Solution

  • Use xml.Encoder to stream the XML output to an io.Writer that may be a network connection (net.Conn) or a file (os.File). The complete result will not be kept in memory.

    You may use Encoder.Encode() to encode a Go value to XML. Generally you may pass any Go value that you would pass to xml.Marshal().

    Encoder.Encode() only helps if the data you want to marshal is already in memory, which may or may not be feasible for you. For example, if you want to marshal a large list that cannot (or should not) be read into memory, Encode() alone will not solve the problem.

    If the input data cannot be held in memory either, you can build the XML output piece by piece from tokens and elements. Encoder.EncodeToken() lets you do this by writing individual "parts" of the resulting XML document.

    For example, if you want to write a large list to the output, you may write a start element tag (e.g. <list>), then write the elements of the list one by one (each fetched from a database or a file, or constructed by an algorithm on the fly), and once the whole list has been marshaled, write the closing tag (</list>).

    Here's a simple example of how you can do that:

    package main

    import (
        "bytes"
        "encoding/xml"
        "fmt"
    )

    type Student struct {
        ID   int
        Name string
    }
    
    func main() {
        he := func(err error) {
            if err != nil {
                panic(err) // In your app, handle error properly
            }
        }
    
        // For demo purposes we use an in-memory buffer,
        // but this may be an os.File too.
        buf := &bytes.Buffer{}
    
        enc := xml.NewEncoder(buf)
        enc.Indent("", "  ")
    
        he(enc.EncodeToken(xml.StartElement{Name: xml.Name{Local: "list"}}))
        for i := 0; i < 3; i++ {
            // Here you can fetch / construct the records
            // rune('A' + i) makes the int-to-string conversion explicit (go vet flags string(int)).
            he(enc.Encode(Student{ID: i, Name: string(rune('A' + i))}))
        }
        he(enc.EncodeToken(xml.EndElement{Name: xml.Name{Local: "list"}}))
        he(enc.Flush())
    
        fmt.Println(buf.String())
    }
    

    The output of the above is:

    <list>
      <Student>
        <ID>0</ID>
        <Name>A</Name>
      </Student>
      <Student>
        <ID>1</ID>
        <Name>B</Name>
      </Student>
      <Student>
        <ID>2</ID>
        <Name>C</Name>
      </Student>
    </list>