Tags: c#, .net, garbage-collection, xmltextreader

Cleaning up memory after reading a giant xml element value


I rarely turn here for help, but this is driving me crazy. I'm reading an XML file that wraps an arbitrary number of items, each containing a base64-encoded file plus some accompanying metadata. Originally I just read the whole file into an XmlDocument. That made for much cleaner code, but I realized there's no limit on the size of the file, and XmlDocument uses a lot of memory and can run out if the file is large enough. So I rewrote the code to use XmlTextReader instead, which works great when the program is sent an XML file with a large number of reasonably sized attachments... but there's still a big problem, and that's where I turn to you:
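For context, the streaming approach described above looks roughly like the following minimal sketch. It uses `XmlReader.Create`, which Microsoft's documentation recommends over constructing `XmlTextReader` directly; the `File` element name comes from the question, while the `FileElementScanner` class and the `CountFileElements` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class FileElementScanner
{
    // Hypothetical helper: walks the document forward-only and counts <File>
    // elements, without building an in-memory tree of the whole document the
    // way XmlDocument does.
    public static int CountFileElements(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            int count = 0;
            // ReadToFollowing advances to the next start tag named "File"
            // and returns false once no more are found.
            while (reader.ReadToFollowing("File"))
            {
                count++;
                // A real implementation would process the element here.
            }
            return count;
        }
    }
}
```

Each item is handled as the reader passes it, so the number of attachments no longer drives memory use. The size of any single element's value still matters, though, which is exactly the problem described next.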

If my XML reader is positioned on a File element whose value is enormous (say, 500 MB) and I call reader.ReadElementContentAsString(), I end up with a string that occupies 500 MB (or with an OutOfMemoryException). In either case, all I want to do is write to the log "that file attachment was way too big, we're going to ignore it and move on", and then continue with the next file. But the string I just tried to read never seems to be garbage collected. What actually happens is that the string takes up all the RAM, and every file the program tries to read after that also throws an OutOfMemoryException, even though most of the files are quite small.

Recall that at this point I'm reading the element's value into a local string, so I expected it to become eligible for garbage collection immediately (and therefore to be collected, at the latest, when the program tries to read the next item and finds there is no memory available). But I've tried everything, just in case: setting the string to null, calling GC.Collect() explicitly... no dice. Task Manager shows the GC reclaimed only about 40 KB of the ~500 MB it just requested for the string, and I still get OutOfMemoryExceptions when I try to read anything else.
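One detail relevant to the LOH: in .NET, objects of 85,000 bytes or more (such as a string holding a large base64 value) go on the large object heap, which is collected but, by default, not compacted, so the freed space can remain fragmented. On .NET Framework 4.5.1 or later, or on .NET Core, a one-time LOH compaction can be requested, as in the minimal sketch below (the `LohCleanup` class is a hypothetical name). This is only worth trying under some assumptions. The string has to be truly unreachable: in Debug builds or under a debugger, the JIT may keep a local alive until the method returns. Task Manager also may not show the reclaimed memory, because the runtime does not necessarily hand freed memory back to the OS right away.

```csharp
using System;
using System.Runtime;

public static class LohCleanup
{
    // Hypothetical helper: asks the GC to compact the large object heap during
    // the next blocking full collection, then forces that collection.
    // The setting reverts to Default automatically once the compaction runs.
    public static void CompactLargeObjectHeap()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(); // full, blocking collection of all generations
    }
}
```

Calling this after abandoning the oversized value, for example in a separate method so that no local still references the string, is the scenario it targets. It treats fragmentation, not the allocation itself.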

There doesn't seem to be any way to learn the length of an XML element's value with XmlTextReader without reading the element, so I imagine I'm stuck reading the string... Am I missing something, or is there really no way to read a giant value from an XML file without wrecking the program's ability to do anything afterwards? I'm going insane over this.
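Assuming the content really is plain base64, as described above, one way to avoid materializing the string at all is `XmlReader.ReadElementContentAsBase64`. It decodes the element's content into a caller-supplied byte buffer one chunk at a time. The total length is still unknown up front, but a running total can be checked during the read, so an oversized attachment can be detected and drained without ever allocating one huge string. This is a minimal sketch: the `AttachmentReader` class, the `TryCopyBase64Element` method, the 64 KB buffer, and the `maxBytes` limit are illustrative choices, not part of the original code.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttachmentReader
{
    // Hypothetical helper. The reader must be positioned on the start tag of an
    // element whose content is base64 text. Decoded bytes are written to
    // 'destination' in chunks. Returns false if the decoded size exceeds
    // maxBytes; in that case 'destination' may hold a partial file and the
    // caller should discard it. Either way, the reader ends up on the node
    // following the element's end tag.
    public static bool TryCopyBase64Element(XmlReader reader, Stream destination, long maxBytes)
    {
        if (!reader.CanReadBinaryContent)
            throw new NotSupportedException("This reader cannot decode binary content.");

        // 64 KB stays below the 85,000-byte large object heap threshold.
        var buffer = new byte[64 * 1024];
        long total = 0;
        bool tooBig = false;
        int read;
        while ((read = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
            if (total > maxBytes)
            {
                // Keep draining so the reader moves past the element,
                // but stop writing data we are going to throw away.
                tooBig = true;
                continue;
            }
            destination.Write(buffer, 0, read);
        }
        return !tooBig;
    }
}
```

A caller could point `destination` at a temporary FileStream, delete that file and write the "too big, skipping" log entry when the method returns false, and then move on to the next File element. Peak memory use stays around the size of the buffer rather than the size of the attachment.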

I have read a bit about the .NET garbage collector and the large object heap (LOH), but nothing I read suggested that this would happen...

Let me know if you need any further information, and thanks!

Edit: I realized the process was running as a 32-bit process, which meant it was more starved for memory than it should have been. With that fixed, this is less of an issue, but it's still behavior I'd like to fix. (It now takes more and/or larger files to reach the point where an OutOfMemoryException is thrown, but once it is thrown, I still can't seem to reclaim that memory in a timely fashion.)
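Because the edit traces part of the problem to 32-bit execution, it can help to confirm at startup which mode the process is actually running in. `Environment.Is64BitProcess` (available since .NET Framework 4.0) reports this. For AnyCPU executables in Visual Studio, a common cause of unexpected 32-bit execution is the "Prefer 32-bit" project setting. The `ProcessInfo` class below is a hypothetical helper for logging it.

```csharp
using System;

public static class ProcessInfo
{
    // Hypothetical helper: reports the bitness of the current process.
    // A 32-bit process has a much smaller address space, so very large
    // allocations fail much sooner.
    public static string Describe()
    {
        return Environment.Is64BitProcess
            ? "Running as a 64-bit process"
            : "Running as a 32-bit process";
    }
}
```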


Solution

  • I had a similar issue with a SOAP service used to transfer large files as base64 strings.

    I used XDocument instead of XmlDocument back then, and that did the trick for me.