
Large XML files in dataset (outofmemory)


I am currently trying to load a fairly large XML file into a DataSet. The file is about 700 MB, and every time I try to read it, parsing takes a long time and eventually throws an "out of memory" exception.

DataSet ds = new DataSet();
ds.ReadXml(pathtofile);

The main problem is that I have to use DataSets (I use them to import the data from the XML file into a Sybase database: foreach table, foreach row, foreach column) and that I have no schema file.

I have already googled for a while, but I only found solutions that are not usable for me.

Additional information: I use a Sybase (ASA 9) database, but my C# application crashes before I even touch the database. The error occurs after I read the XML into the DataSet and start working with it. I have already read that this is a known problem when using DataSets with large content. I need the data in a DataSet at least once, because I have to import it into the database.
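For context, the "foreach table, foreach row, foreach column" import mentioned above might look roughly like the following sketch. The class and method names are hypothetical, and a real implementation should execute parameterized commands against the ASA 9 connection rather than build SQL strings:

```csharp
using System;
using System.Data;
using System.Text;

class DataSetImporter
{
    // Hypothetical sketch: builds one INSERT statement for a single row by
    // walking the table's columns, the way the question describes iterating
    // foreach table / foreach row / foreach column.
    public static string BuildInsert(DataTable table, DataRow row)
    {
        var cols = new StringBuilder();
        var vals = new StringBuilder();
        foreach (DataColumn col in table.Columns)
        {
            if (cols.Length > 0) { cols.Append(", "); vals.Append(", "); }
            cols.Append(col.ColumnName);
            // NOTE: real code should use parameterized commands instead of
            // string concatenation, to avoid quoting bugs and SQL injection.
            vals.Append("'" + Convert.ToString(row[col]).Replace("'", "''") + "'");
        }
        return "INSERT INTO " + table.TableName +
               " (" + cols + ") VALUES (" + vals + ")";
    }
}
```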


Solution

  • You may be able to get past this by using an overload of the ReadXml method. Pass in a buffered stream instead of a file path, and see if this speeds things up for you.

    Here is code:

    DataSet ds = new DataSet();
    FileStream filestream = File.OpenRead(pathtofile);
    BufferedStream buffered = new BufferedStream(filestream);
    ds.ReadXml(buffered);
    

    With the size of the data you are talking about, the DataSet itself may become memory constrained. Part of the problem with XML is that it can turn 500 KB of data into 500 MB simply through a poor choice of element names and nesting depth. Since you are lacking a schema, you may be able to work around the memory constraint by reading the file as above and simply replacing the element names with shorter versions (e.g. replace <Version></Version> with <V></V> for a byte reduction of more than 60%).
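    One way to do that renaming without loading the whole file is a streaming copy with XmlReader/XmlWriter, assigning each distinct element name a short generated alias. This is a minimal sketch (the class name and alias scheme are made up for illustration); the returned map lets you translate the DataSet's table and column names back afterwards:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;

    class ShrinkXml
    {
        // Streams the input XML to the output, replacing each distinct element
        // name with a short alias (e0, e1, ...). Attribute names and text
        // content are copied through unchanged.
        public static Dictionary<string, string> Shorten(TextReader input, TextWriter output)
        {
            var map = new Dictionary<string, string>();
            var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
            using (var reader = XmlReader.Create(input))
            using (var writer = XmlWriter.Create(output, settings))
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            string alias;
                            if (!map.TryGetValue(reader.LocalName, out alias))
                            {
                                alias = "e" + map.Count;
                                map[reader.LocalName] = alias;
                            }
                            bool empty = reader.IsEmptyElement;
                            writer.WriteStartElement(alias);
                            while (reader.MoveToNextAttribute())
                                writer.WriteAttributeString(reader.LocalName, reader.Value);
                            if (empty) writer.WriteEndElement();
                            break;
                        case XmlNodeType.Text:
                            writer.WriteString(reader.Value);
                            break;
                        case XmlNodeType.EndElement:
                            writer.WriteEndElement();
                            break;
                    }
                }
            }
            return map;
        }
    }
    ```

    Because both sides stream, memory use stays flat regardless of file size; you would then feed the shortened file to ReadXml as above.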

    Good luck, and I hope this helps!