Tags: c#, object, ram, migradoc

How to avoid high ram usage by adding plenty of rows with MigraDoc?


I'm currently working on a project that reads a large file, or rather multiple files, with millions of lines each. To do so I use a StreamReader to read each line. Every line is checked for whether it contains a certain string; when the condition is true, I add a row. I have to reproduce the code from memory since I don't have it in front of me:

Table table = new Table();
Row row = new Row();
Cell cell = new Cell();
using (StreamReader sr = new StreamReader(file))
{
    string str;
    while ((str = sr.ReadLine()) != null)
    {
        if (str.Contains("Marker"))
        {
            row = table.AddRow();
            cell = row.Cells[0];
            cell = row.Cells[1]; // actually I use a counter variable, since my table consistently has 6 cells
        }
    }
}

So every time the condition is true, a Row object is added, and with millions of such lines there will also be millions of objects, which fills my RAM and will most likely make it "explode". I tried several things, e.g. creating a list of Row objects and clearing it after a certain number, but I found that list.Clear() does not release the objects from RAM. I also tried invoking the garbage collector manually, but that had a negative impact on performance. Now I'm at a point where I don't know how to handle this. With half a million lines it reaches nearly 7 GB of RAM, and I have 8 GB available.

I would appreciate any suggestion on how I can avoid high RAM usage, or at least keep it low.

I also want to add that I'm new on Stack Overflow, so if anything is unclear, feel free to point it out (or point at me :P).


Solution

  • You're doing the right thing by reading your input files from streams line-by-line. That means only the current line of each input file needs to be present in your RAM.

    But, you're doing the wrong thing by putting a row into your Table object for each line matching the marker. Those Table objects live in RAM. Attempts to create Table objects with millions upon millions of Row objects will use up your RAM, as you have discovered.

    The .NET collection classes do a good job of supporting vast collections, but there's no magic around the use of RAM.

    You need to figure out a way to limit the number of Row objects in a Table object. Can you keep track of the row count, and when it reaches a certain number (who knows how big? 10K? 100K?) write the table to disk and create a new one?

    Also, it seems that MigraDoc generates PDF files. Is a million-page PDF file a useful object? It seems unlikely. The same goes for RTF files.
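
One way to implement that limit is to keep a running row count and flush the current document to disk whenever the count hits a threshold, then start a fresh Document so the old Row objects become garbage-collectable. Here is a rough sketch along those lines using the MigraDoc types (Document, Section, Table, PdfDocumentRenderer); the threshold of 10,000 rows, the file names, and the StartNewTable helper are made up for illustration, so adapt them to your setup:

```csharp
using System.IO;
using MigraDoc.DocumentObjectModel;
using MigraDoc.DocumentObjectModel.Tables;
using MigraDoc.Rendering;

class ChunkedExport
{
    const int MaxRowsPerFile = 10_000; // tune this; somewhere in the 10K-100K range

    // Hypothetical helper: one section with a 6-column table per chunk.
    static Table StartNewTable(Document doc)
    {
        Section section = doc.AddSection();
        Table table = section.AddTable();
        for (int i = 0; i < 6; i++)
            table.AddColumn(Unit.FromCentimeter(2.5));
        return table;
    }

    // Render the current chunk and write it out as its own PDF.
    static void SaveChunk(Document doc, int fileIndex)
    {
        PdfDocumentRenderer renderer = new PdfDocumentRenderer();
        renderer.Document = doc;
        renderer.RenderDocument();
        renderer.PdfDocument.Save($"output_{fileIndex}.pdf");
    }

    static void Main()
    {
        int rowsInCurrentFile = 0;
        int fileIndex = 0;
        Document doc = new Document();
        Table table = StartNewTable(doc);

        using (StreamReader sr = new StreamReader("input.log")) // hypothetical input file
        {
            string str;
            while ((str = sr.ReadLine()) != null)
            {
                if (!str.Contains("Marker"))
                    continue;

                Row row = table.AddRow();
                row.Cells[0].AddParagraph(str); // fill the other 5 cells as needed

                if (++rowsInCurrentFile >= MaxRowsPerFile)
                {
                    // Flush this chunk to disk, then drop all references to it
                    // so the GC can reclaim the accumulated Row objects.
                    SaveChunk(doc, fileIndex++);
                    doc = new Document();
                    table = StartNewTable(doc);
                    rowsInCurrentFile = 0;
                }
            }
        }

        if (rowsInCurrentFile > 0)
            SaveChunk(doc, fileIndex); // remaining partial chunk
    }
}
```

The key point is that nothing keeps a reference to a flushed Document: once SaveChunk returns and `doc` is reassigned, the whole chunk (table, rows, cells) is eligible for collection, so peak memory is bounded by one chunk rather than by the total number of matching lines.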