
Writing a file line by line in C# is very slow using StreamReader/StreamWriter


I wrote a WinForms application that reads each line of a text file, does a search and replace on the line using Regex, and then writes the line out to a new file. I chose the "line by line" method because some of the files are too large to load into memory.

I am using the BackgroundWorker object so the UI can be updated with the progress of the job. Below is the code (with parts omitted for brevity) that handles the reading and then outputting of the lines in the file.

public void bgWorker_DoWork(object sender, DoWorkEventArgs e)
{
    // Details of obtaining file paths omitted for brevity

    int totalLineCount = File.ReadLines(inputFilePath).Count();

    using (StreamReader sr = new StreamReader(inputFilePath))
    {
      int currentLine = 0;
      String line;
      while ((line = sr.ReadLine()) != null)
      {
        currentLine++;

        // Match and replace contents of the line
        // omitted for brevity

        if (currentLine % 100 == 0)
        {
          int percentComplete = (currentLine * 100 / totalLineCount);
          bgWorker.ReportProgress(percentComplete);
        }

        using (FileStream fs = new FileStream(outputFilePath, FileMode.Append, FileAccess.Write))
        using (StreamWriter sw = new StreamWriter(fs))
        {
          sw.WriteLine(line);
        }
      }
    }
}

Some of the files I am processing are very large (8 GB with 132 million rows). The process takes a very long time (a 2 GB file took about 9 hours to complete). It looks to be working at around 58 KB/sec. Is this expected or should the process be going faster?


Solution

  • Don't close and re-open the output file on every loop iteration; open the writer once, outside the loop. This should improve performance because the writer no longer needs to open the file and seek to its end for every single line.

    Also, File.ReadLines(inputFilePath).Count() causes your input file to be read twice, which could account for a big chunk of the time. Instead of computing the percentage from line counts, compute it from the stream position.

    public void bgWorker_DoWork(object sender, DoWorkEventArgs e) 
    { 
        // Details of obtaining file paths omitted for brevity
    
        using (StreamWriter sw = new StreamWriter(outputFilePath, true)) // You can use this constructor (append: true) instead of a FileStream; it does the same operation.
        using (StreamReader sr = new StreamReader(inputFilePath))
        {
          int lastPercentage = 0;
          String line;
          while ((line = sr.ReadLine()) != null)
          {
    
            // Match and replace contents of the line
            // omitted for brevity
    
            // Position and Length are longs, not ints, so we need to cast at the end.
            // StreamReader buffers its reads, so BaseStream.Position runs slightly ahead
            // of the lines returned; the percentage is approximate.
            int currentPercentage = (int)(sr.BaseStream.Position * 100L / sr.BaseStream.Length);
            if (lastPercentage != currentPercentage)
            {
              bgWorker.ReportProgress(currentPercentage);
              lastPercentage = currentPercentage;
            }
            sw.WriteLine(line);
          }
        }
    }
    

    Other than that, you will need to show what the "Match and replace contents of the line" step (omitted for brevity) does, as I would guess that is where your slowness comes from. Run a profiler on your code, see where it spends the most time, and focus your efforts there.
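
If a profiler is not at hand, a rough first check is to time the regex step on its own with a Stopwatch. The sketch below is only an illustration: the class name LineReplaceTiming, the pattern `\s+`, and the replacement string are hypothetical, since the question does not show the real match-and-replace code. It builds the Regex once, outside the loop, and accumulates only the time spent in Replace. Starting and stopping a Stopwatch per line adds some overhead of its own, so treat the number as an estimate.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

public static class LineReplaceTiming
{
    // Hypothetical pattern: the real one is not shown in the question.
    // Constructed once so it is not rebuilt for every line.
    private static readonly Regex Pattern = new Regex(@"\s+", RegexOptions.Compiled);

    // Copies the reader to the writer line by line and returns
    // the total time spent inside Regex.Replace.
    public static TimeSpan Run(TextReader reader, TextWriter writer)
    {
        var regexTime = new Stopwatch();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            regexTime.Start();
            string replaced = Pattern.Replace(line, " "); // hypothetical replacement
            regexTime.Stop();

            writer.WriteLine(replaced);
        }
        return regexTime.Elapsed;
    }
}
```

Passing a StreamReader over the input file and a StreamWriter over the output file to Run, and comparing the returned TimeSpan with the total wall-clock time, gives a first indication of whether the regex or the file I/O dominates.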