c# .net-4.0 newline normalization text-normalization

Normalizing a text file with an irregular number of newlines?


I have several text files with many blank lines between blocks of text. I would like to normalize them, but there is no pattern to the number of newlines between the blocks. For example:

Text




Some text








More text




More

more

What I want is to replace any run of more than X consecutive newlines with Y newlines. For example, 5 consecutive newlines would become 2, and 10 would become 3.

My current problem is that I don't know how to identify which lines need to be normalized.

I know I could count the newlines using Split, or check whether each line is empty, and so on. But perhaps there is a simple regex or a better approach to this problem?


Solution

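    // Requires: using System.Collections.Generic; using System.IO;
    // Returns the file's lines with every run of blank lines capped at `size`.
    List<string> Normalize(string fileName, int size)
    {
        List<string> result = new List<string>();
        int blanks = 0; // blank lines seen so far in the current run

        foreach (var line in File.ReadAllLines(fileName))
        {
            if (line.Trim() == "")
            {
                // Keep only the first `size` blank lines of a run.
                if (blanks++ < size)
                    result.Add("");
            }
            else
            {
                // A non-blank line ends the run.
                blanks = 0;
                result.Add(line);
            }
        }
        return result;
    }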
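
Since the question also asks about a regex, here is a minimal sketch of that alternative. It is not part of the answer above. The class and method names (`NewlineNormalizer`, `MapCount`, `NormalizeNewlines`) are hypothetical. The thresholds come from the question's example (5 or more newlines become 2, 10 or more become 3), and leaving shorter runs at their original length is an assumption. Note that the sketch always writes "\n", so "\r\n" line endings are converted, and trailing spaces or tabs before a line break are dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NewlineNormalizer
{
    // One or more consecutive line breaks. Spaces or tabs before each break
    // are swallowed, so whitespace-only lines count as blank lines.
    private static readonly Regex NewlineRun = new Regex(@"(?:[ \t]*\r?\n)+");

    // Hypothetical mapping based on the question's example:
    // 10 or more newlines become 3, 5 to 9 become 2, shorter runs are kept.
    public static int MapCount(int newlines)
    {
        if (newlines >= 10) return 3;
        if (newlines >= 5) return 2;
        return newlines;
    }

    public static string NormalizeNewlines(string text)
    {
        return NewlineRun.Replace(text, m =>
        {
            // Count the line breaks in this run, then emit the mapped number.
            int count = m.Value.Count(c => c == '\n');
            return string.Concat(Enumerable.Repeat("\n", MapCount(count)));
        });
    }
}
```

To apply it to a file, something like `File.WriteAllText(path, NewlineNormalizer.NormalizeNewlines(File.ReadAllText(path)))` should work for files that fit comfortably in memory. For very large files, the line-by-line loop above is probably the safer choice.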