Need help in understanding the explanation by Microsoft for File.ReadLines and File.ReadAllLines


According to Microsoft's explanation of the ReadLines and ReadAllLines methods: "When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned. When you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient."

What does it actually mean when they say:

1 - "When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned." If the line of code below is written, doesn't that mean that the ReadLines method has finished executing and that the whole collection has been returned and stored in the variable filedata?

IEnumerable<String> filedata = File.ReadLines(fileWithPath);

2 - "When you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array". Does this mean that, in the snippet below, if a large file is read, the array hugeFileData might not contain all the data if it is used immediately after the file is read?

string[] hugeFileData = File.ReadAllLines(path);
string i = hugeFileData[hugeFileData.Length - 1];

3 - "when you are working with very large files, ReadLines can be more efficient". If that is so, is the code below efficient when reading a large file? I believe that the 2nd and 3rd lines of the code below would read the file twice; correct me if I am wrong.

string fileWithPath = "some large sized file path";
string lastLine = File.ReadLines(fileWithPath).Last();
int totalLines = File.ReadLines(fileWithPath).Count();

The reason for calling ReadLines on the same file twice in the snippet above is that when I tried the code below, I got the exception "Cannot read from a closed TextReader" on its 3rd line.

IEnumerable<String> filedata = File.ReadLines(fileWithPath);
string lastLine = filedata.Last();
int totalLines = filedata.Count();

Solution

  • The difference between ReadLines and ReadAllLines is easily illustrated by code.

    If you write this:

    foreach (var line in File.ReadLines(filename))
    {
        Console.WriteLine(line);
    }
    

    What happens is similar to this:

    using (var reader = new StreamReader(filename))
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            Console.WriteLine(line);
        }
    }
    

    The actual implementation is a little more complex (ReadLines returns an enumerator whose MoveNext method reads and returns each line), but from the outside the behavior is similar.

    The key to that behavior is deferred execution, which you should understand well in order to make good use of LINQ. So the answer to your first question is "No." All the call to ReadLines does is open the file and return an enumerator. It doesn't read the first line until you ask for it.

    Note here that the code can output the first line before the second line is even read. In addition, you're only using memory for one line at a time.
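
    To see deferred execution for yourself, here is a minimal sketch using a hand-written iterator. ReadLinesLazily and LinesRead are hypothetical names made up for this illustration; they are not part of the framework. Unlike File.ReadLines, which opens the file as soon as it is called, this iterator doesn't even open the file until the first line is requested. The point it illustrates is the same: calling the method reads nothing, and each line is read only when the loop asks for it.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;

    public static class LazyReadDemo
    {
        // Counts how many lines have been pulled from the reader so far.
        public static int LinesRead;

        // Hypothetical stand-in for File.ReadLines, written as an iterator method.
        // None of this body runs until the caller starts enumerating.
        public static IEnumerable<string> ReadLinesLazily(string path)
        {
            using (var reader = new StreamReader(path))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    LinesRead++;
                    yield return line;
                }
            }
        }

        public static void Main()
        {
            string path = Path.GetTempFileName();
            File.WriteAllLines(path, new[] { "first", "second", "third" });

            IEnumerable<string> lines = ReadLinesLazily(path);
            Console.WriteLine(LinesRead); // prints 0: nothing has been read yet

            foreach (string line in lines)
            {
                Console.WriteLine(line + " " + LinesRead); // prints "first 1"
                break; // leaving the loop disposes the enumerator and closes the reader
            }

            File.Delete(path);
        }
    }
    ```

    Because the loop exits after the first line, the second and third lines are never read at all.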

    ReadAllLines behaves very differently. When you write:

    foreach (var line in File.ReadAllLines(filename))
    {
        Console.WriteLine(line);
    }
    

    What actually happens is more like this:

    List<string> lines = new List<string>();
    using (var reader = new StreamReader(filename))
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            lines.Add(line);
        }
    }
    foreach (var line in lines)
    {
        Console.WriteLine(line);
    }
    

    Here, the program has to load the entire file into memory before it can output the first line.

    Which one you use depends on what you want to do. If you just need to access the file line by line, then ReadLines is usually the better choice, especially for large files. But if you want to access lines randomly or you'll be reading the file multiple times, then ReadAllLines might be better. However, remember that ReadAllLines requires enough memory to hold the entire file.
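
    As a rough illustration of that trade-off, the sketch below pairs each method with the kind of access it suits. The class and method names (ChoosingAReader, FirstLineContaining, LineAt) are hypothetical and exist only for this example. It also touches on your second question: by the time ReadAllLines returns, the array is complete, so indexing into it immediately is safe.

    ```csharp
    using System;
    using System.IO;
    using System.Linq;

    public static class ChoosingAReader
    {
        // Sequential access: ReadLines streams the file, and FirstOrDefault
        // stops enumerating as soon as a matching line is found.
        public static string FirstLineContaining(string path, string text)
        {
            return File.ReadLines(path).FirstOrDefault(line => line.Contains(text));
        }

        // Random access: ReadAllLines loads every line into an array first.
        // The array is fully populated when the call returns.
        public static string LineAt(string path, int index)
        {
            string[] lines = File.ReadAllLines(path);
            return lines[index];
        }

        public static void Main()
        {
            string path = Path.GetTempFileName();
            File.WriteAllLines(path, new[] { "alpha", "beta", "gamma" });

            Console.WriteLine(FirstLineContaining(path, "et")); // prints "beta"
            Console.WriteLine(LineAt(path, 2));                 // prints "gamma"

            File.Delete(path);
        }
    }
    ```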

    In your third question you showed this code, which produced an exception on the last line:

    IEnumerable<String> filedata = File.ReadLines(fileWithPath);
    string lastLine = filedata.Last();
    int totalLines = filedata.Count();
    

    What happened here is that the first line returned an enumerator. The second line of code enumerated the entire sequence (i.e. read to the end of the file) so that it could find the last line. The enumerator saw that it was at end of file and closed the associated reader. The last line of code again tries to enumerate the file, but the file was already closed. There's no "reset to the start of the file" functionality in the enumerator returned by ReadLines.
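
    As for the workaround of calling ReadLines twice: yes, Last() and Count() each enumerate the whole sequence, so that version reads the file from start to finish two times. If you need both values from a large file, one option is to collect them in a single pass. The following is a minimal sketch of that idea; SinglePass and LastLineAndCount are hypothetical names chosen for this example.

    ```csharp
    using System;
    using System.IO;

    public static class SinglePass
    {
        // Reads the file once, remembering the most recent line and
        // counting lines along the way. Only one line is held at a time.
        public static void LastLineAndCount(string path, out string lastLine, out int totalLines)
        {
            lastLine = null; // stays null if the file has no lines
            totalLines = 0;

            foreach (string line in File.ReadLines(path))
            {
                lastLine = line;
                totalLines++;
            }
        }

        public static void Main()
        {
            string path = Path.GetTempFileName();
            File.WriteAllLines(path, new[] { "alpha", "beta", "gamma" });

            string lastLine;
            int totalLines;
            LastLineAndCount(path, out lastLine, out totalLines);

            Console.WriteLine(lastLine);   // prints "gamma"
            Console.WriteLine(totalLines); // prints 3

            File.Delete(path);
        }
    }
    ```

    If the file comfortably fits in memory, calling ReadAllLines once and using the array's last element and Length would be a simpler alternative.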