c# line-numbers

What's the fastest way to count the total number of lines in a text file in C#?


My code is:

    int linenumber = File.ReadLines(path).Count();

but it takes a long time (about 20 seconds) for files of about 1 GB.

Does anyone know a better way to solve this problem?

Update 6:

I have tested your solutions on a file of about 870 MB:

Method 1 (my code): 13 seconds

Method 2 (from MarcinJuraszek & Locke, which are the same): 12 seconds

Method 3 (from Richard Deeming): 19 seconds

Method 4 (from user2942249): 13 seconds

Method 5 (from Locke): 13 seconds, the same for lineBuffer sizes of 4096, 8192, 16384 and 32768

Method 6 (from Locke, edition 2): 9 seconds with a 32 KB buffer, 10 seconds with a 64 KB buffer
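
For anyone who wants to repeat timings like these, here is a minimal sketch of a timing helper. It assumes each method is wrapped as a function that takes a path and returns a line count; the names `LineCountTiming` and `Measure` are made up for illustration and are not from any of the answers.

```csharp
using System;
using System.Diagnostics;

public static class LineCountTiming
{
    // Runs one line-counting function once and returns its result
    // together with the elapsed wall-clock time.
    // "counter" is a hypothetical wrapper around any of the methods above.
    public static (long Lines, TimeSpan Elapsed) Measure(
        Func<string, long> counter, string path)
    {
        var stopwatch = Stopwatch.StartNew();
        long lines = counter(path);
        stopwatch.Stop();
        return (lines, stopwatch.Elapsed);
    }
}
```

For instance, `LineCountTiming.Measure(p => File.ReadLines(p).LongCount(), path)` would time the code from the question (with `System.IO` and `System.Linq` imported). Repeated runs on the same file can be served from the operating system's file cache, so the first run and later runs may differ noticeably.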

As I said in my comment, there is a native-code application that opens this file on my PC in 5 seconds, so this is not about hard-disk speed.

Compiling the MSIL to native code made no noticeable difference.

Conclusion: at this time, Locke's second edition (method 6) is faster than the other methods.

So I marked his post as the answer, but this question will stay open in case anyone finds a better idea.

I gave a +1 vote to the dear friends who helped me solve the problem.

Thanks for your help. I am still interested in better ideas. Best regards, Smart Man


Solution

  • Here are a few ways this can be accomplished quickly:

    StreamReader:

    int lineCount = 0;
    using (var sr = new StreamReader(path))
    {
        // ReadLine returns null only at end of file, so empty lines are counted too
        while (sr.ReadLine() != null)
            lineCount++;
    }
    

    FileStream:

    int lineCount = 0;
    var lineBuffer = new byte[65536]; // 64 KB
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
           FileShare.Read, lineBuffer.Length))
    {
        int readBuffer = 0;
        while ((readBuffer = fs.Read(lineBuffer, 0, lineBuffer.Length)) > 0)
        {
            for (int i = 0; i < readBuffer; i++)
            {
                // Line feed (0xA) ends both "\r\n" and "\n" lines;
                // a last line with no trailing newline is not counted
                if (lineBuffer[i] == 0xA)
                    lineCount++;
            }
        }
    }
    

    Multithreading:

    Arguably, the number of threads shouldn't affect the read speed, but real-world benchmarking can sometimes prove otherwise. Try different buffer sizes and see whether you get any gains at all with your setup. *FileStream is not thread-safe, so each read below is serialized with a lock; only the byte scanning runs in parallel.

    int lineCount = 0;
    var tasks = new Task[Environment.ProcessorCount]; // 1 per core
    var fileLock = new object();
    int bufferSize = 65536; // 64 KB
    
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.Read, bufferSize, FileOptions.RandomAccess))
    {
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Factory.StartNew(() =>
                {
                    var lineBuffer = new byte[bufferSize];
                    int localCount = 0;
    
                    while (true)
                    {
                        int readBuffer;
                        lock (fileLock) // only one thread may read from the stream at a time
                            readBuffer = fs.Read(lineBuffer, 0, lineBuffer.Length);
                        if (readBuffer <= 0)
                            break;
    
                        for (int n = 0; n < readBuffer; n++)
                            if (lineBuffer[n] == 0xA) // line feed
                                localCount++;
                    }
                    // Publish this task's total once instead of per line
                    Interlocked.Add(ref lineCount, localCount);
                });
        }
        Task.WaitAll(tasks);
    }
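
The FileStream approach above can also be wrapped in a reusable method. The following is only a sketch under a few assumptions: the class and method names (`LineCounter`, `CountLines`) are made up for illustration, the file is assumed to use an ASCII-compatible encoding such as UTF-8 (raw byte scanning would miscount UTF-16), and the 32 KB default is taken from the timings reported in the question rather than from any general rule. Unlike the snippets above, it also counts a final line that has no trailing newline.

```csharp
using System;
using System.IO;

public static class LineCounter
{
    // Counts lines by scanning raw bytes for '\n' (0x0A).
    // Assumes an ASCII-compatible encoding such as UTF-8.
    public static long CountLines(string path, int bufferSize = 32 * 1024)
    {
        long count = 0;
        byte last = (byte)'\n'; // an empty file has no unterminated line
        var buffer = new byte[bufferSize];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                   FileShare.Read, bufferSize, FileOptions.SequentialScan))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                    if (buffer[i] == (byte)'\n')
                        count++;

                // Remember the last byte seen to detect a missing final newline
                last = buffer[read - 1];
            }
        }

        // A final line without a trailing newline still counts as a line
        if (last != (byte)'\n')
            count++;

        return count;
    }
}
```

A call such as `long lines = LineCounter.CountLines(path);` should then agree with `File.ReadLines(path).LongCount()` for files with "\n" or "\r\n" line endings; files that use a lone "\r" as the line terminator would need a different check.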