c# arrays boolean filestream microbenchmark

Is there a faster way to read data with a FileStream?


I'm trying to read data and store it in an array as fast as possible. The fastest method I have found so far is this:

var filePath = "data.dat";
FileStream fs = new FileStream(filePath, FileMode.Open);
bool[] buffer = new bool[fs.Length];

TimeSpan[] times = new TimeSpan[500000];
Stopwatch sw = new Stopwatch();

for (int r = 0; r < 500000; r++)
{
    sw.Start();

    int stackable = 0;
    int counter = 0;

    while ((stackable = fs.ReadByte()) != -1)
    {
        buffer[counter] = (stackable == 1);
        counter++;
    }

    sw.Stop();
    Console.WriteLine($"Elapsed: {sw.Elapsed}ms");
    times[r] = sw.Elapsed;
    sw.Reset();
}

Console.WriteLine($"Longest iteration: {times.Max()}ms");

This reads and processes about 9000 bytes in under 3 ms. The idea is to check whether each byte is 1 or 0 (true or false) and store the result in an array.

So my question is: is there a faster way to achieve this? What should I keep in mind when trying to process data quickly? For example, should I use smaller data types to avoid allocating unnecessary memory?

What the data looks like:


https://hatebin.com/dcldbvrbdm


Solution

  • FileStream already buffers its reads, so iterating byte by byte with ReadByte() isn't that bad. But reading the data into a buffer once (if you can) is generally faster, because it takes a single IO call. In the first loop below I used your code; I had to add a Seek(0) inside the loop to rewind the stream, otherwise only the first iteration reads any data.

    In the second loop I read all the data in with one call and iterate over it using .AsSpan(), which exposes the array as a Span<byte> and is a fast way to iterate an array.

    using System;
    using System.Diagnostics;
    using System.IO;
    
    namespace test_con
    {
        class Program
        {
            static void Main(string[] args)
            {
                makedata();
                var filePath = "data.dat";
                var loop_cnt = 5000;
                using FileStream fs = new FileStream(filePath, FileMode.Open);
                bool[] buffer = new bool[fs.Length];
       
                Stopwatch sw = new Stopwatch();
                sw.Start();
    
                for (int r = 0; r < loop_cnt; r++)
                {
                    int stackable = 0;
                    int counter = 0;
                    while ((stackable = fs.ReadByte()) != -1)
                    {
                        buffer[counter] = (stackable == 1);
                        counter++;
                    }
                    fs.Seek(0, SeekOrigin.Begin);
                }
    
                Console.WriteLine($"avg iteration: {sw.Elapsed.TotalMilliseconds/loop_cnt}");
    
                var byte_buf = new byte[fs.Length];
                sw.Restart();
    
                for (int r = 0; r < loop_cnt; r++)
                {
                    fs.Seek(0, SeekOrigin.Begin);
                    fs.Read(byte_buf);
                    int counter = 0;
                    foreach(var b in byte_buf.AsSpan()) {
                        buffer[counter] = (b == 1);
                        counter++;
                    }
                }
    
                Console.WriteLine($"buf avg iteration: {sw.Elapsed.TotalMilliseconds / loop_cnt}");
            }
    
            static void makedata()
            {
                var filePath = "data.dat";
                if (!File.Exists(filePath))
                {
                    Random rnd = new Random();
    
                    using FileStream fs = new FileStream(filePath, FileMode.CreateNew);
                    for (int n = 0; n < 100000; n++)
                    {
                        // % 2 gives a random 0/1 split; % 1 is always 0, so only 1s were written
                        if (rnd.Next() % 2 == 1)
                            fs.WriteByte(0);
                        else
                            fs.WriteByte(1);
                    }
                }
            }
        }
    }
    

    The output on my 2012 MacBook is:

    avg iteration: 1.01832286
    buf avg iteration: 0.6913623999999999
    

    So the buffered version takes only about 68% of the time of the byte-by-byte version (roughly 1.5x faster).
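
The read-once idea above can also be packaged as a small helper. What follows is a minimal sketch, not part of the original answer: the class name BoolFileReader and its methods ReadFlags, ToFlags and ReadFully are hypothetical names chosen for illustration. It assumes the whole file fits comfortably in memory and that a byte value of 1 means true, as in the question.

```csharp
using System;
using System.IO;

// Hypothetical helper (names are illustrative, not from the original answer).
static class BoolFileReader
{
    // Reads the whole file with one call, then maps each byte to a bool (1 => true).
    public static bool[] ReadFlags(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return ToFlags(bytes);
    }

    // Converts raw bytes to flags; any value other than 1 becomes false.
    public static bool[] ToFlags(ReadOnlySpan<byte> bytes)
    {
        var flags = new bool[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            flags[i] = bytes[i] == 1;
        }
        return flags;
    }

    // Stream.Read is allowed to return fewer bytes than requested,
    // so keep reading until the target is full or the stream ends.
    public static int ReadFully(Stream stream, byte[] target)
    {
        int total = 0;
        while (total < target.Length)
        {
            int n = stream.Read(target, total, target.Length - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }
}
```

For a one-off read, `bool[] flags = BoolFileReader.ReadFlags("data.dat");` would be enough. ReadFully is meant for the case where a stream and buffer are reused across iterations, as in the benchmark: the answer's `fs.Read(byte_buf)` ignores the return value, which usually works for a small local file but is not guaranteed by the Stream contract.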