Tags: c#, .net, list, performance, byte

Performance Issue with List<byte>


I have a file (~30 MB) which I read with File.ReadAllBytes(). This file contains a lot of structured data records of varying length, with no terminator in between. So I have to read the length of the first record, cut it out, continue with the next one, and so on. Therefore splitting the work across multiple Tasks is not possible.

Is there any other way to speed this up? It currently takes about 20 minutes.

    static List<Record> Records = new List<Record>();
    internal static void Import(List<byte> filedata)
    {
        var task = Task.Run(() =>
        {
            while (filedata.Count > 0)
            {
                Record record = new Record();
                filedata = record.GetData(filedata);
                Records.Add(record);
            }
        });
    }
      
    //inside class "Record"
    internal List<byte> GetData(List<byte> filedata)
    {
        this.length = BitConverter.ToUInt32(new byte[4] { filedata[8], filedata[9], filedata[10], filedata[11] }, 0);
        this.data = new byte[this.length + 1];
        filedata.CopyTo(0, this.data, 0, this.length);
        filedata.RemoveRange(0, 16 + this.length);
        return filedata;
    }

Solution

  • I think it's less about how you can make it better, and more that you couldn't possibly make it worse. There are some serious algorithmic and framework misunderstandings going on here, so much so that I'd recommend re-education for the person that wrote this.

    On to specifics:

    • Don't allocate an array just to extract a number from your byte buffer. You can assemble it by hand with 4th grade math (multiply each byte by its power of 256 and add them up), or use a built-in function like MemoryMarshal.Cast, which reinterprets a span of bytes in place.
    • You shouldn't return an array (a list, in fact!!) of the remaining data from your main array. Extract just what you actually need and return that in a struct. If you really can't do that for whatever reason, spans and ArraySegment let you refer to a slice of the buffer without allocating a new list and copying data around for no reason.
    • filedata.RemoveRange(0, 16 + this.length); Now the person is just being absolutely lazy. Removing items from the beginning of an array (a list, in fact, again) is a linear operation: every remaining element gets shifted down, and on a ~30 MB buffer that happens once per record. Just keep an index of the last processed byte and move it forward as you process your data.
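
The three points above could be combined roughly as follows. This is only a sketch under assumptions read off the question's code, not a drop-in fix: each record is taken to be a 16-byte header followed by `length` payload bytes, with `length` stored as a little-endian 32-bit unsigned integer at header offset 8. The `Importer` class, the constants, and the shape of `Record` are hypothetical names introduced for illustration. It uses `BinaryPrimitives.ReadUInt32LittleEndian` rather than `MemoryMarshal.Cast` because it makes the byte order explicit.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

// Hypothetical record type: keeps only the payload length and the payload bytes.
internal sealed class Record
{
    public uint Length;
    public byte[] Data = Array.Empty<byte>();
}

internal static class Importer
{
    private const int HeaderSize = 16;   // assumed from RemoveRange(0, 16 + length)
    private const int LengthOffset = 8;  // assumed from filedata[8]..filedata[11]

    internal static List<Record> Import(byte[] fileData)
    {
        var records = new List<Record>();
        ReadOnlySpan<byte> buffer = fileData; // a view over the array, no copy
        int pos = 0;                          // index of the first unprocessed byte

        // Trailing bytes that cannot hold a full header are ignored.
        while (buffer.Length - pos >= HeaderSize)
        {
            // Read the length field in place; no temporary 4-byte array.
            uint length = BinaryPrimitives.ReadUInt32LittleEndian(
                buffer.Slice(pos + LengthOffset, 4));

            int payloadStart = pos + HeaderSize;
            if (length > (uint)(buffer.Length - payloadStart))
                throw new InvalidDataException("Record length exceeds remaining data.");

            // Copy only this record's payload out of the buffer.
            records.Add(new Record
            {
                Length = length,
                Data = buffer.Slice(payloadStart, (int)length).ToArray()
            });

            // Advance the index instead of removing bytes from the front.
            pos = payloadStart + (int)length;
        }

        return records;
    }
}
```

Called as `Importer.Import(File.ReadAllBytes(path))`, where `path` is a placeholder, this touches each byte of the file a constant number of times, so the total work grows linearly with the file size instead of with file size times record count. If the payloads do not need to outlive the buffer, storing an `ArraySegment<byte>` per record instead of calling `ToArray()` would avoid the remaining copies as well.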