Search code examples
delphirecordtlistdelphi-10.1-berlin

Long delay when looping through a TList of big records


I use Delphi 10.1 Berlin in Windows 10.

I have two records of different sizes. I wrote code to loop through two TList<T> of these records to test elapsed times. Looping through the list of the larger record runs much slower.

Can anyone explain the reason, and provide a solution to make the loop run faster?

type
  tTestRecord1 = record
    Field1: array[0..4] of Integer;
    Field2: array[0..4] of Extended;
    Field3: string;
  end;

  tTestRecord2 = record
    Field1: array[0..4999] of Integer;
    Field2: array[0..4999] of Extended;
    Field3: string;
  end;

procedure TForm1.Button1Click(Sender: TObject);
var
  _List: TList<tTestRecord1>;
  _Record: tTestRecord1;
  _Time: TTime;
  i: Integer;
begin
  _List := TList<tTestRecord1>.Create;

  for i := 0 to 4999 do
  begin
    _List.Add(_Record);
  end;

  _Time := Time;

  for i := 0 to 4999 do
  begin
    if _List[i].Field3 = 'abcde' then
    begin
      Break;
    end;
  end;

  Button1.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.000

  _List.Free;
end;

procedure TForm1.Button2Click(Sender: TObject);
var
  _List: TList<tTestRecord2>;
  _Record: tTestRecord2;
  _Time: TTime;
  i: Integer;
begin
  _List := TList<tTestRecord2>.Create;

  for i := 0 to 4999 do
  begin
    _List.Add(_Record);
  end;

  _Time := Time;

  for i := 0 to 4999 do
  begin
    if _List[i].Field3 = 'abcde' then
    begin
      Break;
    end;
  end;

  Button2.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.045

  _List.Free;
end;

Solution

  • First of all, I want to consider the entire code, even the code that populates the list which I do realise you have not timed. Because the second record is larger in size more memory needs to be copied when you make an assignment of that record type. Further when you read from the list the larger record is less cache friendly than the smaller record which impacts performance. This latter effect is likely less significant than the former.

    Related to this is that as you add items the list's internal array of records has to be resized. Sometimes the resizing leads to a reallocation that cannot be performed in-place. When that happens a new block of memory is allocated and the previous content is copied to this new block. That copy is clearly ore expensive for the larger record. You can mitigate this by allocating the array once up front if you know it's length. The list Capacity is the mechanism to use. Of course, not always will you know the length ahead of time.

    Your program does very little beyond memory allocation and memory access. Hence the performance of these memory operations dominates.

    Now, your timing is only of the code that reads from the lists. So the memory copy performance difference on population is not part of the benchmarking that you performed. Your timing differences are mainly down to excessive memory copy when reading, as I will explain below.

    Consider this code:

    if _List[i].Field3 = 'abcde' then
    

    Because _List[i] is a record, a value type, the entire record is copied to an implicit hidden local variable. The code is actually equivalent to:

    var
      tmp: tTestRecord2;
    ...
    tmp := _List[i]; // copy of entire record
    if tmp.Field3 = 'abcde' then
    

    There are a few ways to avoid this copy:

    1. Change the underlying type to be a reference type. This changes the memory management requirements. And you may have good reason to want to use a value type.
    2. Use a container class that can return the address of an item rather than a copy of an item.
    3. Switch from TList<T> to dynamic array TArray<T>. That simple change will allow the compiler to access individual fields directly without copying entire records.
    4. Use the TList<T>.List to obtain access to the list object's underlying array holding the data. That would have the same effect as the previous item.

    Item 4 above is the simplest change you could make to see a large difference. You would replace

    if _List[i].Field3 = 'abcde' then
    

    with

    if _List.List[i].Field3 = 'abcde' then
    

    and that should yield a very significant change in performance.

    Consider this program:

    {$APPTYPE CONSOLE}
    
    uses
      System.Diagnostics,
      System.Generics.Collections;
    
    type
      tTestRecord2 = record
        Field1: array[0..4999] of Integer;
        Field2: array[0..4999] of Extended;
        Field3: string;
      end;
    
    procedure Main;
    const
      N = 100000;
    var
      i: Integer;
      Stopwatch: TStopwatch;
      List: TList<tTestRecord2>;
      Rec: tTestRecord2;
    begin
      List := TList<tTestRecord2>.Create;
      List.Capacity := N;
    
      for i := 0 to N-1 do
      begin
        List.Add(Rec);
      end;
    
      Stopwatch := TStopwatch.StartNew;
      for i := 0 to N-1 do
      begin
        if List[i].Field3 = 'abcde' then
        begin
          Break;
        end;
      end;
      Writeln(Stopwatch.ElapsedMilliseconds);
    end;
    
    begin
      Main;
      Readln;
    end.
    

    I had to compile it for 64 bit to avoid an out of memory condition. The output on my machine is around 700. Change List[i].Field3 to List.List[i].Field3 and the output is in single figures. The timing is rather crude, but I think this demonstrates the point.


    The issue of the large record not being cache friendly remains. That is more complicated to deal with and would require a detailed analysis of how the real world code operated on its data.


    As an aside, if you care about performance then you won't use Extended. Because it has size 10, not a power of two, memory access is frequently mis-aligned. Use Double or Real which is an alias to Double.