Search code examples
delphithreadpoolaccess-violationomnithreadlibrary

Why does the following code using IOmniThreadPool cause access violations?


In our Delphi XE4 application we are using an OmniThreadPool with MaxExecuting=4 to improve the efficiency of a certain calculation. Unfortunately we are having trouble with intermittent access violations (see for example the following MadExcept bug report http://ec2-72-44-42-247.compute-1.amazonaws.com/BugReport.txt). I was able to construct the following example which demonstrates the problem. After running the following console application, an access violation in System.SyncObjs.TCriticalSection.Acquire usually occurs within a minute or so. Can anybody tell me what I am doing wrong in the following code, or show me another way of achieving the desired result?

program OmniPoolCrashTest;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils,
  DSiWin32, GpLists,
  OtlSync, OtlThreadPool, OtlTaskControl, OtlComm, OtlTask;

const
  cTimeToWaitForException = 10 * 60 * 1000;  // program exits if no exception after 10 minutes
  MSG_CALLEE_FINISHED = 113; // our custom Omni message ID
  cMaxAllowedParallelCallees = 4; // enforced via thread pool
  cCalleeDuration = 10; // 10 miliseconds
  cCallerRepetitionInterval = 200; // 200 milliseconds
  cDefaultNumberOfCallers = 10; // 10 callers each issuing 1 call every 200 milliseconds

var
  gv_OmniThreadPool : IOmniThreadPool;

procedure OmniTaskProcedure_Callee(const task: IOmniTask);
begin
  Sleep(cCalleeDuration);
  task.Comm.Send(MSG_CALLEE_FINISHED);
end;

procedure PerformThreadPoolTest();
var
  OmniTaskControl : IOmniTaskControl;
begin
  OmniTaskControl := CreateTask(OmniTaskProcedure_Callee).Schedule(gv_OmniThreadPool);
  WaitForSingleObject(OmniTaskControl.Comm.NewMessageEvent, INFINITE);
end;

procedure OmniTaskProcedure_Caller(const task: IOmniTask);
begin
  while not task.Terminated do begin
    PerformThreadPoolTest();
    Sleep(cCallerRepetitionInterval);
  end;
end;

var
  CallerTasks : TGpInterfaceList<IOmniTaskControl>;
  i : integer;
begin
  gv_OmniThreadPool := CreateThreadPool('CalleeThreadPool');
  gv_OmniThreadPool.MaxExecuting := cMaxAllowedParallelCallees;
  CallerTasks := TGpInterfaceList<IOmniTaskControl>.Create();
  for i := 1 to StrToIntDef(ParamStr(1), cDefaultNumberOfCallers) do begin
    CallerTasks.Add( CreateTask(OmniTaskProcedure_Caller).Run() );
  end;
  Sleep(cTimeToWaitForException);
  for i := 0 to CallerTasks.Count-1 do begin
    CallerTasks[i].Terminate();
  end;
  CallerTasks.Free();
end.

Solution

  • You have here an example of hard-to-find Task controller needs an owner problem. What happens is that the task controller sometimes gets destroyed before the task itself and that causes the task to access memory containing random data.

    Problematic scenario goes like this ([T] marks task, [C] marks task controller):

    • [T] sends the message
    • [C] receives the message and exits
    • [C] is destroyed
    • new task [T1] and controller [C1] are created
    • [T] tries to exit; during that it accesses the shared memory area which was managed by [C] but was then destroyed and overwritten by the data belonging to [C1] or [T1]

    In the Graymatter's workaround, OnTerminated creates an implicit owner for the task inside the OmniThreadLibrary which "solves" the problem.

    The correct way to wait on the task to complete is to call taskControler.WaitFor.

    procedure OmniTaskProcedure_Callee(const task: IOmniTask);
    begin
      Sleep(cCalleeDuration);
    end;
    
    procedure PerformThreadPoolTest();
    var
      OmniTaskControl : IOmniTaskControl;
    begin
      OmniTaskControl := CreateTask(OmniTaskProcedure_Callee).Schedule(gv_OmniThreadPool);
      OmniTaskControl.WaitFor(INFINITE);
    end;
    

    I will look into replacing shared memory record with reference-counted solution which would prevent such problems (or at least make them easier to find).