multithreading delphi volatile lock-free interlocked

Delphi [volatile] and InterlockedCompareExchange not reliable?

I wrote a simple lock-free node stack (Delp[hi XE4, Win7-64, 32-bit app) where I can have multiple 'stacks' and pop/push nodes between them from various threads simultaneously. It works 99.999% of the time but eventually glitch under a stress test using all CPU cores.

Stripped-down, it comes down to this (not real/compiled code): Nodes are :

type      POBNode          = ^TOBNode;
[volatile]TOBNode          = record
                    [volatile]Next   : POBNode;
                              Data   : Int64;
                             end;

Simplified stack :

type TOBStack  = class
                  private
                   [volatile]Head:POBNode; 
                  function Pop:POBNode;
                  procedure Push(NewNode:POBNode);
                 end;

procedure TOBStack.Push(NewNode:POBNode);
var zTmp : POBNode;
begin;
    repeat
     zTmp:=InterlockedCompareExchangePointer(Pointer(Head),nil,nil);(memory fenced-read*)
     NewNode.Next:=zTmp;
     if InterlockedCompareExchangePointer(Head,NewNode,zTmp)=zTmp
        then break (*success*)
        else continue;
    until false;
end;

function TOBStack.Pop:POBNode;
begin;
 repeat
  Result:=InterlockedCompareExchangePointer(Pointer(Head),nil,nil);(memory fenced-read*)
  if Result=nil
     the exit;
  NewHead:=Result.Next;
  if InterlockedCompareExchangePointer(Pointer(Head),NewHead,Result)=Result
     then break (*Success*)
     else continue;(*Fail, try again*)
 until False;
end;

I have tried many variations on this but cannot get it to be stable. If I create a few threads that each have a stack and they all push/pop to/from a global stack, it eventually glitch, but not quickly. Only when I stress it for minutes on end from multiple threads, in tight loops.

I cannot have hidden bugs in my code, so I need more advice than to ask a specific question to get this running 100% error-free, 24/7. Does the code above look fine for a lock-free thread-safe stack? What else can I look at? This is impossible to debug normally as the errors occur at various places, telling me there is a pointer or RAM corruption happening somewhere. I also get duplicate entries, meaning that a node that was popped of one stack, then later returned to that same stack, is still on top of the old stack... Impossible to happen according to my algorithm? This lead me to believe it's possible to violate Delphi/Windows InterlockedCompareExchange Methods or there is some hidden knowledge I have yet to be revealed. :) (I also tried TInterlocked)

I have made a complete test case that can be copied from ftp.true.co.za. In there I run 8 threads doing 400 000 push/pops each and it usually crashes (safely due to checks/raised exceptions) after a few cycles of these tests, sometimes many many test cycles complete before one suddenly crash.

Any advice would be appreciated.

Regards Anton E

Solution

At first I was skeptical of this being an ABA problem as indicated by gabr. It seemed to me that: if one thread looks at the current Head, and another thread pushes then pops; you're happy to still operate on the same Head in the same way.

However, consider this from your Pop method:

NewHead:=Result.Next;
if InterlockedCompareExchangePointer(Pointer(Head),NewHead,Result)=Result

If the thread is swapped out after the first line.
A value for NewHead is stored in a local variable.
Then another thread successfully pops the node this thread was targetting.
It also manages to push the same node back, but with a different value for Next before the first thread resumes.
The second line will pass the comparand check allowing head to receive the NewHead value from the local variable.
However, the current value for NewHead is incorrect, thereby corrupting your stack.

There's a subtle variation on this problem not even covered by your test app. This problem isn't checked in your test app because you aren't destroying any nodes until the end of your test.

Look at current head
Another thread pops some nodes.
The nodes are destroyed, and new nodes created and pushed.
By the time your "looking thread" is active again, it could be looking at an entirely different node that is coincidentally at the same address.
If you're popping, you might assign a garbage pointer to Head.

Apart form the above...
Looking at your test app there's also some really dodgy code. E.g.

You generate a "random number": J:=GetTickCount and 7;(*Get a 'random' number 0..7*).

Do you realise how fast computers are?
Do you realise that GetTickCount will generate reams of duplicates in a tight loop?
I.e. the numbers you generate will be nothing like random.
And when comments don't agree with code, my spidey-sense kicks in.

You're allocating memory of a hard-coded size: GetMem(zTmp,12);(*Allocate new node*).

Why aren't you using SizeOf?
You're using a multi-platform compiler.... The size of that structure can vary.
There is absolutely zero reason to hard-code the size of the structure.

Right now, given these two examples, I wouldn't be entirely confident that there isn't also an error in your test code.