c concurrency operating-system pthreads concurrent-programming

Peterson's solution implementation not working in C

I have the following code with which I am trying to understand peterson's solution. When I run this implementation for small values of loop till 9999 the output is correctly displayed as 0, but when I test this with higher loop values like 9999999, I get values close to 0, but not 0, is it possible that both increment and decrement thread could be executing in the (*(a))++; section ? Why is the below implementation not working ? is there any bug in my program ?

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>

#define LOOP 9999999

int interest[2] = {0,0};
int turn = 0;
int loops[2] = {LOOP,LOOP};

void increment(int *a) {
  printf("Incrementing %p\n",a);
  for(int i=0;i<LOOP;i++) {
    interest[0] = 1;
    turn = 1;
    while(interest[1] == 1 && turn == 1);
    (*(a))++;
    loops[0]--;
    interest[0] = 0;
  }
}

void decrement(int *a) {
  printf("Decrementing %p\n",a);
  for(int i=0;i<LOOP;i++) {
    interest[1] = 1;
    turn = 0;
    while(interest[0] == 1 && turn == 0);
    (*(a))--;
    loops[1]--;
    interest[1] = 0;
  }
}

void print_val(int *a) {
  while(1) {
    getchar();
    printf("value at mem %d\niInc(%d) iDec(%d)\n",*a,loops[0],loops[1]);
  }
}

int main() {

  pthread_t t1, t2, t3;
  int *mem = malloc(sizeof(int));

  pthread_create(&t1, NULL, (void*)decrement, (void*)mem);
  pthread_create(&t2, NULL, (void*)increment, (void*)mem);
  pthread_create(&t3, NULL, (void*)print_val, (void*)mem);

  pthread_join(t1,NULL);  
  pthread_join(t2,NULL);  
  printf("operation complete\n");
  pthread_join(t3,NULL);  

  return 0;
}

output:

$:~/Arena/sem $ gcc op.c -pthread && ./a.out
Incrementing 0xd16010
Decrementing 0xd16010
operation complete

value at mem -2
iInc(0) iDec(0)
^Z
[13]+  Stopped                 ./a.out
$:~/Arena/sem $ gcc op.c -pthread && ./a.out
Decrementing 0x2432010
Incrementing 0x2432010
operation complete

value at mem 16
iInc(0) iDec(0)
^Z
[14]+  Stopped                 ./a.out
meow:~/Arena/sem $

Edit:

I already tried Wrong implementation of Peterson's algorithm? and adding volatile did not help, I also am not using ++ operation like said in the previous thread.
Implementation of Peterson's solution doesn't work properly and Peterson algorithm in Java? are not in C and my thread is no a dupe.

Looking at the answers:

Most of the answers are suggesting that there could be a some reordering done by the compiler, I have added the assembly dumps, can someone help me understand how both these processes end up inside the critical section ?

Increment function

 for(int i=0;i<LOOP;i++) {
  25:   c7 45 f4 00 00 00 00    mov    DWORD PTR [ebp-0xc],0x0
  2c:   eb 50                            jmp    7e <increment+0x7e>
    interest[0] = 1;
    turn = 1;
  2e:   c7 05 00 00 00 00 01   mov    DWORD PTR ds:0x0,0x1
  35:   00 00 00 
    while(interest[1] == 1 && turn == 1);
  38:   c7 05 00 00 00 00 01   mov    DWORD PTR ds:0x0,0x1
  3f:   00 00 00 
    //while(turn == 1 && interest[1] == 1);
  42:   90                                 nop
  43:   a1 04 00 00 00             mov    eax,ds:0x4
  48:   83 f8 01                        cmp    eax,0x1
  4b:   75 0a                            jne    57 <increment+0x57>
  4d:   a1 00 00 00 00             mov    eax,ds:0x0
  52:   83 f8 01                        cmp    eax,0x1
  55:   74 ec                             je     43 <increment+0x43>
    (*(a->location))++;
  57:   8b 45 08                        mov    eax,DWORD PTR [ebp+0x8]
  5a:   8b 00                             mov    eax,DWORD PTR [eax]
  5c:   8b 10                             mov    edx,DWORD PTR [eax]
  5e:   83 c2 01                        add    edx,0x1
  61:   89 10                             mov    DWORD PTR [eax],edx
    loops[0]--;
  63:   a1 00 00 00 00              mov    eax,ds:0x0
  68:   83 e8 01                        sub    eax,0x1
  6b:   a3 00 00 00 00              mov    ds:0x0,eax
    interest[0] = 0;
  70:   c7 05 00 00 00 00 00    mov    DWORD PTR ds:0x0,0x0
  77:   00 00 00

Decrement function

for(int i=0;i<LOOP;i++) {
  8f:   8b 45 08                       mov    eax,DWORD PTR [ebp+0x8]
  92:   8b 50 04                      mov    edx,DWORD PTR [eax+0x4]
  95:   8b 45 08                      mov    eax,DWORD PTR [ebp+0x8]
  98:   8b 00                           mov    eax,DWORD PTR [eax]
  9a:   89 54 24 08                 mov    DWORD PTR [esp+0x8],edx
  9e:   89 44 24 04                 mov    DWORD PTR [esp+0x4],eax
  a2:   c7 04 24 28 00 00 00  mov    DWORD PTR [esp],0x28
  a9:   e8 fc ff ff ff                 call   aa <decrement+0x21>
    interest[1] = 1;
  ae:   c7 45 f4 00 00 00 00   mov    DWORD PTR [ebp-0xc],0x0
  b5:   eb 4f                            jmp    106 <decrement+0x7d>
    turn = 0;
    while(interest[0] == 1 && turn == 0);
  b7:   c7 05 04 00 00 00 01  mov    DWORD PTR ds:0x4,0x1
  be:   00 00 00 
    //while(turn == 0 && interest[0] == 1);
  c1:   c7 05 00 00 00 00 00  mov    DWORD PTR ds:0x0,0x0
  c8:   00 00 00 
    (*(a->location))--;
  cb:   90                                nop
  cc:   a1 00 00 00 00            mov    eax,ds:0x0
  d1:   83 f8 01                      cmp    eax,0x1
  d4:   75 09                           jne    df <decrement+0x56>
  d6:   a1 00 00 00 00            mov    eax,ds:0x0
  db:   85 c0                           test   eax,eax
  dd:   74 ed                           je     cc <decrement+0x43>
    loops[1]--;
  df:   8b 45 08                       mov    eax,DWORD PTR [ebp+0x8]
  e2:   8b 00                            mov    eax,DWORD PTR [eax]
  e4:   8b 10                            mov    edx,DWORD PTR [eax]
  e6:   83 ea 01                       sub    edx,0x1
  e9:   89 10                            mov    DWORD PTR [eax],edx
    interest[1] = 0;
  eb:   a1 04 00 00 00             mov    eax,ds:0x4
  f0:   83 e8 01                        sub    eax,0x1
  f3:   a3 04 00 00 00              mov    ds:0x4,eax

Solution

You appear to be using pthreads, which is part of the POSIX standard. POSIX doesn't allow you to "roll your own" synchronisation primitive like that - it has this to say in 4.11 Memory Synchronization:

Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads.

The practical upshot of this is that the compiler is allowed make transformations and perform re-orderings that may break your assumptions.

For example, the compiler I have optimises the while() loops down to endless loops, since it can see that turn is cannot legally be modified between setting it and testing it in the loop, as no POSIX synchronisation functions are called. That's obviously not happening to you, but other similar issues are possible.

In this case, it's probably not compiler optimisation that's biting you, but failure to respect the CPU memory model. This article describes how Peterson locks require a memory fence to be correct on multiprocessor x86. (The implementation of the POSIX synchronisation primitives will include the necessary memory fences, but your code does not).

In terms of your code, the load of interest[1] can be reordered by x86 prior to the write to interest[0], and the load of interest[0] can likewise be reordered before the write to interest[1]. This means that both threads can see interest[0] == 0 and interest[1] == 0 at the same time, and both enter the critical section.