I have the following code with which I am trying to understand peterson's solution. When I run this implementation for small values of loop till 9999 the output is correctly displayed as 0, but when I test this with higher loop values like 9999999, I get values close to 0, but not 0, is it possible that both increment and decrement thread could be executing in the (*(a))++;
section ? Why is the below implementation not working ? is there any bug in my program ?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>
#define LOOP 9999999
int interest[2] = {0,0};
int turn = 0;
int loops[2] = {LOOP,LOOP};
void increment(int *a) {
printf("Incrementing %p\n",a);
for(int i=0;i<LOOP;i++) {
interest[0] = 1;
turn = 1;
while(interest[1] == 1 && turn == 1);
(*(a))++;
loops[0]--;
interest[0] = 0;
}
}
void decrement(int *a) {
printf("Decrementing %p\n",a);
for(int i=0;i<LOOP;i++) {
interest[1] = 1;
turn = 0;
while(interest[0] == 1 && turn == 0);
(*(a))--;
loops[1]--;
interest[1] = 0;
}
}
void print_val(int *a) {
while(1) {
getchar();
printf("value at mem %d\niInc(%d) iDec(%d)\n",*a,loops[0],loops[1]);
}
}
int main() {
pthread_t t1, t2, t3;
int *mem = malloc(sizeof(int));
pthread_create(&t1, NULL, (void*)decrement, (void*)mem);
pthread_create(&t2, NULL, (void*)increment, (void*)mem);
pthread_create(&t3, NULL, (void*)print_val, (void*)mem);
pthread_join(t1,NULL);
pthread_join(t2,NULL);
printf("operation complete\n");
pthread_join(t3,NULL);
return 0;
}
output:
$:~/Arena/sem $ gcc op.c -pthread && ./a.out
Incrementing 0xd16010
Decrementing 0xd16010
operation complete
value at mem -2
iInc(0) iDec(0)
^Z
[13]+ Stopped ./a.out
$:~/Arena/sem $ gcc op.c -pthread && ./a.out
Decrementing 0x2432010
Incrementing 0x2432010
operation complete
value at mem 16
iInc(0) iDec(0)
^Z
[14]+ Stopped ./a.out
meow:~/Arena/sem $
Most of the answers are suggesting that there could be a some reordering done by the compiler, I have added the assembly dumps, can someone help me understand how both these processes end up inside the critical section ?
for(int i=0;i<LOOP;i++) {
25: c7 45 f4 00 00 00 00 mov DWORD PTR [ebp-0xc],0x0
2c: eb 50 jmp 7e <increment+0x7e>
interest[0] = 1;
turn = 1;
2e: c7 05 00 00 00 00 01 mov DWORD PTR ds:0x0,0x1
35: 00 00 00
while(interest[1] == 1 && turn == 1);
38: c7 05 00 00 00 00 01 mov DWORD PTR ds:0x0,0x1
3f: 00 00 00
//while(turn == 1 && interest[1] == 1);
42: 90 nop
43: a1 04 00 00 00 mov eax,ds:0x4
48: 83 f8 01 cmp eax,0x1
4b: 75 0a jne 57 <increment+0x57>
4d: a1 00 00 00 00 mov eax,ds:0x0
52: 83 f8 01 cmp eax,0x1
55: 74 ec je 43 <increment+0x43>
(*(a->location))++;
57: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
5a: 8b 00 mov eax,DWORD PTR [eax]
5c: 8b 10 mov edx,DWORD PTR [eax]
5e: 83 c2 01 add edx,0x1
61: 89 10 mov DWORD PTR [eax],edx
loops[0]--;
63: a1 00 00 00 00 mov eax,ds:0x0
68: 83 e8 01 sub eax,0x1
6b: a3 00 00 00 00 mov ds:0x0,eax
interest[0] = 0;
70: c7 05 00 00 00 00 00 mov DWORD PTR ds:0x0,0x0
77: 00 00 00
for(int i=0;i<LOOP;i++) {
8f: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
92: 8b 50 04 mov edx,DWORD PTR [eax+0x4]
95: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
98: 8b 00 mov eax,DWORD PTR [eax]
9a: 89 54 24 08 mov DWORD PTR [esp+0x8],edx
9e: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
a2: c7 04 24 28 00 00 00 mov DWORD PTR [esp],0x28
a9: e8 fc ff ff ff call aa <decrement+0x21>
interest[1] = 1;
ae: c7 45 f4 00 00 00 00 mov DWORD PTR [ebp-0xc],0x0
b5: eb 4f jmp 106 <decrement+0x7d>
turn = 0;
while(interest[0] == 1 && turn == 0);
b7: c7 05 04 00 00 00 01 mov DWORD PTR ds:0x4,0x1
be: 00 00 00
//while(turn == 0 && interest[0] == 1);
c1: c7 05 00 00 00 00 00 mov DWORD PTR ds:0x0,0x0
c8: 00 00 00
(*(a->location))--;
cb: 90 nop
cc: a1 00 00 00 00 mov eax,ds:0x0
d1: 83 f8 01 cmp eax,0x1
d4: 75 09 jne df <decrement+0x56>
d6: a1 00 00 00 00 mov eax,ds:0x0
db: 85 c0 test eax,eax
dd: 74 ed je cc <decrement+0x43>
loops[1]--;
df: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
e2: 8b 00 mov eax,DWORD PTR [eax]
e4: 8b 10 mov edx,DWORD PTR [eax]
e6: 83 ea 01 sub edx,0x1
e9: 89 10 mov DWORD PTR [eax],edx
interest[1] = 0;
eb: a1 04 00 00 00 mov eax,ds:0x4
f0: 83 e8 01 sub eax,0x1
f3: a3 04 00 00 00 mov ds:0x4,eax
You appear to be using pthreads, which is part of the POSIX standard. POSIX doesn't allow you to "roll your own" synchronisation primitive like that - it has this to say in 4.11 Memory Synchronization:
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads.
The practical upshot of this is that the compiler is allowed make transformations and perform re-orderings that may break your assumptions.
For example, the compiler I have optimises the while()
loops down to endless loops, since it can see that turn
is cannot legally be modified between setting it and testing it in the loop, as no POSIX synchronisation functions are called. That's obviously not happening to you, but other similar issues are possible.
In this case, it's probably not compiler optimisation that's biting you, but failure to respect the CPU memory model. This article describes how Peterson locks require a memory fence to be correct on multiprocessor x86. (The implementation of the POSIX synchronisation primitives will include the necessary memory fences, but your code does not).
In terms of your code, the load of interest[1]
can be reordered by x86 prior to the write to interest[0]
, and the load of interest[0]
can likewise be reordered before the write to interest[1]
. This means that both threads can see interest[0] == 0
and interest[1] == 0
at the same time, and both enter the critical section.