I'm implementing my own version of malloc. It is very similar to the glibc malloc in that it supports multithreading by creating arenas, which are memory areas that a thread can work on without the risk of competing with another thread.
My data structures are as follows:
typedef struct s_arena {
    pthread_mutex_t mutex;
    t_pool *main_pool;
} t_arena;

typedef struct s_arena_data {
    _Atomic int arena_count;
    t_arena arenas[M_ARENA_MAX];
} t_arena_data;
t_arena_data is a global variable that contains the number of arenas created so far (starting at 0 for the first call and capped at M_ARENA_MAX, which I currently define as 8) and an array containing all my arenas.
An arena only contains a mutex, which is initialized with pthread_mutex_init(), and a pointer to a pool of memory. The pool of memory is unimportant for this topic as the race condition occurs before reaching it.
How my program works: as each thread enters malloc, it tries to pthread_mutex_trylock() the mutex of the first arena. If that succeeds, all is well and it proceeds to the allocation, which I haven't described here. If it doesn't, several things can happen.
If the next entry in the array is empty and M_ARENA_MAX isn't reached, a dedicated mutex (new_arena_mutex) is locked in order to create a new arena and add it to the array. That mutex is global, so no two threads can be creating an arena simultaneously.
If that mutex is already locked, the thread loops back to arena[0] and keeps searching for an open mutex.
Now, I'm pretty sure the race condition happens because of the variable arena_count. I've observed, thanks to the debug printf statements, that whenever the function segfaults, M_ARENA_MAX hasn't been reached. If it has, the program does not crash. So I suspect that one thread might be reading the value of arena_count just before another thread increments it, and by the time it finishes reading it, the thread that incremented it releases the new_arena_mutex and the first thread goes on to create an arena with a wrong index.
It's my first multithreaded program, so I apologize if my explanation or the code isn't clear, but I've spent the last 4 hours on this issue, and while I think I've narrowed down the problem, I really don't know how to solve it.
Here is the part of the code that is faulty:
current_arena = &arena_data.arenas[0];
int arena_index = 0;
while (pthread_mutex_trylock(&current_arena->mutex) != 0) {
    printf("THREAD %p READS[0] ARENA COUNT AT %d\n", (void *)pthread_self(), arena_data.arena_count);
    if (arena_index == arena_data.arena_count - 1) {
        printf("THREAD %p READS[1] ARENA COUNT AT %d\n", (void *)pthread_self(), arena_data.arena_count);
        if (pthread_mutex_trylock(&new_arena_mutex) != 0 || arena_data.arena_count == M_ARENA_MAX) {
            current_arena = &arena_data.arenas[(arena_index = 0)];
            continue;
        }
        creator = true;
        break;
    }
    current_arena = &arena_data.arenas[arena_index++];
}
/* All arenas are occupied by other threads but M_ARENA_MAX isn't reached. Let's just create a new one. */
if (creator == true) {
    printf("THREAD %p READS[2] ARENA COUNT AT %d\n", (void *)pthread_self(), arena_data.arena_count);
    current_pool = create_new_pool(MAIN_POOL, chunk_type, size, pagesize, &new_arena_mutex);
    if (current_pool == MAP_FAILED) return NULL;
    ++arena_data.arena_count;
    arena_data.arenas[arena_index + 1] = (t_arena){ .main_pool = current_pool };
    pthread_mutex_init(&arena_data.arenas[arena_index + 1].mutex, NULL);
    pthread_mutex_lock(&arena_data.arenas[arena_index + 1].mutex);
    pthread_mutex_unlock(&new_arena_mutex);
    return user_area((t_alloc_chunk *)current_pool->chunk, size, &arena_data.arenas[arena_index + 1].mutex);
}
Here is one of the printf outputs, and one that supports my theory that there is a race condition:
THREAD 0x7f9c3b216700 READS[1] ARENA COUNT AT 4
THREAD 0x7f9c3b216700 READS[2] ARENA COUNT AT 5
The values should be equal, but they are not.
I can spot three issues in your code.
This is the race condition you describe in your question:
So I suspect that one thread might be reading the value of arena_count just before another thread increments it, and by the time it finishes reading it, the thread that incremented it releases the new_arena_mutex and the first thread goes on to create an arena with a wrong index.
Yes, that can happen. The load from arena_data.arena_count happens atomically, but threads may not generally assume that the value is (still) correct. The modified version in your answer does not fix the problem.
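To make the race concrete, here is one possible interleaving, sketched against the code in your question (the indices are only illustrative, not taken from your trace):

/* Thread A                                    Thread B (holds new_arena_mutex)
 * --------                                    --------------------------------
 * reads arena_count == 4,
 * arena_index == 3, so the check passes
 *                                             ++arena_count              -> 5
 *                                             initializes arenas[4]
 *                                             unlocks new_arena_mutex
 * trylock(new_arena_mutex) succeeds
 * ++arena_count                       -> 6
 * initializes arenas[arena_index + 1 == 4]    <- the slot B just used
 *
 * Now arena_count says 6, but arenas[5] was never initialized; the next
 * thread that walks the array trylocks an uninitialized mutex.
 */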
In order to fix that, the following guarantee might be helpful: any store to arena_data.arena_count happens while holding the new_arena_mutex. As a result, the thread that holds the mutex can safely load arena_data.arena_count (while holding the mutex, of course), and it can be sure that its value does not change until it unlocks the mutex. Let me try to explain by changing and commenting your updated code:
while (pthread_mutex_trylock(&current_arena->mutex) != 0) {
    if (arena_index == arena_data.arena_count - 1) {
        // This thread checks the condition above _without_ holding the
        // `new_arena_mutex`. Another thread may hold the mutex (and hence it
        // may increment `arena_count`).
        if (pthread_mutex_trylock(&new_arena_mutex) == 0) {
            // Now, this thread can assume that no other thread writes to
            // `arena_data.arena_count`. However, the condition
            //
            //     arena_index == arena_data.arena_count - 1
            //
            // may no longer be true (because it had been checked before locking).
            if (arena_data.arena_count < M_ARENA_MAX) {
                // This thread may create a new arena at index
                // `arena_data.arena_count`. That is safe because this thread holds
                // the `new_arena_mutex` (preventing other threads from modifying
                // `arena_count`).
                //
                // However, it is possible that `arena_index` is not at the position
                // of the most recently created arena (checked _before_ locking). Let
                // us just assume that all the recently created arenas are still
                // locked. Hence we skip the check and jump directly to the slot of
                // the new arena (as if we had failed to lock the others).
                arena_index = arena_data.arena_count;
                current_arena = &arena_data.arenas[arena_index];
                ++arena_data.arena_count;
                assert(
                    arena_index + 1 == arena_data.arena_count &&
                    "... and this thread is holding the mutex, so it stays true."
                );
                creator = true;
                break;
            } else {
                pthread_mutex_unlock(&new_arena_mutex);
            }
        }
    }
    /* ... otherwise, advance to the next arena and keep looping ... */
}
In my opinion, the code becomes more readable if you extract those actions into functions such as
// both functions return `arena_index` or `-1`
int try_find_and_lock_arena();
int try_create_and_lock_arena();
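For illustration, here is a minimal sketch of how those two functions could look, reusing your globals (arena_data, new_arena_mutex, M_ARENA_MAX) and leaving the pool setup out; it is an outline of the logic discussed above, not drop-in code:

// Both functions return the locked arena's index, or -1 on failure.
static int try_find_and_lock_arena(void)
{
    // Only slots below `arena_count` have an initialized mutex, so take a
    // snapshot of the counter and never look past it.
    int count = arena_data.arena_count;
    for (int i = 0; i < count; i++) {
        if (pthread_mutex_trylock(&arena_data.arenas[i].mutex) == 0)
            return i;
    }
    return -1;
}

static int try_create_and_lock_arena(void)
{
    if (pthread_mutex_trylock(&new_arena_mutex) != 0)
        return -1;                      // another thread is already creating one
    if (arena_data.arena_count == M_ARENA_MAX) {
        pthread_mutex_unlock(&new_arena_mutex);
        return -1;                      // no room left for another arena
    }
    int index = arena_data.arena_count; // slot of the new arena
    pthread_mutex_init(&arena_data.arenas[index].mutex, NULL);
    pthread_mutex_lock(&arena_data.arenas[index].mutex);
    /* ... create the main pool for arenas[index] here ... */
    ++arena_data.arena_count;           // publish the slot only once it is usable
    pthread_mutex_unlock(&new_arena_mutex);
    return index;
}

The caller then reduces to a loop over the two helpers:

int arena_index;
while ((arena_index = try_find_and_lock_arena()) < 0 &&
       (arena_index = try_create_and_lock_arena()) < 0)
    ;                                   // everything busy, nothing creatable: retry
current_arena = &arena_data.arenas[arena_index];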
The post-increment operator in the following line looks wrong to me:
current_arena = &arena_data.arenas[arena_index++];// post-increment
// now, `&arena_data.arenas[arena_index]` is one beyond `current_arena`.
Written in two lines, it may be easier to reason about the behavior:
assert(
    current_arena == &arena_data.arenas[arena_index] &&
    "this is an invariant I expect to hold"
);
current_arena = &arena_data.arenas[arena_index]; // this is a no-op...
arena_index++;                                   // ... and now, they are out of sync
assert(
    current_arena == &arena_data.arenas[arena_index] &&
    "now, the invariant is broken (and this assert should fire)"
);
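If the intent is to advance to the next arena, swapping the two statements (or using a pre-increment) keeps the pointer and the index in sync:

arena_index++;
current_arena = &arena_data.arenas[arena_index]; // invariant holds again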
I find it hard to match the mutex's lock/unlock operations for all possible paths because they happen in different scopes:
// [new_arena_mutex is locked]
current_pool = create_new_pool(/* ... */, &new_arena_mutex);
if (current_pool == MAP_FAILED) return NULL;// error-path return
// `create_new_pool` unlocks iff it returns `MAP_FAILED`...
/* ... */
pthread_mutex_unlock(&new_arena_mutex);
// ... otherwise, the mutex is unlocked here
return user_area(/* ... */);
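One way to make the pairing easier to audit is to keep the lock and every unlock in the same scope, for instance by changing create_new_pool so that it never unlocks the mutex itself (this assumes you can drop the mutex parameter from its signature):

// [new_arena_mutex is locked]
current_pool = create_new_pool(MAIN_POOL, chunk_type, size, pagesize);
if (current_pool == MAP_FAILED) {
    pthread_mutex_unlock(&new_arena_mutex); // error path unlocks here...
    return NULL;
}
/* ... set up the new arena ... */
pthread_mutex_unlock(&new_arena_mutex);     // ... and the success path unlocks here
return user_area(/* ... */);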