
Bank account transfer with threads


I have to implement bank account transfers with threads and benchmark the different solutions. I expected the synchronized solution with a single general lock to be slower than the one-lock-per-account solution.

Here is my implementation with a general lock:

pthread_mutex_t general_mutex = PTHREAD_MUTEX_INITIALIZER; /* single global lock shared by all transfers */

typedef struct {
    int id;
    double balance;
} Account;

int NThreads = 400; /* number of threads */

#define N 20             /* number of accounts */
Account accounts[N];
void transfer(double money, Account* origin, Account* destini) {

    pthread_mutex_lock(&general_mutex);   // Acquire the single global lock.

    bool wasPossible = withdraw(origin, money);
    if (wasPossible) deposit(destini, money);

    pthread_mutex_unlock(&general_mutex); // Release the global lock.
}
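
The withdraw and deposit helpers are not shown above; for context, a minimal sketch of what they are assumed to do (plain balance updates, with all locking left to transfer):

#include <stdbool.h>

/* Assumed helpers -- the real bodies are not shown in the question. */
bool withdraw(Account* origin, double money) {
    if (origin->balance < money) return false;  /* insufficient funds */
    origin->balance -= money;
    return true;
}

void deposit(Account* destini, double money) {
    destini->balance += money;
}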

Here is the implementation with an individual lock per account:

typedef struct {
    int id;
    double balance;
    pthread_mutex_t mutex; // per-account lock protecting this account's data
} Account;

int NThreads = 400; /* number of threads */

#define N 20             /* number of accounts */
Account accounts[N];
void transfer(double money, Account* origin, Account* destini) {

    // Always lock the two accounts in a fixed (id) order to avoid deadlock.
    if (origin->id < destini->id) {
         pthread_mutex_lock(&(origin->mutex));
         pthread_mutex_lock(&(destini->mutex));
    } else {
         pthread_mutex_lock(&(destini->mutex));
         pthread_mutex_lock(&(origin->mutex));
    }

    bool wasPossible = withdraw(origin, money);
    if (wasPossible) deposit(destini, money);

    pthread_mutex_unlock(&(origin->mutex));
    pthread_mutex_unlock(&(destini->mutex));
}
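
With a mutex embedded in each Account, every per-account mutex also has to be initialized before the worker threads start; a sketch of that setup (ids and starting balances are placeholders):

/* Sketch: initialize the accounts and their mutexes before creating threads. */
for (int i = 0; i < N; i++) {
    accounts[i].id = i;
    accounts[i].balance = 1000.0;                 /* arbitrary starting balance */
    pthread_mutex_init(&accounts[i].mutex, NULL); /* default attributes */
}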

Why does the general-lock solution take less time than the per-account-lock solution?

Thank you


Solution

  • A locking operation is not free. In the second example you do twice as many locking/unlocking operations as in the first one. The other operations look like simple memory accesses, so they should not take very long.

    My opinion is that on your system you spend more time in the locks than in the actual processing, so increasing the number of locks does not help. It could be different if transfer performed slow I/O such as disk or network access. (The driver sketched at the end of this answer lets you time the two variants and check this.)

    BTW, as was said in a comment, 400 threads is probably also less efficient than a much smaller number. A rule of thumb is to use as many threads as the number of cores you will actually use, increased by some factor if the processing spends time waiting for I/O; with no I/O, never use more than the available cores. The upper limits are that the memory used by all the threads must not exceed the memory you are willing to use, and that the overhead of starting and synchronizing the threads must stay much lower than the total processing time.
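
    As a rough illustration of both points, the sketch below sizes the worker pool from the number of online cores instead of hard-coding 400, and times the whole run with a monotonic clock. It assumes the Account/transfer code and the initialization shown above; worker(), the per-thread transfer count and the 1.0 amount are illustrative choices, not taken from the question.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Sketch of a benchmark driver for the code above. */
void* worker(void* arg) {
    unsigned seed = (unsigned)(uintptr_t)arg;          /* per-thread RNG seed */
    for (int i = 0; i < 100000; i++) {
        Account* a = &accounts[rand_r(&seed) % N];
        Account* b = &accounts[rand_r(&seed) % N];
        if (a != b) transfer(1.0, a, b);               /* skip self-transfers */
    }
    return NULL;
}

int main(void) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);        /* online cores */
    int nthreads = (cores > 0) ? (int)cores : 1;       /* CPU-bound: one thread per core */

    pthread_t threads[nthreads];

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, worker, (void*)(uintptr_t)(i + 1));
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);

    clock_gettime(CLOCK_MONOTONIC, &end);
    double seconds = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%d threads, elapsed: %.6f s\n", nthreads, seconds);
    return 0;
}

    Running this once with the global-lock transfer and once with the per-account-lock transfer, and varying the thread count, gives comparable numbers for the benchmark.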