Search code examples
c++multithreadingperformancestdthreadstdmutex

c++ threads safety and time efficiency: why does thread with mutex check sometimes works faster than without it?


I'm beginner in threads usage in c++. I've read basics about std::thread and mutex, and it seems I understand the purpose of using mutexes.

I decided to check if threads are really so dangerous without mutexes (Well I believe books but prefer to see it with my own eyes). As a testcase of "what I shouldn't do in future" I created 2 versions of the same concept: there are 2 threads, one of them increments a number several times (NUMBER_OF_ITERATIONS), another one decrements the same number the same number of times, so we expect to see the same number after the code is executed as before it. The code is attached.

At first I run 2 threads which do it in unsafe manner - without any mutexes, just to see what can happen. And after this part is finished I run 2 threads which do the same thing but in safe manner (with mutexes).

Expected results: without mutexes a result can differ from initial value, because data could be corrupted if two threads works with it simultaneously. Especially it's usual for huge NUMBER_OF_ITERATIONS - because the probability to corrupt data is higher. So this result I can understand.

Also I measured time spent by both "safe" and "unsafe" parts. For huge number of iterations the safe part spends much more time, than unsafe one, as I expected: there is some time spent for mutex check. But for small numbers of iterations (400, 4000) the safe part execution time is less than unsafe time. Why is that possible? Is it something which operating system does? Or is there some optimization by compiler which I'm not aware of? I spent some time thinking about it and decided to ask here.

I use windows and MSVS12 compiler.

Thus the question is: why the safe part execution could be faster than unsafe part one (for small NUMBER_OF_ITERATIONS < 1000*n)? Another one: why is it related to NUMBER_OF_ITERATIONS: for smaller ones (4000) "safe" part with mutexes is faster, but for huge ones (400000) the "safe" part is slower?

main.cpp

#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <windows.h>
//
///change number of iterations for different results
const long long NUMBER_OF_ITERATIONS = 400;
//
/// time check counter
class Counter{
    double PCFreq_ = 0.0;
    __int64 CounterStart_ = 0;
public:
    Counter(){
        LARGE_INTEGER li;
        if(!QueryPerformanceFrequency(&li))
            std::cerr << "QueryPerformanceFrequency failed!\n";

        PCFreq_ = double(li.QuadPart)/1000.0;

        QueryPerformanceCounter(&li);
        CounterStart_ = li.QuadPart;
    }
    double GetCounter(){
        LARGE_INTEGER li;
        QueryPerformanceCounter(&li);
        return double(li.QuadPart-CounterStart_)/PCFreq_;
    }
};

/// "dangerous" functions for unsafe threads: increment and decrement number
void incr(long long* j){
    for (long long i = 0; i < NUMBER_OF_ITERATIONS; i++) (*j)++;
    std::cout << "incr finished" << std::endl;
}
void decr(long long* j){
    for (long long i = 0; i < NUMBER_OF_ITERATIONS; i++) (*j)--;
    std::cout << "decr finished" << std::endl;
}

///class for safe thread operations with incrment and decrement
template<typename T>
class Safe_number {
public:
    Safe_number(int i){number_ = T(i);}
    Safe_number(long long i){number_ = T(i);}
    bool inc(){
        if(m_.try_lock()){
            number_++;
            m_.unlock();
            return true;
        }
        else
            return false;
    }
    bool dec(){
        if(m_.try_lock()){
            number_--;
            m_.unlock();
            return true;
        }
        else
            return false;
    }
    T val(){return number_;}
private:
    T number_;
    std::mutex m_;
};

///
template<typename T>
void incr(Safe_number<T>* n){
    long long i = 0;
    while(i < NUMBER_OF_ITERATIONS){
        if (n->inc()) i++;
    }
    std::cout << "incr <T> finished" << std::endl;
}
///
template<typename T>
void decr(Safe_number<T>* n){
    long long i = 0;
    while(i < NUMBER_OF_ITERATIONS){
        if (n->dec()) i++;
    }
    std::cout << "decr <T> finished" << std::endl;
}

using namespace std;

// run increments and decrements of the same number
// in threads in "safe" and "unsafe" way
int main()
{
    //init numbers to 0
    long long number = 0;
    Safe_number<long long> sNum(number);

    Counter cnt;//init time counter
    //
    //run 2 unsafe threads for ++ and --
    std::thread t1(incr, &number);
    std::thread t2(decr, &number);
    t1.join();
    t2.join();
    //check time of execution of unsafe part
    double time1 = cnt.GetCounter();
    cout <<"finished first thr"  << endl;
    //
    // run 2 safe threads for ++ and --, now we expect final value 0
    std::thread t3(incr<long long>, &sNum);
    std::thread t4(decr<long long>, &sNum);
    t3.join();
    t4.join();
    //check time of execution of safe part
    double time2 = cnt.GetCounter() - time1;
    cout << "unsafe part, number = " << number << "  time1 = " << time1 << endl;
    cout << "safe part, Safe number = " << sNum.val() << "  time2 = " << time2 << endl << endl;

    return 0;
}

Solution

  • You should not draw conclusions about the speed of any given algorithm if the input size is very small. What defines "very small" can be kind of arbitrary, but on modern hardware, under usual conditions, "small" can refer to any collection size less than a few hundred thousand objects, and "large" can refer to any collection larger than that.

    Obviously, Your Milage May Vary.

    In this case, the overhead of constructing threads, which, while usually slow, can also be rather inconsistent and could be a larger factor in the speed of your code than what the actual algorithm is doing. It's possible that the compiler has some kind of powerful optimizations it can do on smaller input sizes (which it can definitely know about due to the input size being hard-coded into the code itself) that it cannot then perform on larger inputs.

    The broader point being that you should always prefer larger inputs when testing algorithm speed, and to also have the same program repeat its tests (preferably in random order!) to try to "smooth out" irregularities in the timings.