Search code examples
c++multithreadingc++14thread-sanitizer

data race with std::shared_timed_mutex::try_lock_until


I'm trying to write a small testcase to exercise std::shared_timed_mutex::try_lock_until. Documentation on cppreference.

Here is my code

#include <thread>
#include <iostream>
#include <chrono>
#include <shared_mutex>
#include <cassert>
 
std::shared_timed_mutex test_mutex;
int global;
 
void f()
{
    auto now=std::chrono::steady_clock::now();
    test_mutex.try_lock_until(now + std::chrono::seconds(100));
    //test_mutex.lock();
    --global;
    std::cout << "In lock, global=" << global << '\n';
    test_mutex.unlock();
}

void g()
{
    auto now=std::chrono::steady_clock::now();
    test_mutex.try_lock_shared_until(now + std::chrono::seconds(10));
    //test_mutex.lock_shared();
    std::cout << "In shared lock, global=" << global << '\n';
    test_mutex.unlock_shared();
}
 
int main()
{
    global = 1;
    test_mutex.lock_shared();
    std::thread t1(f);
    std::thread t2(g);
    test_mutex.unlock_shared();
    t1.join();
    t2.join();
    assert(global == 0);
}

What I'm expecting is

  1. main gets a read lock then starts f and g
  2. f tries to get an exclusive lock and blocks
  3. g gets a read lock, reads global then unlocks the read lock
  4. main unlocks the read lock
  5. f unblocks, write to global, unlockes and finishes
  6. f and g join 7 the assert is true and main ends

(2 and 3 could be in any order).

This seems to work OK on its own. Under gdb, if I put breakpoints on the read of global in g and the write in f, then run, it stops on the read, as I would expect.

Howver, if I compile with -fsanitize=tthread then I get a hazard

WARNING: ThreadSanitizer: data race (pid=6780)
  Read of size 4 at 0x000000407298 by thread T2:
    #0 g() /home/paulf/scratch/valgrind/drd/tests/try_lock_shared_until14.cpp:25 (try_lock_shared_until14+0x402484)
[trimmed]
    #6 execute_native_thread_routine ../../../../../libstdc++-v3/src/c++11/thread.cc:82 (libstdc++.so.6+0xd9c83)

  Previous write of size 4 at 0x000000407298 by thread T1:
    #0 f() /home/paulf/scratch/valgrind/drd/tests/try_lock_shared_until14.cpp:15 [triimed]
    #6 execute_native_thread_routine ../../../../../libstdc++-v3/src/c++11/thread.cc:82 (libstdc++.so.6+0xd9c83)

  Location is global 'global' of size 4 at 0x000000407298 (try_lock_shared_until14+0x000000407298)

Under gdb the tsan version does not block on the exclusive lock and reaches the write first.

I realize that my example is not good and I should check return values and not rely on timeouts.

Can anyone explain what tsan is changing? If I use the plain lock/lock_shared/unlock/unlock_shared functions then tsan no longer complains.

(Note that I can't use DRD or Helgrind for this - I'm writing the testcase for them and I know that they do not support this at the moment, at least not on the platform that I'm using, Fedora 34 / GCC 11.2.1 amd64).

Edit: Here is version 3, which works now. main waits on a cv got g() to finish, then releases the shared lock, then f() can get the exclusive lock.

#include <thread>
#include <iostream>
#include <chrono>
#include <shared_mutex>
#include <mutex>
#include <cassert>
#include <condition_variable>

std::shared_timed_mutex test_mutex;
std::mutex cv_mutex;
std::condition_variable cv;
int global;
bool reads_done = false;
 
void f()
{
    auto now=std::chrono::steady_clock::now();
    std::cout << "In lock, trying to get mutex\n";
    if (test_mutex.try_lock_until(now + std::chrono::seconds(3)))
    {
       --global;
       std::cout << "In lock, global=" << global << '\n';
       test_mutex.unlock();
    }
    else
    {
        std::cerr << "Lock failed\n";
    }
}

void g()
{
    auto now=std::chrono::steady_clock::now();
    std::cout << "In shared lock, trying to get mutex\n";
    if (test_mutex.try_lock_shared_until(now + std::chrono::seconds(2)))
    {
       std::cout << "In shared lock, global=" << global << '\n';
       test_mutex.unlock_shared();
    }
    else
    {
        std::cerr << "Lock shared failed\n";
    }
    std::unique_lock<std::mutex> lock(cv_mutex);
    reads_done = true;
    cv.notify_all();
}
 
int main()
{
    global = 1;
    test_mutex.lock_shared();
    std::thread t1(f);
    std::thread t2(g);
    {
       std::unique_lock<std::mutex> lock(cv_mutex);
       while (!reads_done)
       {
          cv.wait(lock);
       }
    }
    std::cout << "Main, reader thread done\n";
    test_mutex.unlock_shared();
    std::cout << "Main, no more shared locks\n";
    t1.join();
    t2.join();
    assert(global == 0);
}



Solution

  • This is also a valid scheduling scenario:

    1. main gets a read lock then starts f and g
    2. main releases read lock then joins
    3. f starts execution and locks for over 10 ms, e.g. due to preemption
    4. g starts execution and blocks for 10 ms
    5. g unblocks and reads the shared variable

    In 5., a data race happens, and therefore ThreadSanitizer is correct to point it out. The correction of this error would require to check for the return values of try_lock_shared_until and so on.