Search code examples
c++multithreadingboostvalgrindboost-filesystem

Is boost::filesystem thread-safe?


Here is the following case I am observing on my Debian stable system:

% valgrind --tool=drd ./threads
==1368067== drd, a thread error detector
==1368067== Copyright (C) 2006-2020, and GNU GPL'd, by Bart Van Assche.
==1368067== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==1368067== Command: ./threads
==1368067==
==1368067== Thread 3:
==1368067== Conflicting load by thread 3 at 0x048cc208 size 8
==1368067==    at 0x48C1CB6: size (basic_string.h:1064)
==1368067==    by 0x48C1CB6: boost::filesystem::path::end() const (path.cpp:737)
==1368067==    by 0x48C2331: boost::filesystem::path::compare(boost::filesystem::path const&) const (path.cpp:178)
==1368067==    by 0x48C26D9: operator== (path.hpp:771)
==1368067==    by 0x48C26D9: boost::filesystem::path::extension() const (path.cpp:353)
==1368067==    by 0x48C284E: boost::filesystem::path::replace_extension(boost::filesystem::path const&) (path.cpp:238)
==1368067==    by 0x10B449: run(void*) (in /home/mathieu/Perso/PublicRep/cxx/boost/bin/threads)
==1368067==    by 0x484CD34: ??? (in /usr/libexec/valgrind/vgpreload_drd-amd64-linux.so)
==1368067==    by 0x4B90133: start_thread (pthread_create.c:442)
==1368067==    by 0x4C0FA3F: clone (clone.S:100)
==1368067== Allocation context: BSS section of /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.74.0
==1368067== Other segment start (thread 2)
==1368067==    (thread finished, call stack no longer available)
==1368067== Other segment end (thread 2)
==1368067==    (thread finished, call stack no longer available)
==1368067==

Here is the c++ code I am using to reproduce this:

// threads.cxx
#if 0
#include <filesystem>
namespace fs = std::filesystem;
#else
#include <boost/filesystem/path.hpp>
namespace fs = boost::filesystem;
#endif

#include <pthread.h>

#define N 32

static void* run(void* arg) {
  uintptr_t job = (uintptr_t)arg;
  fs::path path{"/foo/bar.jpg"};
  path.replace_extension("png");
  return nullptr;
}

int main(int argc, char* argv[]) {
  uintptr_t i;
  int rc;
  pthread_t threads[N];
  for (i = 0; i < N; ++i) {
    if ((rc = pthread_create(threads + i, nullptr, run, (void*)i))) {
      return rc;
    }
  }

  for (i = 0; i < N; ++i) {
    if ((rc = pthread_join(threads[i], nullptr))) {
      return rc;
    }
  }
  return 0;
}

I have been reviewing the boost 1.74 codebase but must admit I fail to understand why valgrind would consider this as conflicting load here. Could someone please help me go through the boost::filesystem codebase to check if there is actually a thread safety issue here? I was not able to reproduce the issue using std::filesystem (see #ifdef in the code).

For reference:

  • System: Debian stable
  • Compiler: g++ 12.2.0-14
  • Valgrind: 3.19.0

Update:

% make && valgrind --tool=helgrind ./threads
[100%] Built target threads
==1455067== Helgrind, a thread error detector
==1455067== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==1455067== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==1455067== Command: ./threads
==1455067==
==1455067== ---Thread-Announcement------------------------------------------
==1455067==
==1455067== Thread #3 was created
==1455067==    at 0x4BEEA2F: clone (clone.S:76)
==1455067==    by 0x4BEF878: __clone_internal (clone-internal.c:83)
==1455067==    by 0x4B6ED6F: create_thread (pthread_create.c:295)
==1455067==    by 0x4B6F82C: pthread_create@@GLIBC_2.34 (pthread_create.c:831)
==1455067==    by 0x484B5D7: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==1455067==    by 0x10B2AD: main (in /home/mathieu/Perso/PublicRep/cxx/boost/bin/threads)
==1455067==
==1455067== ---Thread-Announcement------------------------------------------
==1455067==
==1455067== Thread #2 was created
==1455067==    at 0x4BEEA2F: clone (clone.S:76)
==1455067==    by 0x4BEF878: __clone_internal (clone-internal.c:83)
==1455067==    by 0x4B6ED6F: create_thread (pthread_create.c:295)
==1455067==    by 0x4B6F82C: pthread_create@@GLIBC_2.34 (pthread_create.c:831)
==1455067==    by 0x484B5D7: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==1455067==    by 0x10B2AD: main (in /home/mathieu/Perso/PublicRep/cxx/boost/bin/threads)
==1455067==
==1455067== ----------------------------------------------------------------
==1455067==
==1455067== Possible data race during read of size 8 at 0x48AB208 by thread #3
==1455067== Locks held: none
==1455067==    at 0x48A0CB6: size (basic_string.h:1064)
==1455067==    by 0x48A0CB6: boost::filesystem::path::end() const (path.cpp:737)
==1455067==    by 0x48A1331: boost::filesystem::path::compare(boost::filesystem::path const&) const (path.cpp:178)
==1455067==    by 0x48A16D9: operator== (path.hpp:771)
==1455067==    by 0x48A16D9: boost::filesystem::path::extension() const (path.cpp:353)
==1455067==    by 0x48A184E: boost::filesystem::path::replace_extension(boost::filesystem::path const&) (path.cpp:238)
==1455067==    by 0x10B449: run(void*) (in /home/mathieu/Perso/PublicRep/cxx/boost/bin/threads)
==1455067==    by 0x484B7D6: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==1455067==    by 0x4B6F133: start_thread (pthread_create.c:442)
==1455067==    by 0x4BEEA3F: clone (clone.S:100)
==1455067==
==1455067== This conflicts with a previous write of size 8 by thread #2
==1455067== Locks held: none
==1455067==    at 0x48A03B0: _M_length (basic_string.h:229)
==1455067==    by 0x48A03B0: _M_set_length (basic_string.h:267)
==1455067==    by 0x48A03B0: _M_construct<char const*> (basic_string.tcc:247)
==1455067==    by 0x48A03B0: basic_string<> (basic_string.h:642)
==1455067==    by 0x48A03B0: path (path.hpp:171)
==1455067==    by 0x48A03B0: boost::filesystem::detail::dot_path() (path.cpp:698)
==1455067==    by 0x48A16CE: boost::filesystem::path::extension() const (path.cpp:353)
==1455067==    by 0x48A184E: boost::filesystem::path::replace_extension(boost::filesystem::path const&) (path.cpp:238)
==1455067==    by 0x10B449: run(void*) (in /home/mathieu/Perso/PublicRep/cxx/boost/bin/threads)
==1455067==    by 0x484B7D6: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==1455067==    by 0x4B6F133: start_thread (pthread_create.c:442)
==1455067==    by 0x4BEEA3F: clone (clone.S:100)
==1455067==  Address 0x48ab208 is 8 bytes inside data symbol "_ZZN5boost10filesystem6detail8dot_pathEvE7dot_pth"
==1455067==
==1455067== ----------------------------------------------------------------

Update 2:

Is the following code acceptable for multi-threaded application:

Thread 2 "threads" hit Breakpoint 2, boost::filesystem::detail::dot_path () at libs/filesystem/src/path.cpp:698
(gdb) list
693         const path&  dot_path()
694         {
695     #   ifdef BOOST_WINDOWS_API
696           static const fs::path dot_pth(L".");
697     #   else
698           static const fs::path dot_pth(".");
699     #   endif
700           return dot_pth;
701         }
702

Solution

  • I've reproduced the issue with both DRD/Helgrind when using

    • GCC 10.3.0, Boost 1.74.0, valgrind 3.18.1
    • GCC 11.3.0, Boost 1.74.0, valgrind 3.19.0
    • GCC 12.2.0, Boost 1.74.0, valgrind 3.20.0
    • GCC 12.2.0, Boost 1.75.0, valgrind 3.20.0

    Indeed the offender is with dot_dot_path et al. which use a function-local static. This is guaranteed to be threadsafe by the c++11 standard (and up)¹.

    The problem starts goeing away with e.g.

    • GCC 12.2.0, Boost 1.77.0, valgrind 3.20.0

    In the changes I suspect that the following might be involved:

    tree 7ec0164241fddd9890c68be6f7253b856752117c
    parent f15a0e9b90910c74eb6ee3c15442be518a2e89fc
    author Andrey Semashev <[email protected]> Mon Jun 28 20:58:35 2021 +0300
    committer Andrey Semashev <[email protected]> Mon Jun 28 20:58:35 2021 +0300
    
    Init path globals early to allow using Boost.FS during program termination.
    
    This works around recurringissues when Boost.Filesystem is used during
    program termination (for example, in Boost.Log, when it performs the final
    log file rotation). At that point, the path locale as well as dot and dot-dot
    paths may no longer be available.
    
    Also, MSVC 14.2 has a bug[1] that results in a deadlock whet dot or
    dot-dot path is being created during program termination, while atexit
    callbacks are being run in the main thread. This change works around it
    as the new code does not call atexit on initialization of these paths.
    
    This is only supported on MSVC, GCC, Clang and compatible compilers that
    support MSVC-specific or GCC-specific means to customize global initialization
    order.
    
    [1]: https://github.com/boostorg/log/issues/153
    

    I'll try to check whether a false-positive regarding function-local static was involved in the valgrind versions used.

    ¹ e.g. Is local static variable initialization thread-safe in C++11?