Sorry, the title is clickbait... It's not as easy to solve as you might think; this one is a real challenge.
I am having a very weird issue where a thread that is joinable() fails to join().
The error I get is "No such process".
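When join() throws, catching std::system_error instead of using catch(...) exposes the full error code. "No such process" is the message for std::errc::no_such_process (POSIX ESRCH), which the C++ standard lists as the join() error for a thread that is "not valid", as opposed to invalid_argument for a thread that is not joinable. The small diagnostic helper below is a sketch; the names JoinAndReport and IsNoSuchProcess are hypothetical.

```cpp
#include <string>
#include <system_error>
#include <thread>

// Joins t and reports the outcome instead of swallowing it in catch(...).
// Returns "" on success, "not joinable" if there is nothing to join,
// otherwise "<category>:<value> <message>" taken from the std::system_error.
std::string JoinAndReport(std::thread& t) {
    if (!t.joinable()) {
        return "not joinable";
    }
    try {
        t.join();
        return "";
    } catch (const std::system_error& e) {
        const std::error_code& ec = e.code();
        return std::string(ec.category().name()) + ":" + std::to_string(ec.value()) +
               " " + ec.message();
    }
}

// True when ec is equivalent to std::errc::no_such_process (ESRCH).
bool IsNoSuchProcess(const std::error_code& ec) {
    return ec == std::errc::no_such_process;
}
```

Logging the category and numeric value alongside the message makes it easier to tell whether the error came from the C++ runtime or from the underlying threading library.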
This is not the typical beginner's mistake of joining a thread twice. It is a complex issue, possibly even caused by memory corruption, but I am hoping that I am simply missing something and that a fresh external view will help. I have been working on this issue for two days.
I am compiling for both Linux and Windows.
On Linux (using gcc 9.1.0) it works flawlessly every time.
On Windows (cross-compiled with x86_64-w64-mingw32-g++ 9.2.0 on my Linux machine, then run on my Windows machine), I always get the error.
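Because the two builds come from different compilers, it can be worth checking which predefined macros each one actually sees. MinGW-w64 g++ predefines _WIN32 and __MINGW32__, but _WINDOWS is not a compiler-predefined macro (Visual Studio project templates add it), so a check like #ifdef _WINDOWS only takes the Windows branch if the build defines it explicitly. A quick probe, with the hypothetical name TargetDescription:

```cpp
#include <string>

// Lists a few platform macros as seen by the preprocessor for this build.
// MinGW-w64 defines _WIN32 and __MINGW32__; _WINDOWS appears only if the build adds it.
std::string TargetDescription() {
    std::string d;
#if defined(_WIN32)
    d += "_WIN32 ";
#endif
#if defined(__MINGW32__)
    d += "__MINGW32__ ";
#endif
#if defined(_WINDOWS)
    d += "_WINDOWS ";
#endif
#if defined(__linux__)
    d += "__linux__ ";
#endif
    return d;
}
```

Printing this once from both the Linux and the Windows binaries shows immediately whether platform-specific branches behave as intended.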
Here's what I can confirm WITH 100% CERTAINTY:
That very last point may very well be the source of the issue, though I really don't see how.
I also know that the object containing the thread pointer is not destroyed before the join(). The only place where I delete this pointer is right after a successful join(). The parent object is wrapped in a shared_ptr.
The pointer to that thread is also never used or shared anywhere else.
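This is not the cause of the bug, but the manual new/delete around a std::thread* (and resetting it to nullptr after join()) can be avoided by owning the thread by value. C++20's std::jthread does this, but GCC 9's libstdc++ does not provide it, so here is a minimal sketch of a joining wrapper; JoiningThread is a hypothetical name.

```cpp
#include <thread>
#include <utility>

// Owns a std::thread by value and joins it on destruction, so no new/delete
// or "= nullptr" bookkeeping is needed. Similar in spirit to C++20 std::jthread,
// without its stop_token support.
class JoiningThread {
public:
    JoiningThread() = default;
    explicit JoiningThread(std::thread t) : t_(std::move(t)) {}
    JoiningThread(JoiningThread&&) = default;
    JoiningThread& operator=(JoiningThread&& other) {
        if (this != &other) {
            Join();  // overwriting a joinable std::thread would call std::terminate
            t_ = std::move(other.t_);
        }
        return *this;
    }
    ~JoiningThread() { Join(); }

    void Join() {
        if (t_.joinable()) t_.join();
    }
    bool Joinable() const { return t_.joinable(); }

private:
    std::thread t_;
};
```

Copying is implicitly disabled because of the user-declared move constructor, which matches std::thread's own move-only semantics.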
The code is very difficult to simplify and share here, since it is part of a complete networking system and any part of it may be the source of the issue.
Oh, and the thread itself runs correctly, and all resulting network communication works as it should, even though the thread cannot be joined.
Here's a very simplified version of the important parts, with comments explaining what happens:
```cpp
#include <atomic>
#include <cstring>
#include <functional>
#include <memory>
#include <thread>

// We instantiate a new ListeningServer, then call Start(),
// then we connect a client to it and transfer some data,
// then we call Stop() on the ListeningServer and get the error, even though everything worked flawlessly.
// SOCKET, INVALID_SOCKET, IsValid(), ClientSocket, LOG, SLEEP and INLINE come from the rest of the project.

class ClientSocket; // Wrapper around a native SOCKET with addr info and stuff...
typedef std::function<void(std::shared_ptr<ClientSocket>)> Func;

class ListeningSocket {
public:
    // Constructor and Bind() omitted; the constructor initializes the socket correctly.
    bool IsListening() const { return listening; }

    void StartListeningThread(Func&& newSocketCallback) {
        listening = (::listen(socket, SOMAXCONN) >= 0);
        if (!listening) return; // That does not happen, we're still good
        listeningThread = new std::thread([this](Func callback) {
            sockaddr_in incomingAddr; // Assumed IPv4 for this simplified version
            while (IsListening()) {
                // Omitted here: a ::poll call with a 10 ms timeout so that the thread does not block.
                // The issue happens with or without it.
                memset(&incomingAddr, 0, sizeof(incomingAddr));
                socklen_t addrLen = sizeof(incomingAddr); // accept() overwrites it, so reset it on every call
                SOCKET clientSocket = ::accept(socket, (struct sockaddr*)&incomingAddr, &addrLen);
                if (IsListening() && IsValid(clientSocket)) {
                    callback(std::make_shared<ClientSocket>(clientSocket, incomingAddr));
                }
            }
            LOG("ListeningThread Finished"); // This is correctly logged just before the error
        }, std::move(newSocketCallback));
        LOG("Listening with Thread " << listeningThread->get_id()); // Logs the same thread id that we later try to join()
    }

    INLINE void Disconnect() {
        listening = false; // Makes IsListening() return false
        if (listeningThread) {
            if (listeningThread->joinable()) {
                LOG("*** Socket Before join thread " << listeningThread->get_id()); // Logs the correct thread id
                try {
                    listeningThread->join();
                    delete listeningThread;
                    listeningThread = nullptr;
                    LOG("*** Socket After join thread"); // NEVER LOGGED
                } catch (...) {
                    LOG("JOIN ERROR"); // It ALWAYS ends up here, with "No such process"
                    SLEEP(100ms); // Make sure the thread still finishes in time
                    // The thread does finish in time and all resulting actions work flawlessly
                }
            }
        }
#ifdef _WIN32 // Predefined by MinGW-w64 and MSVC; _WINDOWS is only defined if the build adds it
        ::closesocket(socket);
#else
        ::close(socket);
#endif
        socket = INVALID_SOCKET;
    }

private:
    SOCKET socket = INVALID_SOCKET; // Native Winsock handle on Windows, typedef'd to int on Linux
    std::thread* listeningThread = nullptr;
    std::atomic<bool> listening{false}; // Brace-init: "= false" on a std::atomic only compiles from C++17 on
};

class ListeningServer {
public:
    void Start(uint16_t port) {
        listeningSocket.Bind(port);
        listeningSocket.StartListeningThread([this](std::shared_ptr<ClientSocket> socket) {
            HandleNewConnection(socket);
        });
    }

    void HandleNewConnection(std::shared_ptr<ClientSocket> socket) {
        // Whatever we do here works flawlessly and does not change the outcome of the error
    }

    void Stop() {
        listeningSocket.Disconnect();
    }

private:
    ListeningSocket listeningSocket; // The class' constructor initializes it correctly
};
```
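One detail of Disconnect() worth spelling out: it clears the flag, joins, and only then closes the socket, so the join can only complete if the loop wakes up on its own. That is what the omitted ::poll timeout provides; a bare blocking ::accept would keep the thread inside accept() until a client connects. The sketch below is a stand-in, not the author's code: the names PollingLoop and onTick are hypothetical, and a 10 ms sleep replaces the poll so that it runs without sockets.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Stand-in for the listening loop: the sleep plays the role of ::poll's timeout,
// so the loop re-checks the flag regularly instead of blocking indefinitely.
class PollingLoop {
public:
    // Must not be called while a previous loop is still running.
    void Start(std::function<void()> onTick) {
        listening_ = true;
        thread_ = std::thread([this, cb = std::move(onTick)] {
            while (listening_) {
                std::this_thread::sleep_for(std::chrono::milliseconds(10)); // "poll" timeout
                if (listening_) cb(); // like the IsListening() re-check after accept()
            }
        });
    }

    // Same order as Disconnect(): clear the flag, join, then release the resource.
    void Stop() {
        listening_ = false;
        if (thread_.joinable()) thread_.join();
        // ::closesocket()/::close() would go here, once no thread can still use the socket.
    }

    ~PollingLoop() { Stop(); }

private:
    std::atomic<bool> listening_{false};
    std::thread thread_;
};
```

Once Stop() returns, the callback can no longer run, which is exactly the guarantee the socket cleanup after join() relies on.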
Another important thing to note is that elsewhere in the program I instantiate a ListeningSocket directly and call StartListeningThread() with a lambda, and that one does not fail to join the thread after calling Disconnect() directly.
Also, part of this code is compiled into a shared library that is linked dynamically.
Issue solved!
It would seem that, on Windows only, one cannot create a thread from code compiled into a shared library and then join it from code compiled into the main application.
Basically, joinable() returns true, but join() or detach() fails.
All I had to do was make sure the thread is created and joined by code compiled into the same binary (the same shared library, or the same executable).
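A sketch of that fix, under stated assumptions: the names netlib, ListenerHandle, StartListener and StopListener are hypothetical, and in a real build everything in the namespace would be compiled into the shared library (exported, for example, with __declspec(dllexport)), while the application only stores and passes back the opaque pointer. A plausible explanation for the behaviour, not confirmed in this question, is that MinGW-w64's std::thread is built on winpthreads: if the threading runtime is linked statically into both the DLL and the executable, each module keeps its own table of thread handles, so pthread_join in one module does not recognise a handle created in the other and returns ESRCH ("No such process"), while joinable() only checks the locally stored id. If that is the cause, linking libwinpthread dynamically in both modules might be an alternative to moving the code.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical library-side API: the std::thread is both constructed and joined
// by code from the same module; the application never touches it directly.
namespace netlib {

struct ListenerHandle {
    std::atomic<bool> listening{true};
    std::thread thread;
};

// Library side: creates the thread.
ListenerHandle* StartListener() {
    auto h = std::make_unique<ListenerHandle>();
    ListenerHandle* raw = h.get();
    h->thread = std::thread([raw] {
        while (raw->listening) {
            std::this_thread::sleep_for(std::chrono::milliseconds(10)); // stands in for poll/accept
        }
    });
    return h.release(); // ownership passes to the caller until StopListener()
}

// Also library side: joins the thread this same module created, then frees the handle.
void StopListener(ListenerHandle* h) {
    if (h == nullptr) return;
    h->listening = false;
    if (h->thread.joinable()) h->thread.join();
    delete h;
}

}  // namespace netlib
```

On the application side, the only calls are `auto* h = netlib::StartListener();` and, later, `netlib::StopListener(h);`.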
That was exactly the kind of hint I was looking for when I asked the question: I knew the problem was more complicated than it looked, and that a minimal simplified example would not reproduce it.
As far as I know (and I SEARCHED), this constraint on threads in Windows is not documented anywhere, so it is quite plausible that it is not supposed to be a constraint at all and is actually a bug in the compiler I am using.