c++, boost-thread, coredump

C++ Boost::thread - kernel: traps general protection


My code worked with an old version of Boost (1.49), but that was many years ago. Now I'm using Boost 1.67.

Edit: My project includes a server/client feature in the same binary. Once the server is started, I can send commands, which are received and used to launch a custom process. This is the code shown below.

I identified the line causing the kernel trap:

boost::thread th(Temporal::Acquire, transmit, ECONF);
  • The thread starts and the function passed as an argument is called, but the newly created thread crashes instantly. I don't understand the "general protection" fault.

  • I tried to get more information with a try/catch (std::exception &e), but it seems another catcher would be needed somewhere in between; there is nothing to output.

  • I tried to understand the handling of tls_destructor inside libs/thread/src/pthread/thread.cpp, but I've also tested my code replacing all std usages with boost only, without solving the issue...

  • Valgrind shows no errors at all.

Is there a way to understand the direct termination (without calling join or interrupt)?

Part that initiates the server socket (from another file), using a standard thread. I don't think this is the source of the issue: since I started my project, I've never had conflicts mixing std and boost.

coex.push_back(std::thread(Temporal::Listener, ECONF));

Server part:

#include "lobe.hpp"

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

void    Temporal::Acquire(std::string transmit, Json::Value ECONF)
{
    syslog(LOG_NOTICE, "aquired");
    // Temporal::Transcode p(transmit, ECONF);
}

void    Temporal::Listener(Json::Value ECONF)
{
    socklen_t           t;
    std::string         transmit(100, 0);
    int                 PIPE_local, PIPE_remote, len;
    struct sockaddr_un  local, remote;

    int reuseaddr = 1;
    memset(&local, 0, sizeof(local));

    if((PIPE_local = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        perror("socket");

    if(setsockopt(PIPE_local, SOL_SOCKET, SO_REUSEADDR, &reuseaddr, sizeof(reuseaddr)) == -1)
        perror(strerror(errno));

    local.sun_family = AF_UNIX;
    strncpy(local.sun_path, P_SOCK, sizeof(local.sun_path)-1);
    unlink(P_SOCK);
    len = strlen(local.sun_path) + sizeof(local.sun_family);


    if(bind(PIPE_local, (struct sockaddr *)&local, len) == -1)
        perror("bind");

    if(listen(PIPE_local, 5) == -1)
        perror("listen");

    for(;;)
    {
        syslog(LOG_INFO, "inside SOCK");
        int done, com_Listen, com_Talk;
        t = sizeof(remote);
        if((PIPE_remote = accept(PIPE_local, (struct sockaddr *)&remote, &t)) == -1)
            perror("accept");

        done = 0;
        do
        {
            com_Listen = read(PIPE_remote, &transmit[0], 99);
            if(com_Listen <= 0)
            {
                syslog(LOG_NOTICE, "<<-== %s", transmit.c_str());
                if(com_Listen < 0) perror("recv");
                done = 1;

                syslog(LOG_NOTICE, "received");
                boost::thread th(Temporal::Acquire, transmit, ECONF);
            }
        }while(!done);

        close(PIPE_remote);
        break;
    }
    close(PIPE_local);
    unlink(P_SOCK);

    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    //boost::this_thread::sleep_for(boost::chrono::seconds(1));
    Temporal::Listener(ECONF);
}

Client part:

The output of systemd coredump:

Jun 17 22:37:25 bytewild kernel: traps: EIE[8033] general protection ip:44f59c sp:7fd32bffecb0 error:0 in EIE[400000+233000]
Jun 17 22:37:25 bytewild EIE[7699]: aquired
Jun 17 22:37:25 bytewild systemd[1]: Started Process Core Dump (PID 8034/UID 0).
-- Subject: Unit [email protected] has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit [email protected] has finished starting up.
-- 
-- The start-up result is RESULT.
Jun 17 22:37:25 bytewild systemd-coredump[8041]: Failed to get ACL: Operation not supported
Jun 17 22:37:26 bytewild systemd-coredump[8041]: Process 7699 (EIE) of user 1000 dumped core.

     Stack trace of thread 8033:
     #0  0x000000000044f59c tls_destructor (/data/dev/in/native/projects/eie/build/bin/EIE)
     #1  0x000000000045092a thread_proxy (/data/dev/in/native/projects/eie/build/bin/EIE)
     #2  0x0000000000500155 start_thread (/data/dev/in/native/projects/eie/build/bin/EIE)
     #3  0x00000000005707ff __clone (/data/dev/in/native/projects/eie/build/bin/EIE)

     Stack trace of thread 7700:
     #0  0x0000000000503623 __pthread_cond_timedwait (/data/dev/in/native/projects/eie/build/bin/EIE)
     #1  0x000000000041214b _ZN5boost18condition_variable13do_wait_untilERNS_11unique_lockINS_5mutexEEERKNS_6detail23mono_platform_timepointE (/data/dev/in/native/projects/eie/build/bin/EIE)
     #2  0x000000000040ebe4 _ZN8Temporal8ListenerEN4Json5ValueE (/data/dev/in/native/projects/eie/build/bin/EIE)
     #3  0x000000000041f58e _ZSt13__invoke_implIvPFvN4Json5ValueEEJS1_EET_St14__invoke_otherOT0_DpOT1_ (/data/dev/in/native/projects/eie/build/bin/EIE)
     #4  0x00000000004eb0ef execute_native_thread_routine (/data/dev/in/native/projects/eie/build/bin/EIE)
     #5  0x0000000000500155 start_thread (/data/dev/in/native/projects/eie/build/bin/EIE)
     #6  0x00000000005707ff __clone (/data/dev/in/native/projects/eie/build/bin/EIE)

     Stack trace of thread 7699:
     #0  0x0000000000504a21 __nanosleep (/data/dev/in/native/projects/eie/build/bin/EIE)
     #1  0x000000000056bfea __sleep (/data/dev/in/native/projects/eie/build/bin/EIE)
     #2  0x0000000000406e91 main (/data/dev/in/native/projects/eie/build/bin/EIE)
     #3  0x0000000000506dfa __libc_start_main (/data/dev/in/native/projects/eie/build/bin/EIE)
     #4  0x000000000040790a _start (/data/dev/in/native/projects/eie/build/bin/EIE)

Here is a schematic preview of my system to understand the problem:

Sorry if this is obvious, but I'm stumped. Any clues?


Solution

  • Looking at your code I believe the issue is in this block of code.

    do
            {
                com_Listen = read(PIPE_remote, &transmit[0], 99);
                if(com_Listen <= 0)
                {
                    syslog(LOG_NOTICE, "<<-== %s", transmit.c_str());
                    if(com_Listen < 0) perror("recv");
                    done = 1;
    
                    syslog(LOG_NOTICE, "received");
                    boost::thread th(Temporal::Acquire, transmit, ECONF);
                }
            }while(!done);
    

    Specifically, the way you create your boost::thread: th is a stack-based variable, so it is destroyed as soon as the loop iteration ends, and its destructor runs while the thread may still be executing.

    I do not have much experience with the Boost implementation of thread, but I have used std::thread, which was largely modeled on Boost's. Looking at the class documentation, there is an Effects section that lists the impact of destroying a running thread.

    Effects:

      • if defined BOOST_THREAD_DONT_PROVIDE_THREAD_DESTRUCTOR_CALLS_TERMINATE_IF_JOINABLE: If the thread is joinable calls detach(), DEPRECATED
      • if defined BOOST_THREAD_PROVIDES_THREAD_DESTRUCTOR_CALLS_TERMINATE_IF_JOINABLE: If the thread is joinable calls to std::terminate.

    Destroys *this.

    It looks like the default is now to call terminate(), like std::thread does, for joinable threads that are destroyed. The old behavior was to auto-detach them.

    Edit: Compare the destructor documentation for the current version vs. 1.49.

    1.49 Effects: If *this has an associated thread of execution, calls detach(). Destroys *this.

    Now read the same documentation for the current version (quoted above): it now defaults to calling terminate().
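    A minimal sketch of the two safe patterns, shown with std::thread, whose destructor behaves like the current Boost default (the trivial Acquire function here is just a stand-in for your Temporal::Acquire):

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<bool> acquired{false};

    void Acquire() { acquired = true; }

    int main() {
        // Option 1: keep the thread object alive and join() it before
        // it is destroyed; after join() the destructor is safe.
        std::thread th(Acquire);
        th.join();
        assert(acquired.load());

        // Option 2: detach() explicitly if the thread must outlive the
        // scope -- this reproduces what the Boost 1.49 destructor did
        // implicitly.
        std::thread bg([] {});
        bg.detach();
        assert(!bg.joinable());   // a detached thread is no longer joinable
        return 0;
    }
    ```

    Note that a detached thread may still be running at program exit, so in real code joining (or a worker pool, as below) is usually preferable.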

    Edit2

    My suggestion would be to stop spinning up a new thread every time you receive input from your socket. Instead, create a worker queue with a fixed number of background threads. Each time you receive a new event, simply add a work object to the queue and one of the background threads will handle it.

    #include <chrono>
    #include <condition_variable>
    #include <deque>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>
    
    // Synchronized output
    std::mutex stdm;
    template <typename T>
    void Log(T const& t) {
        std::lock_guard<std::mutex> lk(stdm);
        std::cout << t << std::endl;
    }
    
    // Task Queue
    // Has user provided list of task, which are handled on background threads
    template <class T, int N = 4>
    class TaskQueue {
        std::deque<T> work;                 // Holds work
        std::vector<std::thread> threads;   // Holds worker threads
        std::mutex m;                       // Holds lock for work container/running
        std::condition_variable cv;         // Worker threads wait on this
        bool running;                       // Whether the queue is still running
    
        public:
        // Constructor, spins up worker threads and waits for work
        TaskQueue() : running{true} {
            threads.reserve(N);
            for (int i = 0; i < N; ++i) {
                // Build worker threads
                threads.emplace_back([&]() {
                    // Normal running: wait until there is work or a shutdown
                    while (running) {
                        std::unique_lock<std::mutex> lk(m);
                        cv.wait(lk, [&] {
                            return !running || !work.empty();
                        });

                        // Woken for shutdown with an empty queue: loop
                        // again so the while condition can stop us
                        if (work.empty()) continue;

                        // Extract work && update the work queue
                        T t = std::move(work.front());
                        work.pop_front();

                        // Release lock before performing work
                        lk.unlock();

                        // Perform work
                        t();
                    }
    
                    // Empty the queue on destruction
                    bool hasMoreWork = true;
                    do {
                        std::unique_lock<std::mutex> lk(m);
                        if ((hasMoreWork = work.size() > 0)) {
                            // Manage Work
                            T t = std::move(work.front());
                            work.pop_front();
    
                            // has more?
                            hasMoreWork = work.size() > 0;
    
                            // release lock
                            lk.unlock();
    
                            // perform work
                            t();
                        }
                    } while (hasMoreWork);
                });
            }
        }
    
        ~TaskQueue() {
            // Inform queue its closing
            {
                std::lock_guard<std::mutex> lk(m);
                running = false;
            }
    
            // Inform all threads of change
            cv.notify_all();
    
            // Clear out remaining objects
            int workObjects = 0;
            bool queueCleared = false;
            do {
                {
                    std::lock_guard<std::mutex> lg(m);
                    queueCleared = (workObjects = work.size()) == 0;
                    Log("Queue Has Remaining: " +
                        std::to_string(workObjects));
                }
    
                // Give worker threads time to work
                std::this_thread::sleep_for(
                    std::chrono::milliseconds(250));
            } while (!queueCleared);
    
            // If any threads are still processing, join them to the current
            // thread or else terminate() is called
            for (int i = 0; i < N; i++) {
                if (threads[i].joinable()) threads[i].join();
            }
        }
    
        template <class... Args>
        void emplace_back(Args&&... args) {
            {
                std::lock_guard<std::mutex> lk(m);
                if (running)
                    work.emplace_back(std::forward<Args>(args)...);
            }
            cv.notify_one();
        }
    };
    
    // The actual task that performs work
    struct Task {
        std::string transmit;
        std::string ECONF;
    
        Task() : transmit(""), ECONF("") {}
        Task(std::string&& t, std::string&& e) : transmit(std::move(t)), ECONF(std::move(e)) {}
    
        void operator()() {
            std::thread::id tid = std::this_thread::get_id();
            std::hash<std::thread::id> hasher;
            Log(transmit + ':' + ECONF +
                " Process Time: 500ms One Thread: " +
                std::to_string(hasher(tid)));
    
            // Fake work to consume thread
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
        }
    };
    
    int main() {
        TaskQueue<Task> tp;
        for (int i = 0; i < 200; ++i) {
            // Add work
            tp.emplace_back("Transmit: " + std::to_string(i),
                    "ECONF: " + std::to_string(i));
    
            // Simulate waiting for the next event from socket
            std::this_thread::sleep_for(std::chrono::milliseconds(25));
        }
        return 0;
    }
    

    There are a number of improvements that could be made, but this should give you a rough overview. Hope this helps.