Search code examples
c++linuxubuntu-16.04boost-spiritcoredump

Why does boost split cause double free or corruption issue


I developed a web server with C++ and here is a function, which caused a coredump issue but I don't know why.

bool MyClass::hasFamilyAdminPermission(uint32_t uid) {
    ReadMutex mutex(&m_mtx); // this is read lock to lock m_familyOwner and m_familyAdmins
    if (uid == m_familyOwner) {
        return true;
    }
    std::vector<std::string> fields;
    boost::split(fields, m_familyAdmins, boost::is_any_of(","));
    std::string uidStr = boost::lexical_cast<string>(uid);
    for (std::vector<std::string>::iterator itor = fields.begin(); itor != fields.end(); ++itor) {
        if (uidStr == *itor) {
            return true;
        }
    }
    return false;
}

After executing gdb ./myServer coredump_file, I got the output as below:

warning: Unexpected size of section `.reg-xstate/8717' in core file.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./myServer'.
Program terminated with signal SIGABRT, Aborted.

warning: Unexpected size of section `.reg-xstate/8717' in core file.
#0  0x00007f627b2c5428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f62277fe700 (LWP 8717))]
(gdb) where
#0  0x00007f627b2c5428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f627b2c702a in __GI_abort () at abort.c:89
#2  0x00007f627b3077ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f627b420ed8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f627b31037a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>,
    str=0x7f627b420fa0 "double free or corruption (fasttop)", action=3) at malloc.c:5006
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3867
#5  0x00007f627b31453c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
#6  0x00007f627be650b4 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x000000000046acc1 in boost::as_literal<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > (r=...)
    at /usr/local/include/boost/range/as_literal.hpp:102
#8  boost::algorithm::iter_split<std::vector<std::string, std::allocator<std::string> >, std::string, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> > > (Result=..., Input="604679400,1430691907,2792989999",
    Finder=<error reading variable: DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>) at /usr/local/include/boost/algorithm/string/iter_find.hpp:153
#9  0x0000000000464c17 in boost::algorithm::split<std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::string, boost::algorithm::detail::is_any_ofF<char> > (eCompress=<optimized out>, Pred=..., Input=..., Result=...) at /usr/local/include/boost/algorithm/string/split.hpp:149
#10 server::myServer::MyClass::hasFamilyAdminPermission (this=0x7f621c066dd0, uid=<optimized out>) at MyClass.cpp:202
#11 0x0000000000435678 in server::myServer::ServerAPI::onSetMicDataReq (this=0x7fff6da7e750, uid=1430691907, req=0x7f621c094200)
    at ServerAPI.cpp:2291

So it seems that boost::split caused the coredump.

In gdb, I've done as below:

(gdb) frame 10
#10 server::myServer::MyClass::hasFamilyAdminPermission (this=0x7f621c066dd0, uid=<optimized out>) at MyClass.cpp:202
202     MyClass.cpp: No such file or directory.
(gdb) print fields
$1 = std::vector of length 0, capacity 0

For now, fields is an empty vector.

(gdb) frame 8
#8  boost::algorithm::iter_split<std::vector<std::string, std::allocator<std::string> >, std::string, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> > > (Result=..., Input="604679400,1430691907,2792989999",
    Finder=<error reading variable: DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>) at /usr/local/include/boost/algorithm/string/iter_find.hpp:153
153     /usr/local/include/boost/algorithm/string/iter_find.hpp: No such file or directory.

Here, the input is "604679400,1430691907,2792989999", which looks like ok.

However, when I execute info loclas (I'm still at the frame 8), I got a segmentation fault.

(gdb) info locals
Tmp = std::vector of length 1891892, capacity 1891893 = {<error reading variable Tmp (Cannot access memory at address 0x0)>
itBegin = {<boost::iterator_adaptor<boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::use_default, boost::use_default>, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::use_default, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::use_default>> = {<boost::iterator_facade<boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::use_default, boost::use_default>, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::forward_traversal_tag, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, long>> = {<No data fields>},
    m_iterator = {<boost::iterator_facade<boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, boost::iterator_range<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const, boost::forward_traversal_tag, boost::iterator_range<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, long>> = {<No data fields>}, <boost::algorithm::detail::find_iterator_base<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {
        m_Finder = {<boost::function_base> = {vtable = 0x89bd9dd2, functor = {obj_ptr = 0x7f627b31453c <__GI___libc_free+76>, type = {
                type = 0x7f627b31453c <__GI___libc_free+76>, const_qualified = false, volatile_qualified = false},
              func_ptr = 0x7f627b31453c <__GI___libc_free+76>, bound_memfunc_ptr = {
                memfunc_ptr = (void (boost::detail::function::X::*)(boost::detail::function::X * const,
    int)) 0x7f627b31453c <__GI___libc_free+76>, obj_ptr = 0xe8436c057b9e1300}, obj_ref = {obj_ptr = 0x7f627b31453c <__GI___libc_free+76>,
                is_const_qualified = false, is_volatile_qualified = false},
              data = 60 '<'}}, <std::binary_function<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::iterator_range<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >> = {<No data fields>},
/usr/bin/sudo: line 11: 75240 Segmentation fault      (core dumped) ${SUDO} $@

I don't know what caused the coredump, I don't know either why I got the segmentation fault after executing info locals...


Solution

  • So, boost::split doesn't crash. You have undefined behavior elsewhere.

    Regardless, why are you parsing through a string, allocating a vector of strings, comparing to a temporary string etc. all the time? You could do this on the integer-domain.

    Four takes. Starting from a simple skeleton:

    #include <shared_mutex>
    #include <string>
    
    struct MyClass1 {
        MyClass1(uint32_t owner, std::string admins)
            : m_familyOwner(owner)
            , m_familyAdmins(std::move(admins)) {}
    
        bool hasFamilyAdminPermission(uint32_t uid) const;
    
      private:
        mutable std::shared_mutex m_mtx; // guards m_familyOwner and m_familyAdmins
        uint32_t                  m_familyOwner;
        std::string               m_familyAdmins;
    };
    

    1. Comparing Ints, No Allocations

    I'll use Boost Spirit X3:

    #include <boost/spirit/home/x3.hpp>
    bool MyClass1::hasFamilyAdminPermission(uint32_t uid) const {
        std::shared_lock mutex(m_mtx);
        if (uid == m_familyOwner)
            return true;
    
        bool matched = false;
        auto element = boost::spirit::x3::uint32;
        auto check   = [uid, &matched](auto& ctx) {
            if (_attr(ctx) == uid) {
                matched    = true;
                _pass(ctx) = false; // short circuit for perf
            }
        };
    
        parse(begin(m_familyAdmins), end(m_familyAdmins), element[check] % ',');
        return matched;
    }
    

    This still does quite a lot of work under the lock, but certainly never allocates. Also, it does early-out, which helps if the collection of owners can be very large.

    2. Comparing Text, But Without Allocations

    With a nifty regex you can match the number as text on a constant string (or string view). The overhead here is the allocation(s) for the regex. But arguably, it's much simpler:

    #include <regex>
    bool MyClass2::hasFamilyAdminPermission(uint32_t uid) const {
        std::shared_lock mutex(m_mtx);
        if (uid == m_familyOwner)
            return true;
    
        return regex_search(m_familyAdmins, std::regex("(^|,)" + std::to_string(uid) + "(,|$)"));
    }
    

    3. Parse Once, At Construction

    Why are we dealing with text? We could keep the admins in a set:

    #include <set>
    struct MyClass3 {
        MyClass3(uint32_t owner, std::string_view admins) : m_familyOwner(owner) {
            parse(admins.begin(), end(admins), boost::spirit::x3::uint32 % ',', m_familyAdmins);
        }
        bool hasFamilyAdminPermission(uint32_t uid) const;
    
      private:
        mutable std::shared_mutex m_mtx; // guards m_familyOwner and m_familyAdmins
        uint32_t                  m_familyOwner;
        std::set<uint32_t>        m_familyAdmins;
    };
    
    bool MyClass3::hasFamilyAdminPermission(uint32_t uid) const {
        std::shared_lock mutex(m_mtx);
        return uid == m_familyOwner || m_familyAdmins.contains(uid);
    }
    

    That's even simpler. However, there's some overhead in the set which can be optimized.

    4. Parse Once, No Allocations, Speed

    std::set has the right semantics. However, for small sets it's sad that there's no locality of reference, and relatively high node allocation overhead. We could replace with:

    boost::container::flat_set< //
        uint32_t,               //
        std::less<>,            //
        boost::container::small_vector<uint32_t, 10>>
        m_familyAdmins;
    

    This makes it so that sets <= 10 elements do not allocate at all, and lookup benefits from contiguous storage. However, at this rate - unless you want to deal with duplicate entries - you might keep a linear search and store:

    boost::container::small_vector<uint32_t, 10>
        m_familyAdmins;
    

    Combined Demo

    Showing all the subtle edge cases. Note that only with the X3 parser

    • it will be easy to perform input validation on the comma-separated string
    • it will be easy to reliably compare differently formatted uid numbers

    I snuck in one number that has a leading 0 (089 instead of 89) just to highlight this issue with the std::regex approach. Note that your original code has the same problem.

    Live On Coliru/Compiler Explorer

    #include <shared_mutex>
    #include <string>
    
    struct MyClass1 {
        MyClass1(uint32_t owner, std::string admins)
            : m_familyOwner(owner)
            , m_familyAdmins(std::move(admins)) {}
    
        bool hasFamilyAdminPermission(uint32_t uid) const;
    
        private:
        mutable std::shared_mutex m_mtx; // guards m_familyOwner and m_familyAdmins
        uint32_t                  m_familyOwner;
        std::string               m_familyAdmins;
    };
    
    #include <boost/spirit/home/x3.hpp>
    bool MyClass1::hasFamilyAdminPermission(uint32_t uid) const {
        std::shared_lock mutex(m_mtx);
        if (uid == m_familyOwner)
            return true;
    
        bool matched = false;
        auto element = boost::spirit::x3::uint32;
        auto check   = [uid, &matched](auto& ctx) {
            if (_attr(ctx) == uid) {
                matched    = true;
                _pass(ctx) = false; // short circuit for perf
            }
        };
    
        parse(begin(m_familyAdmins), end(m_familyAdmins), element[check] % ',');
        return matched;
    }
    
    struct MyClass2 {
        MyClass2(uint32_t owner, std::string admins)
            : m_familyOwner(owner)
            , m_familyAdmins(std::move(admins)) {}
        bool hasFamilyAdminPermission(uint32_t uid) const;
    
        private:
        mutable std::shared_mutex m_mtx; // guards m_familyOwner and m_familyAdmins
        uint32_t                  m_familyOwner;
        std::string               m_familyAdmins;
    };
    
    #include <regex>
    bool MyClass2::hasFamilyAdminPermission(uint32_t uid) const {
        std::shared_lock mutex(m_mtx);
        if (uid == m_familyOwner)
            return true;
    
        return std::regex_search(m_familyAdmins, std::regex("(^|,)" + std::to_string(uid) + "(,|$)"));
    }
    
    #include <set>
    struct MyClass3 {
        MyClass3(uint32_t owner, std::string_view admins) : m_familyOwner(owner) {
            parse(admins.begin(), end(admins), boost::spirit::x3::uint32 % ',', m_familyAdmins);
        }
        bool hasFamilyAdminPermission(uint32_t uid) const;
    
        private:
        mutable std::shared_mutex m_mtx; // guards m_familyOwner and m_familyAdmins
        uint32_t                  m_familyOwner;
        std::set<uint32_t>        m_familyAdmins;
    };
    
    bool MyClass3::hasFamilyAdminPermission(uint32_t uid) const {
        std::shared_lock mutex(m_mtx);
        return uid == m_familyOwner || m_familyAdmins.contains(uid);
    }
    
    #include <boost/container/flat_set.hpp>
    #include <boost/container/small_vector.hpp>
    struct MyClass4 {
        MyClass4(uint32_t owner, std::string_view admins) : m_familyOwner(owner) {
            parse(admins.begin(), end(admins), boost::spirit::x3::uint32 % ',', m_familyAdmins);
        }
        bool hasFamilyAdminPermission(uint32_t uid) const;
    
        private:
        mutable std::shared_mutex m_mtx; // guards m_familyOwner and m_familyAdmins
        uint32_t                  m_familyOwner;
    #ifdef LINEAR_SEARCH
        // likely faster with small sets, anyways
        boost::container::small_vector<uint32_t, 10> m_familyAdmins;
    #else
        boost::container::flat_set< //
            uint32_t,               //
            std::less<>,            //
            boost::container::small_vector<uint32_t, 10>>
                m_familyAdmins;
    #endif
    };
    
    bool MyClass4::hasFamilyAdminPermission(uint32_t uid) const {
        std::shared_lock mutex(m_mtx);
        return uid == m_familyOwner ||
    #ifndef LINEAR_SEARCH
            std::find(begin(m_familyAdmins), end(m_familyAdmins), uid) != end(m_familyAdmins);
    #else
        m_familyAdmins.contains(uid);
    #endif
    }
    
    #include <iostream>
    int main() {
        MyClass1 const mc1{42, "21,377,34,233,55,089,144"};
        MyClass2 const mc2{42, "21,377,34,233,55,089,144"};
        MyClass3 const mc3{42, "21,377,34,233,55,089,144"};
        MyClass4 const mc4{42, "21,377,34,233,55,089,144"};
    
        std::cout << "uid\tdynamic\tregex\tset\tflat_set\n"
            << "\t(x3)\t-\t(x3)\t(x3)\n"
            << std::string(5 * 8, '-') << "\n";
    
        auto compare = [&](uint32_t uid) {
            std::cout << uid << "\t" << std::boolalpha
                << mc1.hasFamilyAdminPermission(uid) << "\t"
                << mc2.hasFamilyAdminPermission(uid) << "\t"
                << mc3.hasFamilyAdminPermission(uid) << "\t"
                << mc4.hasFamilyAdminPermission(uid) << "\n";
        };
    
        compare(42);
        // https://en.wikipedia.org/wiki/Fibonacci_number
        for (auto i = 3, j = 5; i < 800; std::tie(i, j) = std::tuple{j, i + j}) {
            compare(i);
        }
    }
    

    Prints

    id      dynamic regex   set     flat_set
            (x3)    -       (x3)    (x3)
    ----------------------------------------
    42      true    true    true    true
    3       false   false   false   false
    5       false   false   false   false
    8       false   false   false   false
    13      false   false   false   false
    21      true    true    true    true
    34      true    true    true    true
    55      true    true    true    true
    89      true    false   true    true
    144     true    true    true    true
    233     true    true    true    true
    377     true    true    true    true
    610     false   false   false   false