
valgrind: Multiple std::vector::resize calls


I have been tracking down an off-by-one error in a large C++ codebase, and I cannot explain the following Valgrind behavior: an invalid write is reported after a single resize call, but not after two. Could someone please shed some light on this?

The code is:

% cat foo.cxx
#include <cstring>
#include <string>
#include <vector>

int main() {
  std::vector<char> v;
#ifdef RESIZE9
  v.resize(9);
#endif
  v.resize(10);

  std::string s(10, 'x');
  std::strcpy(&v[0], s.c_str());

  return 0;
}

Without RESIZE9, Valgrind reports the invalid write, as expected:

% g++ foo.cxx && valgrind ./a.out
==21886== Memcheck, a memory error detector
==21886== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==21886== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==21886== Command: ./a.out
==21886==
==21886== Invalid write of size 1
==21886==    at 0x4838DD7: strcpy (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==21886==    by 0x1092CA: main (in /tmp/a.out)
==21886==  Address 0x4d84c8a is 0 bytes after a block of size 10 alloc'd
==21886==    at 0x4835DEF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==21886==    by 0x109BCD: __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (in /tmp/a.out)
==21886==    by 0x109AD5: std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) (in /tmp/a.out)
==21886==    by 0x109997: std::_Vector_base<char, std::allocator<char> >::_M_allocate(unsigned long) (in /tmp/a.out)
==21886==    by 0x1095E8: std::vector<char, std::allocator<char> >::_M_default_append(unsigned long) (in /tmp/a.out)
==21886==    by 0x1093CC: std::vector<char, std::allocator<char> >::resize(unsigned long) (in /tmp/a.out)
==21886==    by 0x10926A: main (in /tmp/a.out)
==21886==
==21886==
==21886== HEAP SUMMARY:
==21886==     in use at exit: 0 bytes in 0 blocks
==21886==   total heap usage: 2 allocs, 2 frees, 72,714 bytes allocated
==21886==
==21886== All heap blocks were freed -- no leaks are possible
==21886==
==21886== For counts of detected and suppressed errors, rerun with: -v
==21886== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

But with RESIZE9 defined, the same overflow goes unreported:

% g++ -DRESIZE9 foo.cxx && valgrind ./a.out
==21904== Memcheck, a memory error detector
==21904== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==21904== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==21904== Command: ./a.out
==21904==
==21904==
==21904== HEAP SUMMARY:
==21904==     in use at exit: 0 bytes in 0 blocks
==21904==   total heap usage: 3 allocs, 3 frees, 72,731 bytes allocated
==21904==
==21904== All heap blocks were freed -- no leaks are possible
==21904==
==21904== For counts of detected and suppressed errors, rerun with: -v
==21904== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The system is Debian 10.9 with:

% g++ --version
g++ (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and

% valgrind --version
valgrind-3.14.0

Solution

  • While I can't confirm this as a complete (cross-platform) 'solution', adding a line to show the actual capacity of the vector after the resize operation(s) may shed some light:

    #include <cstring>
    #include <string>
    #include <vector>
    #include <iostream>
    
    //#define RESIZE9 1
    
    int main()
    {
        std::vector<char> v;
        #ifdef RESIZE9
        v.resize(9);
        #endif
        v.resize(10);
    
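        std::cout << v.capacity() << std::endl; // Allocated capacity, which may exceed size()
    
        std::string s(10, 'x');
        // Deliberate bug from the question: strcpy writes 11 bytes (10 chars + '\0') into 10 elements
        std::strcpy(&v[0], s.c_str());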
    
        return 0;
    }
    

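    Running this code as is (Visual Studio, MSVC, Windows 10, 64-bit) prints a capacity of 10, as expected. However, when the #define RESIZE9 1 line is uncommented, the capacity printed after the two resize calls is 13. That is consistent with MSVC growing the existing 9-byte buffer by half its capacity (9 + 9/2 = 13) rather than allocating exactly 10 bytes.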

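    Allocating extra capacity is, I believe, within the requirements of the standard: so long as the capacity is at least the new size, nothing is broken. Growing geometrically (here by 4 bytes rather than just one) most likely exists to amortize the cost of repeated reallocations. The exact growth policy is implementation-defined. The heap totals in the question are consistent with the same effect under GCC: the RESIZE9 build allocates 17 more bytes (72,731 versus 72,714), which matches a 9-byte block followed by an 18-byte block in place of a single 10-byte block, i.e. the capacity doubling from 9 to 18.

    This would explain the Valgrind output. strcpy writes 11 bytes (ten 'x' characters plus the terminating '\0'). With a single resize(10), the block is exactly 10 bytes, so the 11th byte lands just past the end of the block and Memcheck reports it. After resize(9) followed by resize(10), the 11th byte lands in the spare capacity, which is still inside the allocated block. Memcheck tracks heap blocks, not the vector's size(), so it sees nothing wrong. The write is still past size() and is still a bug; it is merely hidden.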
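
The reasoning above can be sketched in code. This is only an illustration under stated assumptions, not part of the original answer: `Extent`, `extent_after_resizes`, `bytes_written_by_strcpy`, `overruns_size` and `overruns_capacity` are hypothetical names introduced here, and the capacity values they report are implementation-defined, so the sketch makes no claim about specific numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper type: the size and capacity of a vector at one moment.
struct Extent {
    std::size_t size;
    std::size_t capacity;
};

// Apply a sequence of resize() calls to a fresh vector and report the result.
// The capacity depends on the standard library's growth policy.
Extent extent_after_resizes(std::initializer_list<std::size_t> sizes) {
    std::vector<char> v;
    for (std::size_t n : sizes) {
        v.resize(n);
    }
    assert(v.capacity() >= v.size());  // guaranteed by the standard
    return Extent{v.size(), v.capacity()};
}

// Bytes that strcpy would write: the characters plus the terminating '\0'.
std::size_t bytes_written_by_strcpy(const std::string& s) {
    return s.size() + 1;
}

// True when the copy writes past size(): a bug regardless of capacity.
bool overruns_size(const Extent& e, const std::string& s) {
    return bytes_written_by_strcpy(s) > e.size;
}

// True when the copy also writes past the allocated block, which is the
// only case a heap-block checker such as Memcheck can observe.
bool overruns_capacity(const Extent& e, const std::string& s) {
    return bytes_written_by_strcpy(s) > e.capacity;
}
```

With `std::string s(10, 'x')`, `overruns_size` is true for both `extent_after_resizes({10})` and `extent_after_resizes({9, 10})`, because 11 bytes never fit into 10 elements. Whether `overruns_capacity` is also true depends on how much spare capacity the library allocated, which is exactly the difference between the two Valgrind runs. Sizing the vector to `s.size() + 1` before the copy removes the overflow in either case.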