Search code examples
c++visual-c++stllibstdc++stl-algorithm

Using std::lower_bound() on large arrays


The code below returns erroneous results if compiled for 32 bit Linux systems, and the same problem applies to 64 bit systems, given large enough vectors.

Have the preconditions of lower_bound or STL in general been violated, and if so, where?

I have been informed by STL sources that the size of the vector is cast to a signed type, which explains the behaviour.

// compile with and without -m32 switch
#include<algorithm>
#include<iostream>
#include<stdexcept>
#include<vector>
using namespace std;
int main() {
 try {
  vector<uint8_t> v((1ULL << 28) * 9, 2); // 2.25 G entries
  v.back() = 3;                           // the last of which is greater
  cout<< "Vector maximal size: "<<v.max_size()<< " and actual size: " << v.size() <<endl;
  uint8_t val=3;
  auto x= lower_bound(v.begin(), v.end(), val );
  if (x!=v.end() && !( val< *x ) ) {
   cout << "Found value " << int(*x) << endl;
  } else {
   cout << "Not Found " << endl;
  }
 } catch (exception const & ex){
  cerr<< ex.what()<<endl;
 }
}

Output: (Linux OS & Clang++ 7.0.0)

Vector maximal size: 4294967295 and actual size: 2415919104
Found value 2

Output: (Windows 10 OS & 32bit-msvc)

vector<T> too long

Update: While a fix for std::vector is under way, the problem persists for arrays allocated by

auto p= new uint8_t[sz]; // 2.25 G entries 

and the result depends on compiler & stdlib.


Solution

  • In libstdc++ the function lower_bound(...) uses distance(...), it starts with:

    typedef typename iterator_traits<_ForwardIterator>::difference_type  _DistanceType;
    _DistanceType __len = std::distance(__first, __last);
    ...
    

    According to the Standard (23.2, [container.requirements]):

    Expression: a.max_size(); return type: size_type; operational semantics: distance(begin(), end()) for the largest possible container

    distance(...) returns difference_type (24.4.4, [iterator.operations]]

    template<class InputIterator>
    typename iterator_traits<InputIterator>::difference_type
    distance(InputIterator first, InputIterator last);
    

    Hence, max_size() should return a value than can be represented using a signed type (int32_t in the present case). However, max_size() returns 4'294'967'295. I guess this is a bug in libstdc++.

    By the way, in Microsoft STL implementation max_size() returns 2'147'483'647.