c++ string limit system-dependent

Maximum length of a std::basic_string<_CharT> string


I was wondering how one can fix an upper limit for the length of a string (in C++) for a given platform.

I looked through a number of libraries, and most of them define it arbitrarily. The GNU C++ standard library (libstdc++, the version with experimental C++0x features) has quite an elaborate definition:

size_t npos = size_t(-1); /*!< The maximum value that can be stored in a variable of type size_t */
size_t _S_max_len = ((npos - sizeof(_Rep_base))/sizeof(_CharT) - 1) / 4; /*!< Where _CharT is a template parameter; _Rep_base is a structure which encapsulates the allocated memory */

Here's how I understand the formula:

  • The size_t type must hold the count of units allocated to the string (where each unit is of type _CharT)
  • Theoretically, the maximum value that a variable of type size_t can hold is the total number of 1-byte units (i.e., of type char) that may be allocated
  • The previous value minus the overhead required to keep track of the allocated memory (_Rep_base) is therefore the maximum number of units in a string. Divide this value by sizeof(_CharT) as _CharT may require more than a byte
  • Subtract 1 from the previous value to account for a terminating character
  • Finally, that leaves the division by 4. I have absolutely no idea why!

I looked in a lot of places for an explanation but couldn't find a satisfactory one anywhere, which is why I tried to work one out myself. Please correct me if I'm wrong!


Solution

  • The comments in basic_string.h from GCC 4.3.4 state:

        // The maximum number of individual char_type elements of an
        // individual string is determined by _S_max_size. This is the
        // value that will be returned by max_size().  (Whereas npos
        // is the maximum number of bytes the allocator can allocate.)
        // If one was to divvy up the theoretical largest size string,
        // with a terminating character and m _CharT elements, it'd
        // look like this:
        // npos = sizeof(_Rep) + (m * sizeof(_CharT)) + sizeof(_CharT)
        // Solving for m:
        // m = ((npos - sizeof(_Rep))/sizeof(CharT)) - 1
        // In addition, this implementation quarters this amount.
    

    In particular, note the last line, "In addition, this implementation quarters this amount." I take that to mean that the division by four is in fact entirely arbitrary.

    I tried to find more information in the check-in log for basic_string.h, but it only goes back to October 5, 2000, and the comment was already present in that revision. I'm not familiar enough with that code base to know where the file might have lived in the source tree before it was moved to its current location.