Tags: c++, arrays, language-lawyer, standards, size-t

Why are C++ array index values signed and not built around the size_t type (or am I wrong in that)?


It's getting harder and harder for me to keep track of the ever-evolving C++ standard, but one thing that seems clear to me now is that array index values are meant to be of type int (not long long, size_t, or some other seemingly more appropriate choice for a size). I've surmised this both from the answer to this question (Type of array index in C++) and from the practices of well-established C++ libraries (like Qt), which also use a plain int for sizes and in their subscript operators. The nail in the coffin for me is that I am now getting a plethora of compiler warnings from MSVC 2017 stating that my const unsigned long long (aka const size_t) variables are being implicitly converted to type const int when used as an array index.

The answer given by Mat in the question linked above quotes the ISO C++ standard draft n3290 as saying

it shall be an integral constant expression and its value shall be greater than zero.

I have no background in reading these specs and precisely interpreting their language, so maybe a few points of clarification:

  • Does an "integral constant expression" specifically forbid things like long long, which to me is an integral type, just a larger one?
  • Does what they're saying specifically forbid an unsigned type like size_t?

If what I'm seeing here is true and array index values are meant to be signed int types, why? This seems counter-intuitive to me. The spec even states that the value "shall be greater than zero", so we're wasting a bit if the type is signed. Sure, we might still want to compare an index with 0 in some way, and that is dangerous with unsigned types, but there should be cheaper ways to solve that problem that waste only a single value, not an entire bit.
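The comparison-with-0 hazard mentioned above can be sketched as follows, assuming C++14 or later (for the loop inside a constexpr function). sum_backwards and kZeroMinusOne are hypothetical names used only for illustration; the sketch shows one common way to count an unsigned index down to 0 and why the "obvious" loop fails.

```cpp
#include <cstddef>

// Hypothetical helper: sums a[n-1], a[n-2], ..., a[0] using an unsigned index.
// The condition i-- > 0 compares i with 0 before decrementing, so the body
// sees n-1 down to 0 and the loop stops after index 0. (The final i-- wraps i
// around, but that value is never used.) It is also correct for n == 0.
constexpr int sum_backwards(const int* a, std::size_t n) {
    int total = 0;
    for (std::size_t i = n; i-- > 0;) {
        total += a[i];
    }
    return total;
}

// The tempting version does not work with an unsigned index:
//
//     for (std::size_t i = n - 1; i >= 0; --i) total += a[i];
//
// i >= 0 is always true for an unsigned type, so after i == 0 the --i wraps
// to the largest size_t value and a[i] reads far outside the array.
// The wraparound itself is well defined for unsigned arithmetic:
constexpr std::size_t kZeroMinusOne = std::size_t{0} - 1;
static_assert(kZeroMinusOne == static_cast<std::size_t>(-1),
              "unsigned subtraction wraps modulo 2^N");
```

The signed alternative (an int counting down while i >= 0) avoids the trap, which is one of the usual arguments for signed indices; the unsigned version needs the test-then-decrement idiom instead.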

Also, with registers ever widening, a more future-proof solution would be to allow larger types for the index (like long long) rather than sticking with int, which is historically a problematic type anyway (its size changed when processors moved to 32 bits, and then didn't when they moved to 64 bits). I even see people talking anecdotally about size_t as if it was designed to be a more future-proof type for sizes (and not JUST the result type of the sizeof operator). But of course, that might be apocryphal.
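Two facts bear on the size_t point above, and both can be checked at compile time: the result of sizeof has type std::size_t, and std::array declares its size_type (the parameter type of its operator[]) as std::size_t. A minimal sketch, assuming C++14 or later; last_element is a hypothetical helper for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// The result of sizeof has type std::size_t.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields std::size_t");

// std::array declares size_type as std::size_t, and its operator[] takes a size_type.
static_assert(std::is_same<std::array<int, 3>::size_type, std::size_t>::value,
              "std::array indexes with std::size_t");

// Hypothetical helper: index a std::array with its own size_type.
constexpr int last_element(const std::array<int, 3>& arr) {
    return arr[arr.size() - 1];  // size() returns size_type, i.e. std::size_t
}
```

The exact width of std::size_t is implementation-defined, so the sketch deliberately checks only type identity, not a particular number of bits.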

I just want to make sure my foundational programming understanding here is not flawed. When I see experts like the ISO C++ committee or the engineers of Qt doing something, I give them the benefit of the doubt that they have a good reason! For something as fundamental to programming as an array index, I feel I need to know what that reason is, or I might be missing something important.


Solution

  • Looking at [expr.sub]/1, we have:

    A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and *the other shall be a prvalue of unscoped enumeration or integral type*. The result is of type “T”. The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise. The expression E1 is sequenced before the expression E2.

    emphasis mine

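    So the index in a subscript expression needs to be of unscoped enumeration or integral type. Looking in [basic.fundamental], we see that the standard signed integer types are signed char, short int, int, long int, and long long int, and that each has a corresponding standard unsigned integer type; together with bool and the character types (such as char and wchar_t), the signed and unsigned integer types make up the integral types.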

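    So any of the standard integer types will work, and so will any alias for one of them, like size_t (an implementation-defined unsigned integer type, unsigned long long on 64-bit MSVC). The value supplied to the subscript operator can even be negative, as long as the result still designates a valid element, e.g. p[-1] when p points to the second or a later element of an array.

    Note also that the n3290 wording quoted in the question comes from the rules for array declarators ([dcl.array]): it constrains the bound N in a declaration such as T a[N], not the index in a subscript expression, and it does not require that bound to be of type int. Likewise, a warning about size_t being converted to int is not produced by the built-in subscript operator, which accepts any integral index as is; it most likely comes from a class whose operator[] takes an int parameter, as the Qt 5 containers do.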
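The points in this answer can be checked at compile time. The sketch below assumes C++14 or later; Slot, kBound, kOtherBound, IntIndexed, and the function names are hypothetical, chosen only for illustration, and IntIndexed merely imitates an int-indexed container rather than reproducing any real library's API.

```cpp
#include <cstddef>

// Hypothetical unscoped enumeration: its enumerators may be used as subscripts.
enum Slot { First, Second, Third };

// Array bounds given by constants of two different integral types.
constexpr std::size_t kBound = 4;     // unsigned bound
constexpr long long kOtherBound = 3;  // signed long long bound

// Reads elements of local arrays through several index types.
// Returns a[2] + a[1] + a[3] + b[1] == 3 + 2 + 4 + 6 == 15.
constexpr int mixed_index_sum() {
    int a[kBound] = {1, 2, 3, 4};
    int b[kOtherBound] = {5, 6, 7};
    std::size_t i = 2;    // unsigned, same type as sizeof's result
    long long j = 1;      // signed long long
    unsigned char k = 3;  // small unsigned integer type
    Slot s = Second;      // unscoped enum with value 1
    return a[i] + a[j] + a[k] + b[s];
}

// E1[E2] is defined as *((E1)+(E2)), so a[i], *(a + i) and i[a] name the same element.
constexpr bool subscript_is_pointer_arithmetic() {
    int a[kBound] = {1, 2, 3, 4};
    std::size_t i = 2;
    return a[i] == *(a + i) && a[i] == i[a];
}

// A negative subscript is valid while the result still points into the array.
constexpr int negative_index() {
    int a[kBound] = {1, 2, 3, 4};
    int* p = a + 2;  // points at a[2]
    return p[-1];    // same element as a[1]
}

// Hypothetical class whose operator[] takes int. Calling it with a std::size_t
// requires a narrowing conversion to int, which compilers such as MSVC may warn
// about; the built-in subscripts above need no conversion to int.
struct IntIndexed {
    int data[3] = {10, 20, 30};
    constexpr int operator[](int idx) const { return data[idx]; }
};

// Converting explicitly makes the narrowing visible and silences the warning.
constexpr int read_via_int_index(std::size_t i) {
    IntIndexed c{};
    return c[static_cast<int>(i)];
}
```

For example, static_assert(negative_index() == 2); compiles, while writing c[i] directly with a std::size_t i against IntIndexed is the kind of call that can produce an implicit size_t-to-int conversion warning.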