I have discovered a disturbing inconsistency between std::string
and string literals in C++0x:
#include <iostream>
#include <string>

int main()
{
    int i = 0;
    for (auto e : "hello")
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    i = 0;
    for (auto e : std::string("hello"))
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    return 0;
}
The output is:
Number of elements: 6
Number of elements: 5
I understand the mechanics of why this is happening: the string literal "hello" is really an array of six characters that includes the terminating null character, and when the range-based for loop calls std::end()
on the character array, it gets a pointer one past the end of the array; since the null character is part of the array, the range therefore includes the null character as an element.
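The same mechanics can be checked directly; here is a minimal sketch (std::begin() and std::end() come from <iterator>):

#include <iostream>
#include <iterator>

int main()
{
    const char arr[] = "hello";  // type is const char[6]: 'h', 'e', 'l', 'l', 'o', '\0'
    std::cout << sizeof arr << '\n';                       // 6: the '\0' is part of the array
    std::cout << std::end(arr) - std::begin(arr) << '\n';  // 6: end() points one past the '\0'
    return 0;
}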
However, I think this is very undesirable: surely std::string
and string literals should behave the same when it comes to properties as basic as their length?
Is there a way to resolve this inconsistency? For example, can std::begin()
and std::end()
be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?
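For what it's worth, a null-excluding range can be obtained today with a small opt-in wrapper; literal_range and without_null below are hypothetical names (nothing of the sort exists in the standard library), and this does not change what std::begin() or std::end() themselves do:

#include <cstddef>
#include <iostream>

// Hypothetical helper: a view over a character array that stops
// before the terminating null character.
template <std::size_t N>
struct literal_range
{
    explicit literal_range(const char (&a)[N]) : arr(a) {}
    const char* begin() const { return arr; }
    const char* end() const { return arr + N - 1; }  // drop the trailing '\0'
private:
    const char (&arr)[N];
};

template <std::size_t N>
literal_range<N> without_null(const char (&a)[N])
{
    return literal_range<N>(a);
}

int main()
{
    int i = 0;
    for (auto e : without_null("hello"))
        ++i;
    std::cout << "Number of elements: " << i << '\n';  // 5
    return 0;
}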
EDIT: To justify my indignation a bit more to those who have said that I'm just suffering the consequences of using C-style strings, which are a "legacy feature", consider code like the following:
template <typename Range>
void f(Range&& r)
{
    for (auto e : r)
    {
        ...
    }
}
Would you expect f("hello")
and f(std::string("hello"))
to do something different?
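For concreteness, here is a self-contained version of f whose elided body is filled with an illustrative element count (the (void)e cast merely silences an unused-variable warning); it makes the difference visible:

#include <iostream>
#include <string>

// Illustrative only: the loop body just counts elements so the
// difference between the two calls shows up in the output.
template <typename Range>
void f(Range&& r)
{
    int i = 0;
    for (auto e : r)
    {
        (void)e;  // element itself unused
        ++i;
    }
    std::cout << i << '\n';
}

int main()
{
    f("hello");               // prints 6: the range includes the terminating '\0'
    f(std::string("hello"));  // prints 5
    return 0;
}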
The inconsistency can be resolved using another tool in C++0x's toolbox: user-defined literals. With an appropriately defined literal operator:
std::string operator""s(const char* p, std::size_t n)
{
    return std::string(p, n);
}
We'll be able to write:
int i = 0;
for (auto e : "hello"s)
    ++i;
std::cout << "Number of elements: " << i << '\n';
This now outputs the expected number:
Number of elements: 5
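For reference, here is a complete, standalone version of the above. One caveat: literal suffixes that do not begin with an underscore are reserved for the standard library, so a user-provided operator would normally be spelled _s (C++14 later standardized an s suffix for std::string in std::literals::string_literals); the sketch below uses _s for that reason:

#include <cstddef>
#include <iostream>
#include <string>

// _s rather than s: suffixes without a leading underscore are reserved
// for the standard library.
std::string operator"" _s(const char* p, std::size_t n)
{
    return std::string(p, n);
}

int main()
{
    int i = 0;
    for (auto e : "hello"_s)
        ++i;
    std::cout << "Number of elements: " << i << '\n';  // prints 5
    return 0;
}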
With these new std::string literals, there is arguably no more reason to use C-style string literals, ever.