I want to implement of sort of Symbol
the same way ruby does.
For this, I created a user defined literal which returned a std::hash
of the std::basic_string<T>
corresponding.
The code was great, but as I read somewhere the hash function may not be consistent over several executions of the same program. Moreover, I wanted to make this computation at compile-time, which was 1) not supported by std::hash
and 2) would break code if std::hash
return value changes.
So I wrote up the following implementation, based on java.lang.String.hashCode implementation.
typedef size_t symbol;
template<typename CharT>
constexpr size_t constant_hash(const CharT* p, size_t h = 0) noexcept
{
return (*p == 0) ? h : constant_hash(p + 1, h * 31 + static_cast<size_t>(*p));
}
constexpr symbol operator "" _sym (const char* p, size_t n) noexcept
{
return constant_hash(p);
}
My question is: are there any problem with this implementation?
I'm only able to test it on GCC 4.7.1, and I'd like to know if it is standard-compliant and should also work on other compiler.
I'm asking this because a previous implementation was working on GCC but caused segfault if the binary was compiled with clang++ (problem of undefined behavior with increment operators I think).
Thanks in advance
Edit
Work with clang++ (thanks KennyTM)
There's no UB, it works fine, as long as the string has '\0'
termination. Note that constexpr
evaluation cannot invoke UB; arithmetic or pointer operations that cause UB at runtime are required to produce a compile error in the context of a constant-expression.
Note that the static_cast
is unnecessary; the char
operand will be promoted to size_t
.
Also, on first glance the hash function doesn't look very good because h * 31
is just ( h << 5 ) - h
. You might pick a larger number with 1's randomly spread out across entire the binary value. But on the other hand, they might be trying to be clever since the low 5 bits of ASCII has the most entropy and this eliminates the possibility of collisions between short strings of different length.