c++ c++14 string-literals

Check whether equal string literals are stored at the same address


I am developing a C++ library that uses unordered containers. These require a hasher (usually a specialization of the class template std::hash) for the types of the elements they store. In my case, those elements are classes that encapsulate string literals, similar to conststr in the example at the bottom of this page. The standard library offers a specialization for const char pointers, which, however, only hashes the pointer value, as explained here, in the 'Notes' section:

There is no specialization for C strings. std::hash<const char*> produces a hash of the value of the pointer (the memory address), it does not examine the contents of any character array.

Although this is very fast (or so I think), the C++ standard does not guarantee that several equal string literals are stored at the same address, as explained in this question. If they aren't, the first condition for hashers wouldn't be met:

For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2)

I would like to compute the hash with the provided specialization if the aforementioned guarantee is given, and with some other algorithm otherwise. Although falling back to asking those who include my headers or build my library to define a particular macro is feasible, an implementation-defined one would be preferable.
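As a rough sketch of that selection, assuming a user-defined macro as the fallback: the names LiteralRef, ContentHash, AddressHash and MYLIB_LITERALS_ARE_POOLED below are hypothetical, and FNV-1a is just one possible content hash, not something the standard library provides. Equality always compares contents, so the address-based hasher is only valid when equal literals really do share an address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for a class that encapsulates a string literal.
class LiteralRef {
public:
    template <std::size_t N>
    constexpr LiteralRef(const char (&s)[N]) : data_(s), size_(N - 1) {}
    constexpr const char* data() const { return data_; }
    constexpr std::size_t size() const { return size_; }
private:
    const char* data_;
    std::size_t size_;  // length without the terminating '\0'
};

// Equality is defined on contents, never on addresses.
inline bool operator==(const LiteralRef& a, const LiteralRef& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a.data()[i] != b.data()[i]) return false;
    return true;
}

// Content-based hasher: 64-bit FNV-1a over the characters (an assumed choice).
struct ContentHash {
    std::size_t operator()(const LiteralRef& s) const {
        std::uint64_t h = 0xcbf29ce484222325ULL;      // FNV offset basis
        for (std::size_t i = 0; i < s.size(); ++i) {
            h ^= static_cast<unsigned char>(s.data()[i]);
            h *= 0x100000001b3ULL;                    // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Address-based hasher: only correct if equal literals share one address.
struct AddressHash {
    std::size_t operator()(const LiteralRef& s) const {
        return std::hash<const char*>()(s.data());
    }
};

// MYLIB_LITERALS_ARE_POOLED is a hypothetical macro that the library user
// would define; no compiler is assumed to define it.
#ifdef MYLIB_LITERALS_ARE_POOLED
using LiteralHash = AddressHash;
#else
using LiteralHash = ContentHash;
#endif
```

With the macro left undefined, std::unordered_set<LiteralRef, LiteralHash> finds an element through any LiteralRef with the same characters, even when the two refer to different buffers.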

Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?

An example:

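// __GXX_SAME_STRING_LITERALS_SAME_ADDRESS__ is a hypothetical macro, shown only to illustrate the idea
#ifdef __GXX_SAME_STRING_LITERALS_SAME_ADDRESS__
// pointers, not arrays: an array would copy the literal into a distinct object
const char* str1 = "abc";
const char* str2 = "abc";
assert( str1 == str2 );
#endif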

Solution

  • Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?

    • gcc has the -fmerge-constants option, which attempts merging but does not guarantee it:

    Attempt to merge identical constants (string constants and floating-point constants) across compilation units.

    This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

    Enabled at levels -O, -O2, -O3, -Os.

    • Visual Studio has String Pooling (/GF option: "Eliminate Duplicate Strings")

    String pooling allows what were intended as multiple pointers to multiple buffers to be multiple pointers to a single buffer. In the following code, s and t are initialized with the same string. String pooling causes them to point to the same memory:

    char *s = "This is a character buffer";
    char *t = "This is a character buffer";
    

    Note: although MSDN uses char* for the string literals, const char* should be used

    • clang apparently also has a -fmerge-constants option, but I can't find much about it beyond the --help output, so I'm not sure it really is the equivalent of gcc's:

    Disallow merging of constants


    In any case, how string literals are stored is implementation-dependent (many implementations store them in the read-only portion of the program).

    Rather than building your library on implementation-dependent hacks, I can only suggest using std::string instead of C-style strings: std::hash<std::string> hashes the characters, so they will behave exactly as you expect.

    You can construct your std::string in place in your containers with the emplace() methods:

        std::unordered_set<std::string> my_set;
        my_set.emplace("Hello");
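A slightly fuller sketch of the same idea, with a hypothetical helper make_set() that exists only for illustration: the set compares and hashes by content, so emplacing the same text twice yields a single element, and a lookup succeeds no matter which buffer the key comes from.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper: builds a small set of strings with emplace().
inline std::unordered_set<std::string> make_set() {
    std::unordered_set<std::string> my_set;
    my_set.emplace("Hello");   // constructs the std::string inside the set
    my_set.emplace("Hello");   // same content: no second element is added
    my_set.emplace("World");
    return my_set;
}
```

Calling make_set().count(key) with a key held in a separate char array works because the key is converted to std::string and compared character by character. The price is a copy of each string and a hash over its contents, rather than a hash of a single pointer.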