c++, c, micro-optimization, compiler-theory, string-interning

Duplicate literals and hard-coding


I see the following pattern occurring quite frequently:

 b->last = ngx_cpymem(b->last, "</pre><hr>", sizeof("</pre><hr>") - 1);

Notice that the string literal appears twice. The extract is from the nginx code base.

The compiler should be able to merge these literals when they are encountered within the same compilation unit.

My questions are:

  1. Do the commercial-grade compilers (VC++, GCC, LLVM/Clang) remove this redundancy when it is encountered within a compilation unit?
  2. Does the (static) linker remove such redundancies when linking object files?
  3. If 2 applies, would this optimization also occur during dynamic linking?
  4. If 1 and 2 apply, do they apply to all literals?

These questions are important because the answers determine whether a programmer can be verbose without losing efficiency. Think of enormous static data models being hard-wired into a program (for example, the rules of a Decision Support System used in some low-level scenario).

Edit

2 points / clarifications

  1. The code above was written by a recognised "master" programmer. The guy single-handedly wrote nginx.

  2. I have not asked which of the possible mechanisms of literal hard-coding is better. Therefore don't go off-topic.

Edit 2

My original example was quite contrived and restrictive. The following snippet shows the usage of string literals being embedded into internal hard-coded knowledge. The first snippet is meant for the config parser telling it what enum values to set for which string, and the second to be used more generally as a string in the program. Personally I am happy with this as long as the compiler uses one copy of the string literal, and since the elements are static, they don't enter the global symbol tables.

static ngx_conf_bitmask_t  ngx_http_gzip_proxied_mask[] = {
   { ngx_string("off"), NGX_HTTP_GZIP_PROXIED_OFF },
   { ngx_string("expired"), NGX_HTTP_GZIP_PROXIED_EXPIRED },
   { ngx_string("no-cache"), NGX_HTTP_GZIP_PROXIED_NO_CACHE },
   { ngx_string("no-store"), NGX_HTTP_GZIP_PROXIED_NO_STORE },
   { ngx_string("private"), NGX_HTTP_GZIP_PROXIED_PRIVATE },
   { ngx_string("no_last_modified"), NGX_HTTP_GZIP_PROXIED_NO_LM },
   { ngx_string("no_etag"), NGX_HTTP_GZIP_PROXIED_NO_ETAG },
   { ngx_string("auth"), NGX_HTTP_GZIP_PROXIED_AUTH },
   { ngx_string("any"), NGX_HTTP_GZIP_PROXIED_ANY },
   { ngx_null_string, 0 }
};

followed closely by:

static ngx_str_t  ngx_http_gzip_no_cache = ngx_string("no-cache");
static ngx_str_t  ngx_http_gzip_no_store = ngx_string("no-store");
static ngx_str_t  ngx_http_gzip_private = ngx_string("private");
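To make the duplication in these initializers concrete, here is a minimal sketch. It assumes ngx_string expands to a length/pointer pair that mentions its argument twice; the names demo_str_t, demo_string, demo_table_entry and demo_no_cache are hypothetical stand-ins, not nginx identifiers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a counted-string type (assumed shape). */
typedef struct {
    size_t               len;
    const unsigned char *data;
} demo_str_t;

/* The literal is spelled twice after expansion: once as the operand of
   sizeof (never evaluated, so it emits no data) and once as the pointer. */
#define demo_string(s) { sizeof(s) - 1, (const unsigned char *) (s) }

/* Two file-scope objects initialised from the same literal text,
   mirroring the table entry and the stand-alone string in the question. */
static demo_str_t demo_table_entry = demo_string("no-cache");
static demo_str_t demo_no_cache    = demo_string("no-cache");

/* Returns 1 when both objects describe the same bytes. This always holds,
   whether or not the compiler stored the literal only once. */
int demo_same_contents(void)
{
    return demo_table_entry.len == demo_no_cache.len
        && memcmp(demo_table_entry.data, demo_no_cache.data,
                  demo_no_cache.len) == 0;
}
```

Under this assumption, only the pointer initializers refer to stored bytes, so at most two copies of "no-cache" are at stake here, and the lengths are plain integer constants.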

To those that stayed on topic, bravo !


Solution

  • Note that for the specific case of sizeof("</pre><hr>"), it is virtually certain that the string literal inside the sizeof will never appear in the output file: the operand of sizeof is not evaluated, and the entire sizeof expression can be reduced to the integer constant 11 (ten characters plus the terminating NUL) at compile time.

    Even so, merging identical string literals is a very common compiler optimisation.
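Both points can be probed with a small sketch. The names PRE_HR_LEN, first_copy, second_copy and literals_share_storage are hypothetical. The C standard leaves it unspecified whether identical string literals occupy distinct arrays, so the runtime result depends on the compiler and its options (for instance GCC's -fmerge-constants or MSVC's /GF) and is not guaranteed either way.

```c
#include <stddef.h>

/* Compile-time check: 10 visible characters plus the terminating NUL. */
_Static_assert(sizeof("</pre><hr>") == 11, "size includes the NUL");

/* The length used in the question's call is an integer constant
   expression, so it can even serve as an enumerator value. */
enum { PRE_HR_LEN = sizeof("</pre><hr>") - 1 };

/* Two separate spellings of the same literal in one translation unit. */
static const char *first_copy(void)  { return "</pre><hr>"; }
static const char *second_copy(void) { return "</pre><hr>"; }

/* Returns 1 if this particular build stored the literal once (pooled),
   0 if it kept two copies. Either answer is permitted by the standard. */
int literals_share_storage(void)
{
    return first_copy() == second_copy();
}
```

Calling literals_share_storage() from a test program built at different optimisation levels is one way to observe what a given toolchain actually does; it says nothing about merging across object files or shared libraries.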