Tags: c, identifier, reserved-words

What to heed when using '_' (underscore) as an identifier in C (and interop with other languages)


In a C library I wrote, I found most of the operations too verbose, so I tried to make it terser through the use of macros and by reserving the _ identifier.

As I understand it, identifiers beginning with _ are reserved for the C standard, the implementation(s), and libraries, but _ alone is not mentioned. My guess is that it can be used by the program (i.e. the application).

This practice isn't new: $_ is used quite often in Perl and has lots of idioms. In Python's interactive interpreter and some shell/scripting languages, _ also refers to the value of the last expression.

Of course, I won't be interfacing with scripting languages directly. What needs to be taken care of and noted when using a lexically-local _ identifier in C, and in some other systems programming languages (e.g. Rust, C++)? What kind of interoperability problems can arise from this?


Solution

  • What the C standard says specifically in C17 7.1.3 is this:

    • All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use, except those identifiers which are lexically identical to keywords.
    • All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.

    In plain English, this means that:

    • Identifiers starting with __ or with _ followed by an uppercase letter (e.g. _A) are reserved for the compiler and the compiler's standard library (details here: What's the meaning of "reserved for any use"?). So if you use such identifiers, you might get naming collisions with the standard library.
    • Any identifier starting with _ is reserved for use when you declare anything outside a function (including when you declare a function). "Tag name spaces" refers to struct/union/enum tags declared outside of a function.

    So this means that you shouldn't write code such as:

    // BAD
    int _;
    void f (void); // this function here to illustrate that _ is at file scope
    

    or

    // BAD
    void _ (void);
    

    or

    // BAD
    struct _ {/* ... */};
    void f (void);  // this function here to illustrate that _ is at file scope
    

    or you may get naming collisions. You may however write code like

    // OK
    void f (void)
    {
      int _ = something;
    }
    

    or

    // OK
    #include <stdio.h>

    #define x(_) _

    int main(void) {
      int _ = x(5);   // block-scope variable named _
      printf("%d",_);
    }
    

    The macro parameter is not a file-scope identifier; it is substituted during preprocessing, and the result is resolved inside main(). Naming a macro _ would be a problem though, since that effectively puts _ at file scope (macro names are visible from their definition onward, regardless of block scope).
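
As a rough sketch of the terse-macro style the question seems to be after, the following keeps _ strictly at block scope: the macro itself has an ordinary name and only introduces a local variable called _ in its expansion. FOR_EACH_INT and sum_of_squares are hypothetical names made up for this illustration, not part of the asker's library.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration): runs `body` once per
   element, with a block-scoped `_` bound to the current element.
   The macro name is not `_`, so nothing named `_` is defined at file scope. */
#define FOR_EACH_INT(arr, n, body)                  \
    for (size_t i_ = 0; i_ < (n); ++i_) {           \
        int _ = (arr)[i_];                          \
        body;                                       \
    }

/* `_` only ever exists inside the loop body generated by the macro. */
int sum_of_squares(const int *xs, size_t n)
{
    int total = 0;
    FOR_EACH_INT(xs, n, total += _ * _);
    return total;
}
```

Calling sum_of_squares on the array {1, 2, 3} with n = 3 should give 14. Note that a body containing an unparenthesized comma would be split into several macro arguments, so this form only suits simple expressions.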


    That being said, naming a variable _ for any purpose is an awful idea. If you find some code too verbose, then surely you can at least come up with a three-letter identifier that's far more descriptive: val, cnt, fun, adr, tmp, etc.

    I've only ever seen the identifier _ used in the context of code golf, code obfuscation, or "posing" with one's knowledge of meaningless exotic C features. So if you are truly contemplating using _, know that you are placing yourself in one of these categories. That's not flattering for your code, which is likely to be butchered at the next code review.