clang, abstract-syntax-tree, clang-tidy

How does clang check redefinitions?


I'm new to Clang and am trying to write some clang-tidy checks. I want to find something that works like a "variable table", so that I can check whether certain names are well-formed.

My intuition is this:

Writing a redefinition sometimes causes an error, which Clang's diagnostics report, for example:

int main(){
int x;
int x; // error: redefinition
return 0;
}

From my perspective, Clang may keep a dynamic variable table to check whether a new definition is compatible, an overload, or an error.

I tried to dive into the Clang source code and found the following:

  • IdentifierTable, which is kept by the preprocessor and records all identifiers, but does not do any semantic legality checking.
  • DeclContext, which seems to be an interface for users, produced as a result of semantic checking.

My questions are:

  • How does Clang do the legality checking?
  • Am I able to get the variable table (if such a thing exists)?
  • If I cannot get it, how can I know which variables are reachable from a given location?

Thanks for your suggestions!


Solution

  • TL;DR: see Answers below.


    Discussion

    All of your questions relate to one term in the C standard, the identifier, described in C99-6.2.1-p1:

    An identifier can denote an object; a function; a tag or a member of a structure, union, or enumeration; a typedef name; a label name; a macro name; or a macro parameter.

    Each identifier has its own scope, according to C99-6.2.1-p2:

    For each different entity that an identifier designates, the identifier is visible (i.e., can be used) only within a region of program text called its scope.

    Since you are interested in variables inside a function (e.g., int x), the identifiers in question have block scope.

    There is a process called linkage for identifiers that are declared more than once, according to C99-6.2.2-p2:
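To make block scope concrete, here is a minimal sketch (the function name `outer_after_shadow` is made up for illustration). It shows that a nested block is a separate scope, so reusing a name there declares a new object rather than redefining the outer one:

```c
#include <assert.h>

/* Hypothetical helper: returns the outer x after a nested block
   has declared and modified its own x. */
int outer_after_shadow(void) {
    int x = 1;              /* scope: the block of the function body */
    {
        int x = 2;          /* a nested block is a different scope, so this is
                               a new object that hides the outer x */
        x += 10;            /* changes the inner x only */
        assert(x == 12);
    }
    return x;               /* the inner x is out of scope; the outer x is still 1 */
}
```

Calling `outer_after_shadow()` from a `main` of your own should return 1, because the two declarations live in different scopes and never conflict.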

    An identifier declared in different scopes or in the same scope more than once can be made to refer to the same object or function by a process called linkage.

    This is the mechanism that requires all declarations of an identifier with linkage to refer to one and the same object, which is the "legality check" on definitions that you describe. Therefore, compiling the following files

    /* file_1.c */
    int a = 123;
    
    /* file_2.c */
    int a = 456;
    

    would cause a linker error:

    % clang file_*
    ...
    ld: 1 duplicate symbol
    clang: error: linker command failed with exit code 1
    

    However, in your case the identifiers are inside the same function body, which looks more like the following:

    /* file.c */
    int main(){
      int b;
      int b=1;
    }
    

    Here the identifier b has block scope and therefore has no linkage, according to C99-6.2.2-p6:

    The following identifiers have no linkage: an identifier declared to be anything other than an object or a function; an identifier declared to be a function parameter; a block scope identifier for an object declared without the storage-class specifier extern.

    Having no linkage means that the linkage rules mentioned above do not apply to it, so the error cannot be a linkage (linker) error.

    It is natural to regard it as a redefinition error. In C++ this is covered by the One Definition Rule, but C has no rule by that name (check this or this for more details). Instead, C99-6.7-p3 states as a constraint that an identifier with no linkage shall have no more than one declaration in the same scope and name space (tags excepted), so a conforming compiler must issue a diagnostic. The wording of that diagnostic is left to the implementation, which is likely why the errors clang reports for the above code (file.c) differ from those of gcc, as shown below (note the phrase 'with no linkage' in gcc's message):
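The difference between "with linkage" and "no linkage" inside a block can be seen in a small sketch. The names `counter` and `read_counter_twice` are made up for illustration; the point is only that a block-scope `extern` declaration has linkage and may be repeated, while an ordinary local may not:

```c
#include <assert.h>

int counter = 5;            /* file scope, external linkage */

/* Hypothetical helper contrasting the two kinds of block-scope declaration. */
int read_counter_twice(void) {
    extern int counter;     /* block scope, but declared extern: has linkage */
    extern int counter;     /* repeating it is accepted; both declarations
                               refer to the file-scope object */
    int local = counter;    /* no linkage: adding a second `int local;` in
                               this block would be rejected by the compiler */
    assert(local == 5);
    return local + counter;
}
```

Both gcc and clang should accept this as written; uncommenting a second `int local;` in the same block would reproduce the error discussed here.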

    # ---
    # GCC (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04))
    # ---
    $ gcc file.c 
    file.c: In function ‘main’:
    file.c:4:6: error: redeclaration of ‘b’ with no linkage
      int b=1;
          ^
    file.c:3:6: note: previous declaration of ‘b’ was here
      int b;
          ^
    
    
    # ---
    # CLANG  (Apple clang version 13.0.0 (clang-1300.0.29.3))
    # ---
    % clang file.c 
    file.c:4:6: error: redefinition of 'b'
     int b;
         ^
    file.c:3:6: note: previous definition is here
     int b=1;
         ^
    1 error generated.
    
    

    Answers

    With all of the above, I think we can answer your questions:

    How does clang perform the legality check on definitions?

    For global variables defined in different files, both clang and gcc follow the C standard's linkage rules, and the duplicate definition is reported at link time, as in the ld output above. For local variables, which have no linkage, a second declaration in the same block violates the constraint in C99-6.7-p3. This is not undefined or implementation-defined behavior: a diagnostic is required, and only its wording varies between compilers.

    In fact, both compilers treat the "redefinition" as an error. Although variable names inside a function body disappear after compilation (you can verify this in the assembly output), requiring them to be unique within a block is undoubtedly more natural and helpful.

    Am I able to get the variable table (if such a thing exists)?

    I do not have much knowledge of clang internals, but from the standard quoted above and an analysis of the compilation stages, we can infer that IdentifierTable might not fit your needs well, since it belongs to the preprocessing stage, which comes well before the linking stage. To see how the toolchain deals with duplicate symbols and how it stores them, you might want to look at the lld project, in particular its SymbolTable.
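To illustrate what a "variable table" with scopes could look like in principle, here is a toy sketch in C. It is not clang's or lld's data structure, and every name in it (`scope_table`, `enter_block`, `leave_block`, `declare`, `is_visible`, `redefinition_demo`) is made up for this example. It only demonstrates the idea that a name is rejected when it is already declared in the same block, and accepted when it shadows a name from an enclosing block:

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 64

/* One declared name plus the nesting depth of the block that declared it. */
struct symbol {
    const char *name;
    int depth;
};

/* Toy table: a stack of symbols and the depth of the current block. */
struct scope_table {
    struct symbol syms[MAX_SYMBOLS];
    int count;
    int depth;
};

void enter_block(struct scope_table *t) { t->depth++; }

/* Leaving a block pops every name that was declared inside it. */
void leave_block(struct scope_table *t) {
    while (t->count > 0 && t->syms[t->count - 1].depth == t->depth)
        t->count--;
    t->depth--;
}

/* Returns 1 if the name was added, 0 if it is already declared in the
   current block (a redefinition), and -1 if the table is full. */
int declare(struct scope_table *t, const char *name) {
    /* Names of the current block sit contiguously on top of the stack. */
    for (int i = t->count - 1; i >= 0 && t->syms[i].depth == t->depth; i--)
        if (strcmp(t->syms[i].name, name) == 0)
            return 0;
    if (t->count == MAX_SYMBOLS)
        return -1;
    t->syms[t->count].name = name;
    t->syms[t->count].depth = t->depth;
    t->count++;
    return 1;
}

/* A name is reachable if any enclosing block still on the stack declares it. */
int is_visible(const struct scope_table *t, const char *name) {
    for (int i = t->count - 1; i >= 0; i--)
        if (strcmp(t->syms[i].name, name) == 0)
            return 1;
    return 0;
}

/* Replays the example from the question; returns the result of the
   second `int x;`. */
int redefinition_demo(void) {
    struct scope_table t = {0};
    enter_block(&t);                    /* int main() {            */
    int first = declare(&t, "x");       /*   int x;                */
    int second = declare(&t, "x");      /*   int x; (redefinition) */
    leave_block(&t);                    /* }                       */
    assert(first == 1);
    return second;                      /* 0 means "rejected" */
}
```

In this sketch, the set of names for which `is_visible` returns 1 at a given point corresponds to "the variables reachable from a location" in the third question. A real compiler tracks far more than this (types, name spaces, linkage), so treat it as an illustration of the concept only.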