Tags: c++, c, pointers, compiler-construction

How does the compiler understand declarations of pointers and arrays?


This question came to mind when I was confused by pointers-to-arrays and arrays-of-pointers. I've since figured out how to tell them apart, at least textually.

But I'm still puzzled about how the compiler understands pointer declarations. For example, is there actually a type like int*?

int *p;   // a common pointer declaration

Specifically, how does the compiler understand the declaration above? Does it treat p as a pointer first and then find that the pointed-to object is an int? Or does it see that the user declared a pointer-to-int (int*) named p?

It gets more confusing with pointers to arrays.

int (*p)[4];   // a pointer to an array of 4 ints

How does the compiler understand this? Does it treat it as int[4] *p (with int[4] acting like a type of its own, the way element types do in containers)? I have a similar question about the following case.

int *p[4];    // an array of pointers

Since [] binds more tightly than *, does the compiler understand p[4] first and treat p as an array (with an unknown element type), then set the element type to int*?


Solution

  • The parsing of declarations is specified in section 6.7.6 ("Declarators") of the C11 standard. It's too big to go over in full, but in brief the rules for the type of p are laid out inductively. In these rules T is a plain type (no pointers, arrays, etc.) and D is a declarator:

    1. T * D is defined to mean that if the type of the identifier in T D would be "something of/to T", then its type in T * D is "something of/to pointer to T".

    2. T D[N] (where N could be blank or various other things) is defined to mean that if the type of the identifier in T D would be "something of/to T", then its type in T D[N] is "something of/to array (of dimension N) of T".

    So you can see each rule modifies the result of a previous application of the rule, until we get down to the "end" of the induction which happens when D is a plain identifier.

    Also, T ( D ) means the same as T D; the parentheses only control how the declarator is grouped when it is parsed.

    Some sources describe declarations as being read "inside out". The induction chain actually runs "outside in": you peel off the outermost part of the declarator first, descend until you reach the identifier, and then backtrack, building up the type as you go. The way we are taught to read declarations as humans differs from the language definition, although the end result is the same.


    To use one of your examples, int *p[4]:

    • Rule 1 - this has the form T * D, with T = int and D = p[4]. Since T D would be "array[4] of int" (see below), then T * D is "array[4] of pointer to int".

    In that step we needed the type given by T D, which is int p[4]:

    • Rule 2 - this has the form T D[N], with T = int and D = p. This is the "end case" because D is a plain identifier, so the type of p at this step is "array[4] of int".

    For another example, int (*p)[4]:

    • Rule 2 - this is of the form T D[N], where T = int, and D = (*p). Here T D is int (*p), which means the same as int *p, so by Rule 1 T D would be "pointer to int". Then T D[4] is "pointer to array[4] of int".