
C++ compilation order of declarations


I am confused with the following code:

#include <iostream>

class foo
{
public:
    foo(int _i)
    {
        this->i = _i;
    }

    void print()
    {
        std::cout << i << std::endl;
    }

private:
    int i;
};


int main()
{
    foo f(5);
    f.print();
}

How does this compile? The integer 'i' is declared after it is used in the constructor and in print(), yet the code compiles successfully. My understanding was that the C++ AST is generated as the file is parsed; since C++ compilers are supposed to have a lookahead of 1 token, the parser should not know that 'i' is a valid member until it is well past that 1 token. I clearly have a misunderstanding here.

How is the compiler able to compile this? Does it simply skip the function definitions, and parse those afterwards?


Solution

  • Parsing C++ is a much more complex task than can be summarised by "a look ahead of one token". There are many valid C++ constructs which require arbitrary lookahead (or backtracking, which is effectively the same thing).

    But that's not what's at play here.

    The compiler can produce an AST without knowing what i resolves to. It's clear from the lexical structure that i is an identifier. All the compiler needs to know is whether it names an object (or a function) rather than a type, because only a type name would change the parse.

    C++ allows class members to be declared in any order, except that it must always be clear whether a name is a type. Members that name a type must be declared before they are used. If a name identifies a type in an outer scope and the class redeclares it as something other than a type, that redeclaration must appear before the name is used.

    So the compiler can assume that an undeclared identifier names an object, not a type, and that a name which is currently a type will continue to be a type. Which object or type it names can be determined later, usually at the end of the class declaration.

    There is a similar requirement on names declared to be templates.

    Name resolution in C++ is phenomenally complicated. But it's (mostly) semantic. Aside from the fact that types and objects share the same namespace, name resolution doesn't affect the parse. (Of course, if a name is used in function call syntax and it turns out not to be a member function, that's an error, just like calling a function with incorrect arguments. But it's not a syntax error.)