Tags: c++, gcc, compilation, clang, forward-declaration

How does forward declaration save compile time?


If you read online, you will find plenty of claims that in C++, using a forward declaration instead of an #include saves compile time. The usual theory is that since #include means mere text replacement, a forward declaration means the compiler doesn't need to parse the header (and possibly compile it), so it saves time. I found this claim hard to believe, because I usually see code like this:

// B.h

class A;  // forward declaration instead of #include "A.h"

class B {
public:
    void doSomething(A& a);
};

In this case, yeah, we don't need to include A.h in B.h as we forward declared it, but the problem is that in B.cpp eventually, we need a full type A to use its methods and data members. So I found in nearly all cases, we need to include A.h in B.cpp.
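A single-file sketch of the pattern described above, assuming hypothetical members for A (`value` and `increment`) that the question does not specify. The three "files" are pasted into one translation unit in the order the compiler would see them when compiling B.cpp. The point is that B's declaration compiles with only `class A;`, while the body of `doSomething` needs the complete definition of A.

```cpp
// ---- "B.h": only a forward declaration of A is needed here ----
class A;                        // A is an incomplete type at this point

class B {
public:
    void doSomething(A& a);     // a reference parameter only needs the name A
};

// ---- "A.h": the full definition (members are hypothetical) ----
class A {
public:
    int value = 0;
    void increment() { ++value; }
};

// ---- "B.cpp": would #include "B.h" and "A.h" ----
void B::doSomething(A& a) {
    a.increment();              // calling a member requires the complete type A
}
```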

So how does forward declaration actually save compile time? I have seen people post benchmarks showing that compile time goes down when they use forward declarations instead of #includes, so there must be something I do not understand here...


I know saving compile time is not the sole purpose of forward declaration, I understand it has other purposes. I just want to understand why some people claim it can save compile time.


Solution

    Compile times

    the problem is that in B.cpp eventually, we need a full type A to use its methods and data members.

    Yes, that is a typical pattern. Forward declare a class (e.g. A) in a header (e.g. B.h), then in the source code corresponding to that header (B.cpp), include the header for the forward-declared class (A.h).

    So I found in nearly all cases, we need to include A.h in B.cpp.

    Correct, forward declarations do not save time when compiling the corresponding source code. The savings come when compiling other source code that uses B. For example:

    other.cpp

    #include "B.h"
    
    // Do stuff with `B` objects.
    // Make no use of `A` objects.
    

    Assume this file does not need definitions from A.h. This is where the savings come in. When compiling other.cpp, if B.h uses a forward declaration of A, there is no need to process A.h. Nor is there a need to process the headers that A.h itself includes, and so on. Now multiply this effect by the number of files that include B.h, either directly or indirectly.
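    To make "Make no use of `A` objects" concrete, here is a minimal sketch of what a file like other.cpp can still do when it only ever sees `class A;`. The `attach`/`count` members and the `attachNulls` helper are hypothetical, not taken from the question; they only store and count `A*` pointers, which never requires A's definition.

```cpp
#include <cstddef>
#include <vector>

class A;  // forward declaration only; A's definition never appears in this file

// Hypothetical B that refers to A without needing A's layout.
class B {
public:
    void attach(A* a) { targets_.push_back(a); }          // storing a pointer needs no definition
    std::size_t count() const { return targets_.size(); }
private:
    std::vector<A*> targets_;  // a vector of pointers to an incomplete type is fine
};

// "other.cpp": works with B objects and never touches A's members.
std::size_t attachNulls(B& b, int n) {
    for (int i = 0; i < n; ++i) {
        b.attach(nullptr);
    }
    return b.count();
}
```

    Creating an `A`, taking `sizeof(A)`, or calling a member of `A` would fail to compile in this file; at that point A.h would have to be included.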

    Note that there is a compounding effect here. The number of "headers that A.h itself includes" and of "files that include B.h" would be the numbers before replacing any #include statements with forward declarations. (Once you start making these replacements, the numbers come down.)

    How much of an effect? Not as much as there used to be. Still, as long as we're talking theoretically, even the smallest savings is still a savings.

    Rebuild times

    Instead of raw compile times (build everything), I think a better focus would be on rebuild times. That is, the time it takes to compile just the files affected by a change you made.

    Suppose there are ten files that rely on B.h but not on A.h. If B.h were to include A.h, then those ten files would be affected by changes to A.h. If B.h were instead to forward declare A, then those files would not be affected by changes to A.h, reducing the time to rebuild after those changes.
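    The rebuild argument can be modelled as a reverse walk over the include graph. This is a sketch under simplifying assumptions: file names such as `user0.cpp` are hypothetical, the graph is written by hand rather than extracted from a real build system, and a file counts as "to rebuild" if it is a .cpp file that reaches the changed header through any chain of #includes.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical include graph: for each file, the files it #includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Returns the .cpp files that must be recompiled when `changed` is edited.
std::set<std::string> filesToRebuild(const IncludeGraph& includes, const std::string& changed) {
    // Invert the graph: header -> files that include it directly.
    std::map<std::string, std::vector<std::string>> includedBy;
    for (const auto& [file, headers] : includes)
        for (const auto& h : headers) includedBy[h].push_back(file);

    // Breadth-first walk from the changed header to everything depending on it.
    std::set<std::string> visited{changed};
    std::queue<std::string> pending;
    pending.push(changed);
    std::set<std::string> rebuild;
    while (!pending.empty()) {
        std::string current = pending.front();
        pending.pop();
        auto it = includedBy.find(current);
        if (it == includedBy.end()) continue;
        for (const auto& dependent : it->second) {
            if (!visited.insert(dependent).second) continue;  // already reached
            pending.push(dependent);
            bool isSource = dependent.size() >= 4 &&
                            dependent.compare(dependent.size() - 4, 4, ".cpp") == 0;
            if (isSource) rebuild.insert(dependent);
        }
    }
    return rebuild;
}

// The scenario from the text: B.cpp needs A.h, and ten hypothetical
// user files (user0.cpp .. user9.cpp) include only B.h.
IncludeGraph makeScenario(bool bHeaderIncludesA) {
    IncludeGraph g;
    g["B.h"] = bHeaderIncludesA ? std::vector<std::string>{"A.h"}
                                : std::vector<std::string>{};  // forward declaration instead
    g["B.cpp"] = {"B.h", "A.h"};
    for (int i = 0; i < 10; ++i) g["user" + std::to_string(i) + ".cpp"] = {"B.h"};
    return g;
}
```

    With B.h including A.h, editing A.h marks B.cpp and all ten user files (11 compilations); with the forward declaration, only B.cpp is affected.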

    Now suppose there is another class, call it B2, that also has the option to forward declare A instead of including the header. Maybe there are another ten files that depend on B2 but not on B and not on A. Now there are twenty files that do not need to be re-compiled after changes to A.

    But why stop there? Let's add B3 through B10 to the mix. Now there are a hundred files that do not need to be re-compiled after changes to A.

    Add another layer. Suppose there is a C.h that has the option to forward declare B instead of including B.h. By using a forward declaration, changes to A.h no longer require re-compiling the ten files that use C.h. And, of course, we'll assume there are ten such C-like headers for each of B through B10, each used by ten files. Now we're up to 10*10*10 = 1000 files that do not need to be recompiled when A.h changes.
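    The 10, 100, and 1000 figures follow from a uniform tree. A minimal sketch, assuming every header is included by exactly `fanout` files (10 in the example) and that the bottom layer consists of .cpp files; `leavesBelow` is a hypothetical helper name.

```cpp
#include <cstdint>

// Counts the .cpp files (leaves) under a header in a uniform include tree:
// each node is #included by `fanout` files, and the tree is `depth` layers deep.
std::uint64_t leavesBelow(std::uint64_t fanout, int depth) {
    if (depth == 0) return 1;  // a .cpp file: one compilation
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < fanout; ++i) {
        total += leavesBelow(fanout, depth - 1);  // each includer brings its own subtree
    }
    return total;
}
```

    `leavesBelow(10, 1)`, `leavesBelow(10, 2)`, and `leavesBelow(10, 3)` give the 10, 100, and 1000 files of the three scenarios. Cutting a single edge near the root, for example making one of the ten B headers forward declare A in the three-layer case, removes that header's whole subtree, `leavesBelow(10, 2)` = 100 files, from the rebuild set.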

    Takeaway

    This is a simplified example to serve as a demonstration. The point is that there is a forest of dependency trees created by #include lines. (The root of such a tree would be the header file of interest, and its children are the files that #include it.) Each leaf in one of these trees represents a file that must be compiled when changes occur in the header file of interest. The number of leaves in a tree can grow exponentially with the depth, so removing a branch (by replacing an #include with a forward declaration) can have a massive effect on rebuild time. Or maybe a negligible effect. This is theory, not practice.


    I should note that, like the question, this answer focuses on compile times, not on the other factors to consider. This is not supposed to be a comprehensive guide to the pros and cons of forward declarations, just an explanation of how they could save compilation time.