Search code examples
compiler-constructiongarbage-collection

Why garbage collection? Why not compilers auto-insert free() instead?


Rather than running garbage-detection periodically at run time, why don't we make compilers automatically insert free() at appropriate places? That way, we pay the price only once at compile-time.

The compiler knows the places where a variable goes out of scope or gets reassigned to different object. So, it can then find out if the object is no longer reachable, and insert free() automatically there.

Can't it? Why?

If it's because of multithreading, can we do it with single-threaded/green-threaded languages then?


Solution

  • The compiler knows the places where a variable goes out of scope or gets reassigned to different object.

    Sure it does - for variables. But you do not clear variables - you clear the memory they are pointing to. And just because a variable went out of scope, it does not mean that the memory pointed to is no longer reachable.

    For example:

    y = ...
    {
      x = new X();
      if (todayIsTuesday()) {
        y = x;
      }
    } // x just went out of scope
    

    You can't make a compile-time decision whether the memory pointed to by x should be freed at the last line of that segment, because it depends on what day of the week it is when that code is ran.

    So to solve this, this decision must be delegated to run-time, by inserting proper logic, e.g.:

    Y* y = ...
    {
      X* mem = new X();
      X* x = mem;
      markPointer(&x, mem);
      if (todayIsTuesday()) {
        y = x;
        markPointer(&y, mem);
      }
      markNoLongerPointer(&x, mem);
    } // x just went out of scope
    

    With markNoLongerPointer() clearing the memory given as the 2nd argument if its internally-maintained data structure told it that x is the only reference to that memory... In other words, this is the crude starting point to a reference-counting logic.

    Compilers certainly can add such reference-counting logic to the compiled code, and some do, but as others mentioned, reference counting has some disadvantages: high overhead, the problem of cycles, plus it can sometimes cause significant pauses at inconvenient times, when the only reference to a root of a large data structure goes out of scope. There are ways to address some of these disadvantages, though, but that's out of scope for this answer :-)