There are a number of garbage collection libraries for C++.
I am kind of confused how the pointer tracking works.
In particular, suppose we have a base pointer P and a list of other pointers who are computed as offsets from P using an array.
Ex,
P2 = P+offset[0]
How does the garbage collector know P2 is still in scope? It has no direct reference but it's still accessible.
Probably the most popular C++ gc is
https://en.m.wikipedia.org/wiki/Boehm_garbage_collector
But following their example syntax it seems very easy to break so I must not be understanding something.
This question cannot be answered in general. There are different systems that may be regarded as garbage collection for C++; for example, Herb Sutter's deferred_ptr is basically a garbage collecting smart pointer. I've personally implemented another version of this idea, similar to Sutter's but less fancy.
I can answer about Boehm, however. How the Boehm garbage collector recognizes pointers when it does its "mark" phase, is basically by scanning memory and assuming that things that look like pointers are pointers.
The garbage collector knows all the areas of memory where user data is and it knows all of the pointers that it has allocated and how big those allocations were. It just looks for chains of pointers starting from "root segments" defined as below, where by "look" we mean explicitly scanning memory for 64 bit values that are the same as one of the GC allocations it has done.
From here:
Since it cannot generally tell where pointer variables are located, it scans the following root segments for pointers:
- The registers. Depending on the architecture, this may be done using assembly code, or by calling a setjmp-like function which saves register contents on the stack.
- The stack(s). In the case of a single-threaded application, on most platforms this is done by scanning the memory between (an
approximation of) the current stack pointer and GC_stackbottom. (For
Itanium, the register stack scanned separately.) The GC_stackbottom
variable is set in a highly platform-specific way depending on the
appropriate configuration information in gcconfig.h. Note that the
currently active stack needs to be scanned carefully, since
callee-save registers of client code may appear inside collector
stack frames, which may change during the mark process. This is
addressed by scanning some sections of the stack "eagerly",
effectively capturing a snapshot at one point in time.- Static data region(s). In the simplest case, this is the region between DATASTART and DATAEND, as defined in gcconfig.h. However, in
most cases, this will also involve static data regions associated
with dynamic libraries. These are identified by the mostly
platform-specific code in dyn_load.c.
The address space for 64-bit pointers is huge so false positives will be rare, but even if they occur, false positives would just be leaks, that last as long as there happens to be some other variable in the memory the mark phase scans that is exactly the same value as some 64-bit pointer that was allocated by the garbage collector.