Intro:
At university one learns that typical garbage collection roots in Java (and similar languages) are static variables of loaded classes, thread-local variables of currently running threads, "external references" such as JNI handles, and GC-specifics such as old-to-young pointers during Minor GCs of a generational garbage collector. In theory, this does not sound that hard.
Problem:
I am reading the HotSpot source code and was interested in how these garbage collection roots are detected inside the VM, i.e., which methods are used inside the JVM source code to visit all roots.
Investigation:
I found various files (e.g., psMarkSweep.cpp
), belonging to various GC implementations, that contain very similar constructs.
Following is the method PSMarkSweep::mark_sweep_phase1
method from psMarkSweep.cpp
that I think covers the strong roots:
ParallelScavengeHeap::ParStrongRootsScope psrs;
Universe::oops_do(mark_and_push_closure());
JNIHandles::oops_do(mark_and_push_closure()); // Global (strong) JNI handles
CLDToOopClosure mark_and_push_from_cld(mark_and_push_closure());
MarkingCodeBlobClosure each_active_code_blob(mark_and_push_closure(), !CodeBlobToOopClosure::FixRelocations);
Threads::oops_do(mark_and_push_closure(), &mark_and_push_from_cld, &each_active_code_blob);
ObjectSynchronizer::oops_do(mark_and_push_closure());
FlatProfiler::oops_do(mark_and_push_closure());
Management::oops_do(mark_and_push_closure());
JvmtiExport::oops_do(mark_and_push_closure());
SystemDictionary::always_strong_oops_do(mark_and_push_closure());
ClassLoaderDataGraph::always_strong_cld_do(follow_cld_closure());
// Do not treat nmethods as strong roots for mark/sweep, since we can unload them.
//CodeCache::scavenge_root_nmethods_do(CodeBlobToOopClosure(mark_and_push_closure()));
The following code from psScavenge.cpp
seems to add tasks for the different types of GC roots:
if (!old_gen->object_space()->is_empty()) {
// There are only old-to-young pointers if there are objects
// in the old gen.
uint stripe_total = active_workers;
for(uint i=0; i < stripe_total; i++) {
q->enqueue(new OldToYoungRootsTask(old_gen, old_top, i, stripe_total));
}
}
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::universe));
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::jni_handles));
// We scan the thread roots in parallel
Threads::create_thread_roots_tasks(q);
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::object_synchronizer));
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::flat_profiler));
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::management));
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::system_dictionary));
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::class_loader_data));
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::jvmti));
q->enqueue(new ScavengeRootsTask(ScavengeRootsTask::code_cache));
Looking at ScavangeRootsTask
, we see familiar code similar to the code in psMarkSweep
:
void ScavengeRootsTask::do_it(GCTaskManager* manager, uint which) {
assert(Universe::heap()->is_gc_active(), "called outside gc");
PSPromotionManager* pm = PSPromotionManager::gc_thread_promotion_manager(which);
PSScavengeRootsClosure roots_closure(pm);
PSPromoteRootsClosure roots_to_old_closure(pm);
switch (_root_type) {
case universe:
Universe::oops_do(&roots_closure);
break;
case jni_handles:
JNIHandles::oops_do(&roots_closure);
break;
case threads:
{
ResourceMark rm;
CLDClosure* cld_closure = NULL; // Not needed. All CLDs are already visited.
Threads::oops_do(&roots_closure, cld_closure, NULL);
}
break;
case object_synchronizer:
ObjectSynchronizer::oops_do(&roots_closure);
break;
case flat_profiler:
FlatProfiler::oops_do(&roots_closure);
break;
case system_dictionary:
SystemDictionary::oops_do(&roots_closure);
break;
case class_loader_data:
{
PSScavengeKlassClosure klass_closure(pm);
ClassLoaderDataGraph::oops_do(&roots_closure, &klass_closure, false);
}
break;
case management:
Management::oops_do(&roots_closure);
break;
case jvmti:
JvmtiExport::oops_do(&roots_closure);
break;
case code_cache:
{
MarkingCodeBlobClosure each_scavengable_code_blob(&roots_to_old_closure, CodeBlobToOopClosure::FixRelocations);
CodeCache::scavenge_root_nmethods_do(&each_scavengable_code_blob);
}
break;
default:
fatal("Unknown root type");
}
// Do the real work
pm->drain_stacks(false);
}
Insight:
This lists of GC roots in the source code look quite a bit larger than what I initially wrote in my first sentence, so I try to list them below, with some comments:
psMarkSweep.cpp
uses a CLDToOopClosure and does something with Code blobs, while psScavange.cpp
does not. Afaik, CLD stands for Class loader data, but I don't know why it is used in one case, but not in another. The same accounts for code blobs.psMarkSweep.cpp
) and sometimes visited using the CodeCache (as done in psScavenge.cpp
).Questions:
While many things can be found in the source code, I still struggle to understand some of these GC roots, or how these GC roots are found.
CLDToOopClosure
and a MarkingCodeBlobClosure
in combination with Threads::oops_do
(as done in psMarkSweep.cpp
), compared to visiting the threads with an oop closure and additionally executing ClassLoaderDataGraph::oops_do
and CodeCache::scavenge_root_nmethods_do
(as used by psScavenge.cpp
).Remark:
I know this is a veeeery long question with various subquestions, but I think it would have been hard to split it into multiple ones. I appreciate every posted answer, even if it does not cover an answer to all the questions asked above, even answers to parts of them will help me. Thanks!
As you've already discovered yourself, <Subsystem>::oops_do()
is a typical mechanism in HotSpot JVM to visit GC roots of <Subsystem>
. Good analysis, by the way. Just keep going through VM sources, and you'll find the answers, as there are plenty useful comments in the code.
Note that the purpose of oops_do
is not only to mark reachable objects, but also to process the references themselves, particularly, to relocate them during compaction.
CodeBlob is a piece of generated code. It covers not only JITted methods (aka nmethods
) but also various VM stubs and routines generated in runtime.
// CodeBlob - superclass for all entries in the CodeCache.
//
// Suptypes are:
// nmethod : Compiled Java methods (include method that calls to native code)
// RuntimeStub : Call to VM runtime methods
// DeoptimizationBlob : Used for deoptimizatation
// ExceptionBlob : Used for stack unrolling
// SafepointBlob : Used to handle illegal instruction exceptions
These pieces of code may contain embedded references to Heap objects, e.g. String/Class/MethodHandle literals and static final constants.
The purpose of CLDToOopClosure
in Threads::oops_do
is to mark objects referenced through method pointers not marked otherwise:
// The method pointer in the frame might be the only path to the method's
// klass, and the klass needs to be kept alive while executing. The GCs
// don't trace through method pointers, so typically in similar situations
// the mirror or the class loader of the klass are installed as a GC root.
// To minimze the overhead of doing that here, we ask the GC to pass down a
// closure that knows how to keep klasses alive given a ClassLoaderData.
cld_f->do_cld(m->method_holder()->class_loader_data());
Similarly, MarkingCodeBlobClosure
is used to mark objects referenced only from active nmethods:
// In cases where perm gen is collected, GC will want to mark
// oops referenced from nmethods active on thread stacks so as to
// prevent them from being collected. However, this visit should be
// restricted to certain phases of the collection only. The
// closure decides how it wants nmethods to be traced.
if (cf != NULL)
cf->do_code_blob(_cb);
Note that CodeCache::scavenge_root_nmethods_do
is not called during marking phase:
// Do not treat nmethods as strong roots for mark/sweep, since we can unload them.
//CodeCache::scavenge_root_nmethods_do(CodeBlobToOopClosure(&mark_and_push_closure));
SystemDictionary
is responsible mainly for resolving symbolic names of classes. It does not serve as a GC root for marking (except for Bootstrap and System classloaders). On the other hand, ClassLoaderDataGraph
maintains the complete linkset of class loader entities. It does serve as GC root and is responsible for class unloading.
// A ClassLoaderData identifies the full set of class types that a class
// loader's name resolution strategy produces for a given configuration of the
// class loader.
// Class types in the ClassLoaderData may be defined by from class file binaries
// provided by the class loader, or from other class loader it interacts with
// according to its name resolution strategy.
// ...
// ClassLoaderData carries information related to a linkset (e.g.,
// metaspace holding its klass definitions).
// The System Dictionary and related data structures (e.g., placeholder table,
// loader constraints table) as well as the runtime representation of classes
// only reference ClassLoaderData.
//
// Instances of java.lang.ClassLoader holds a pointer to a ClassLoaderData that
// that represent the loader's "linking domain" in the JVM.
Interned strings don't need to survive GC. They are not GC roots. It's OK to unload interned strings unreachable otherwise, and HotSpot actually does that.
A Garbage Collector does not introduce new types of roots itself, but it may use algorithms and data structures that affect the meaning of "reachability". E.g. concurrent collectors may treat all references modified between initial mark and final remark as reachable even if they are not.