I wonder why CMS needs two marking phases (and therefore two pauses): the initial mark and the remark. Could we simply mark once and then sweep? I imagine that would give a shorter pause. Can someone explain the main purpose of the second mark and why it is needed? Thanks!
This is very nicely explained in the HotSpot Memory Management Whitepaper:
A collection cycle for the CMS collector starts with a short pause, called the initial mark, that identifies the initial set of live objects directly reachable from the application code. Then, during the concurrent marking phase, the collector marks all live objects that are transitively reachable from this set. Because the application is running and updating reference fields while the marking phase is taking place, not all live objects are guaranteed to be marked at the end of the concurrent marking phase. To handle this, the application stops again for a second pause, called remark, which finalizes marking by revisiting any objects that were modified during the concurrent marking phase. Because the remark pause is more substantial than the initial mark, multiple threads are run in parallel to increase its efficiency. At the end of the remark phase, all live objects in the heap are guaranteed to have been marked, so the subsequent concurrent sweep phase reclaims all the garbage that has been identified.
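To make the quoted explanation concrete, here is a toy model of the race it describes. This is only a sketch, not HotSpot code: the `Node` class, the `dirty` set, and the `simulate` method are hypothetical names invented for illustration, and the "concurrent" interleaving is replayed sequentially so the result is deterministic. Object A is directly reachable from the application, A points to B, and B points to C. While marking is in progress, the application moves the reference to C from B to A, after A's fields have already been scanned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CmsRemarkSketch {
    // Hypothetical heap object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Marks every pending object and, for newly marked ones, queues their references.
    static void drain(Deque<Node> pending, Set<Node> marked) {
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {
                for (Node r : n.refs) pending.push(r);
            }
        }
    }

    // Replays one collection cycle and returns the names of the marked objects.
    static Set<String> simulate(boolean withRemark) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.refs.add(b);
        b.refs.add(c);

        Set<Node> marked = new HashSet<>();
        Set<Node> dirty = new HashSet<>();   // objects modified during concurrent marking
        Deque<Node> pending = new ArrayDeque<>();

        // Initial mark (pause): only what is directly reachable from the application.
        marked.add(a);

        // Concurrent mark, step 1: A's fields are scanned, so B is queued.
        for (Node r : a.refs) pending.push(r);

        // Application runs meanwhile: the reference to C moves from B to A.
        b.refs.remove(c);
        a.refs.add(c);
        dirty.add(a);                        // assumed bookkeeping that records the write

        // Concurrent mark, step 2: B is marked, but it no longer points to C.
        drain(pending, marked);

        // Remark (pause): revisit the references of every modified object.
        if (withRemark) {
            for (Node d : dirty) {
                for (Node r : d.refs) pending.push(r);
            }
            drain(pending, marked);
        }

        Set<String> names = new TreeSet<>();
        for (Node n : marked) names.add(n.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println("without remark: " + simulate(false));
        System.out.println("with remark:    " + simulate(true));
    }
}
```

Without the remark step, C is still reachable (through A) but never marked, so a sweep at that point would reclaim a live object. That is why a single initial mark followed directly by a sweep is not safe once marking runs concurrently with the application. The remark pause can stay relatively short because it only needs to revisit what changed, not trace the whole heap again.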