Search code examples
javascriptgarbage-collectionmark-and-sweep

What is the result of mark and sweep in this code?


What is the result of mark and sweep in this code :

var user  = "mina";
var user = null;
console.log(user);

If I implement mark and sweep, var user = "mina" is seeped to garbage because it is not accessible any more. Is that correct?


Solution

  • Disclaimer: I worked on the Chakra JS engine while at Microsoft.

    First: your question presupposes that ECMAScript (the specification for JavaScript) requires implementations to use a Mark-and-Sweep Garbage Collector, on the contrary: it does not:

    The ECMAScript specification does not require Garbage Collection, indeed an implementation that never frees memory and eventually cannot allocate any more and crashes horribly is still a valid implementation according to my interpretation of the spec; and even when language specifications do require GC (like the Common Language Specification for .NET) they don’t stipulate that it must use the Mark-and-Sweep strategy.

    Historical side-note: I believe the Active Scripting engine for JScript and VBScript (which predates Chakra's JIT engine) used Reference Counting for COM objects and environment intrinsics while JavaScript values (strings and JS objects) used some form of mark-and-sweep but I might be wrong as I never worked on that engine. Chakra uses more advanced techniques, including passing small strings by-value, and I won’t go into detail here, but Chakra's source code is available on GitHub: https://github.com/Microsoft/ChakraCore

    With regard to your specific question of using string values: as string literals are immutable they tend to be interned by the script parser before the code is even compiled or interpreted so when a string loses a reference its string data (the character array) won’t be collected or reallocated but simply stay in memory. If it were a runtime-created string then it would be deallocated eventually and not necessarily at the exact moment that reference is lost (as with trivial Reference Counting implementations).

    With regard to your exact code: Because the “mina” value is never actually used a decent code optimiser would remove the line entirely so there wouldn’t be anything to free (assuming the string wasn’t interned).

    Disregarding those technicalities "what happens to an object after it loses all references" then that depends on the type of GC or automatic memory management used by the JavaScript engine:

    Let's use this example:

    function doSomething() {
    
        var nihilist = {
            name: 'Friedrich Nietzsche',
            dob: new Date( '1844-10-15' )
        };
    
        console.log( nihilist.name );
    
        nihilist = null;
    }
    

    Different JavaScript or ECMAScript engines are free to implement memory-management however they see fit (strictly speaking, including doing nothing at all). So let's consider what happens in some common scenarios:

    Stack allocation

    A smart compiler that analyses the lifetime of objects and their references (and knows console.log has no side-effects) would see that the object created for nihilist is never exposed by the function (i.e. it isn't return'd, it isn't assigned to some kind of output parameter, there is no hidden async state-machine or closure capture of the object's only reference), so it wouldn't need to allocate the nihilist object in a free-store (regardless of if it's a heap, an arena, etc) and could put it on the call-stack, so when nihilist = null is reassigned the original object value still exists on the stack but it will be freed when doSomething returns (assuming console.log didn't store a reference to the name string, of course).

    1. The nihilist object X is created in the stack.
    2. X is assigned to the nihilist reference.
    3. When nihilist = null is assigned, nothing happens to X (other than losing its last reference).
    4. doSomething returns and the stack-pointer moves to the previous stack-frame and the memory that did contain X will be overwritten when or if the call-stack has more frames pushed onto it (e.g. by another function call).

    Reference Counting

    1. A new object X is created in a free-store. Its reference count is zero. The inner name string value has its reference count set to 1 (assuming it is not interned).
    2. X is assigned to the nihilist reference and its count is incremented to 1.
    3. console.log is called passing nihilist.name, which increases Y's reference count to 2 and then back to 1 when console.log returns (assuming console.log made no new references to it).
    4. nihilist = null is assigned and X's reference count drops to zero.
    5. Generally a reference-counting system will immediately deallocate an object's memory immediately whenever its count drops to zero (though a system might defer it for some reason, however, and is beyond the scope of my answer), so immediately after nihilist = null the memory is deallocated.

    Tracing Garbage Collection

    There are different ways to implement a Tracing collector - and one of those strategies is the naive mark-and-sweep strategy you mentioned:

    An important point to remember about Tracing Garbage Collection is that the runtime needs both some way of already knowing where allocated objects are in memory, and a way to follow references between objects. This is easier in JavaScript where references are not simple 32 or 64-bit raw pointers but more likely rather large C struct objects containing lots of metadata - and all object allocations can be stored in an "object table" for simple iteration (this is called "precise garbage collection" or "exact garbage collection"); other approaches involve heuristically scanning through raw memory to look for values that look like pointers.

    Another important thing to note is that Tracing GCs generally don't run at specific points in the program or are directly invoked by the program, instead the GC runs in a background thread and freezes program execution whenever it wants (this is called "stopping the world" and is usually in response to increased memory usage and possibly at timer intervals too) and then performs its collection and only resumes the other threads when it's done. This happens unpredictably and is why Tracing GC systems cannot be used in a Hard Real-time environment.

    In this case, we'll assume our example Tracing GC JavaScript environment uses Precise Garbage Collection (I note that Chakra primarily uses memory-scanning techniques).

    1. The new object X is created in a free-store. Its memory address and size is added to a list of known objects within the runtime. 1.1. It has a reference to the name string as well (which we assume is an interned immutable string Y). 1.2. The dob: new Date object would be stored by-value inside X (in raw-memory there's a "tag" that tells the rutime that it's a Date stored by-value, but it could be changed to a Date` reference later on.
    2. After the var nihilist = X assignment, X becomes associated with a special GC root representing function local variables (as variables themselves aren't objects), i.e. "X is reachable". If X was referenced by another object Z, then that object would be referenced by the root and X would be 2 degrees-of-separation but still reachable.
    3. Inside console.log a temporary reference will be made to Y which ends when console.log returns. Because console.log does not refer to X it means that if console.log did make a long-lived reference to Y that X can still safely be destroyed.
    4. When nihilist = null is assigned, then X is no-longer reachable, but nothing will happen immediately: the memory that X occupies and the allocation metadata about X remains unchanged.
    5. At some point in the future (which could be immediately or could be minutes or even hours away) the GC will freeze program execution and start its mark-and-sweep:

      5.1. First, it iterates through its root objects (including the special root representing local variables) and annotate them as being still-alive (the memory that stores the annotation could be in-place (e.g. each object has a metadata header storing its dead/alive state) or it could be inside the known-object-list mentioned in step 1), e.g.:

          function checkObject( allocatedObject ) {
      
              if( allocatedObject.status == UNKNOWN ) {
                  allocatedObject.status = ALIVE;
      
                  foreach( reference in allocatedObject.references ) {
      
                      reference.destination.status == ALIVE;
      
                      checkObject( reference.destination );
                  }
              }
          }
      
          foreach( root in allRoots ) {
      
              foreach( reference in root.references ) {
      
                  checkObject( reference.destination );
              }
          } 
      

      5.2. Then it goes through the list of non-root objects (allNonRootObjects) and checks to see if they're alive or not:

          foreach( allocatedObject in allNonRootObjects ) {
      
              if( allocatedObject.status == UNKNOWN ) {
      
                  deallocate( allocatedObject );
              }
          }