What is the result of mark and sweep in this code :
var user = "mina";
var user = null;
console.log(user);
If I implement mark and sweep, var user = "mina"
is seeped to garbage because it is not accessible any more. Is that correct?
Disclaimer: I worked on the Chakra JS engine while at Microsoft.
First: your question presupposes that ECMAScript (the specification for JavaScript) requires implementations to use a Mark-and-Sweep Garbage Collector, on the contrary: it does not:
The ECMAScript specification does not require Garbage Collection, indeed an implementation that never frees memory and eventually cannot allocate any more and crashes horribly is still a valid implementation according to my interpretation of the spec; and even when language specifications do require GC (like the Common Language Specification for .NET) they don’t stipulate that it must use the Mark-and-Sweep strategy.
Historical side-note: I believe the Active Scripting engine for JScript and VBScript (which predates Chakra's JIT engine) used Reference Counting for COM objects and environment intrinsics while JavaScript values (strings and JS objects) used some form of mark-and-sweep but I might be wrong as I never worked on that engine. Chakra uses more advanced techniques, including passing small strings by-value, and I won’t go into detail here, but Chakra's source code is available on GitHub: https://github.com/Microsoft/ChakraCore
With regard to your specific question of using string
values: as string literals are immutable they tend to be interned by the script parser before the code is even compiled or interpreted so when a string loses a reference its string data (the character array) won’t be collected or reallocated but simply stay in memory. If it were a runtime-created string then it would be deallocated eventually and not necessarily at the exact moment that reference is lost (as with trivial Reference Counting implementations).
With regard to your exact code: Because the “mina”
value is never actually used a decent code optimiser would remove the line entirely so there wouldn’t be anything to free (assuming the string wasn’t interned).
Disregarding those technicalities "what happens to an object after it loses all references" then that depends on the type of GC or automatic memory management used by the JavaScript engine:
Let's use this example:
function doSomething() {
var nihilist = {
name: 'Friedrich Nietzsche',
dob: new Date( '1844-10-15' )
};
console.log( nihilist.name );
nihilist = null;
}
Different JavaScript or ECMAScript engines are free to implement memory-management however they see fit (strictly speaking, including doing nothing at all). So let's consider what happens in some common scenarios:
A smart compiler that analyses the lifetime of objects and their references (and knows console.log
has no side-effects) would see that the object created for nihilist
is never exposed by the function (i.e. it isn't return
'd, it isn't assigned to some kind of output parameter, there is no hidden async
state-machine or closure capture of the object's only reference), so it wouldn't need to allocate the nihilist
object in a free-store (regardless of if it's a heap, an arena, etc) and could put it on the call-stack, so when nihilist = null
is reassigned the original object value still exists on the stack but it will be freed when doSomething
returns (assuming console.log
didn't store a reference to the name
string, of course).
nihilist
object X
is created in the stack.X
is assigned to the nihilist
reference.nihilist = null
is assigned, nothing happens to X
(other than losing its last reference).doSomething
returns and the stack-pointer moves to the previous stack-frame and the memory that did contain X
will be overwritten when or if the call-stack has more frames pushed onto it (e.g. by another function call).X
is created in a free-store. Its reference count is zero. The inner name
string
value has its reference count set to 1 (assuming it is not interned).X
is assigned to the nihilist
reference and its count is incremented to 1
.console.log
is called passing nihilist.name
, which increases Y
's reference count to 2
and then back to 1
when console.log
returns (assuming console.log
made no new references to it).nihilist = null
is assigned and X
's reference count drops to zero.nihilist = null
the memory is deallocated.There are different ways to implement a Tracing collector - and one of those strategies is the naive mark-and-sweep strategy you mentioned:
An important point to remember about Tracing Garbage Collection is that the runtime needs both some way of already knowing where allocated objects are in memory, and a way to follow references between objects. This is easier in JavaScript where references are not simple 32 or 64-bit raw pointers but more likely rather large C struct
objects containing lots of metadata - and all object allocations can be stored in an "object table" for simple iteration (this is called "precise garbage collection" or "exact garbage collection"); other approaches involve heuristically scanning through raw memory to look for values that look like pointers.
Another important thing to note is that Tracing GCs generally don't run at specific points in the program or are directly invoked by the program, instead the GC runs in a background thread and freezes program execution whenever it wants (this is called "stopping the world" and is usually in response to increased memory usage and possibly at timer intervals too) and then performs its collection and only resumes the other threads when it's done. This happens unpredictably and is why Tracing GC systems cannot be used in a Hard Real-time environment.
In this case, we'll assume our example Tracing GC JavaScript environment uses Precise Garbage Collection (I note that Chakra primarily uses memory-scanning techniques).
X
is created in a free-store. Its memory address and size is added to a list of known objects within the runtime.
1.1. It has a reference to the name
string as well (which we assume is an interned immutable string Y
).
1.2. The dob: new Date
object would be stored by-value inside X
(in raw-memory there's a "tag" that tells the rutime that it's a Date
stored by-value, but it could be changed to a Date` reference later on.var nihilist = X
assignment, X
becomes associated with a special GC root representing function local variables (as variables themselves aren't objects), i.e. "X
is reachable". If X
was referenced by another object Z
, then that object would be referenced by the root and X
would be 2 degrees-of-separation but still reachable.console.log
a temporary reference will be made to Y
which ends when console.log
returns. Because console.log
does not refer to X
it means that if console.log
did make a long-lived reference to Y
that X
can still safely be destroyed.nihilist = null
is assigned, then X
is no-longer reachable, but nothing will happen immediately: the memory that X
occupies and the allocation metadata about X remains unchanged.At some point in the future (which could be immediately or could be minutes or even hours away) the GC will freeze program execution and start its mark-and-sweep:
5.1. First, it iterates through its root objects (including the special root representing local variables) and annotate them as being still-alive (the memory that stores the annotation could be in-place (e.g. each object has a metadata header storing its dead/alive state) or it could be inside the known-object-list mentioned in step 1), e.g.:
function checkObject( allocatedObject ) {
if( allocatedObject.status == UNKNOWN ) {
allocatedObject.status = ALIVE;
foreach( reference in allocatedObject.references ) {
reference.destination.status == ALIVE;
checkObject( reference.destination );
}
}
}
foreach( root in allRoots ) {
foreach( reference in root.references ) {
checkObject( reference.destination );
}
}
5.2. Then it goes through the list of non-root objects (allNonRootObjects
) and checks to see if they're alive or not:
foreach( allocatedObject in allNonRootObjects ) {
if( allocatedObject.status == UNKNOWN ) {
deallocate( allocatedObject );
}
}