memory-management jvm garbage-collection v8

If memory fragmentation is no longer an issue with a 64-bit virtual address space, why do garbage collectors in some languages still need to compact?


From what I got here:

Memory fragmentation no longer seems to be an issue in a 64-bit virtual address space, so why do the garbage collectors in some popular runtimes (V8 for JavaScript, the JVM, etc.) still need to compact memory after mark-sweep to prevent heap fragmentation?


Solution

  • (V8 developer here.)

    Address space fragmentation is not the same as VM heap fragmentation.

    On 32-bit systems, it can be surprisingly difficult to allocate, say, a 256MB object even if 2-3 GB of memory is available in total, because existing objects are spread out all across the address space, so that no contiguous 256MB region can be found any more. That's a problem that 64-bit systems (usually) don't have any more.
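
To make the address space case concrete, here is a minimal sketch in Python. The layout is made up purely for illustration (a 3 GB user address space with one 1 MB object every 200 MB), and `largest_free_gap` is a hypothetical helper, not anything a real allocator exposes.

```python
MB = 1024 * 1024
GB = 1024 * MB

def largest_free_gap(space_size, allocations):
    """Return (largest contiguous free gap, total free bytes).

    allocations is a list of non-overlapping (start, size) tuples.
    """
    largest = 0
    total_free = 0
    cursor = 0  # end of the previous allocation
    for start, size in sorted(allocations):
        gap = start - cursor
        largest = max(largest, gap)
        total_free += gap
        cursor = start + size
    tail = space_size - cursor  # free space after the last allocation
    return max(largest, tail), total_free + tail

# Assumed layout: one 1 MB object every 200 MB in a 3 GB address space.
space = 3 * GB
allocations = [(offset, 1 * MB) for offset in range(100 * MB, space, 200 * MB)]

largest, free = largest_free_gap(space, allocations)
print(f"free in total: {free // MB} MB, largest contiguous gap: {largest // MB} MB")
print("256 MB allocation possible:", largest >= 256 * MB)
```

Almost the entire address space is free in this layout, yet no gap reaches 256 MB, so the big allocation fails. With a 47-bit address space the same handful of objects would leave enormous contiguous gaps.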

    VMs with garbage collectors usually organize their managed heap in "pages". As a mental model, you may assume that each page is 1MB in size. The VM can add pages to its heap (when it needs more) and give them back to the operating system (when they're empty). Now, it can happen that a page was heavily used, then most objects on it died, and now the entire 1MB page is kept alive by a single object that's just a few bytes in size. When an application goes through a phase where it needs lots of memory (and hence many heap pages), and then that operation completes and most objects become unreachable, most pages on the heap can end up mostly empty, each holding only a few small objects. That's a particular form of wasting memory: the VM needs to hold on to many heap pages, but the total size of all live objects is much smaller than the total size of all heap pages (which in turn is [part of] the amount of memory that the process is using from the operating system's point of view).
    That's heap fragmentation. Whether you're running on a 32-bit or 64-bit system has nothing to do with it. And the way to avoid it is to have a "compacting" garbage collector, i.e. to have it move objects together so that some pages become entirely free and can be given back to the operating system.
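
The same idea as a toy model, again in Python. This is only a sketch under the 1MB-page mental model above: `pages_held` and `compact` are hypothetical names, objects are just address/size pairs, and a real compacting collector would also have to update every reference to an object it moves, which is skipped here.

```python
PAGE_SIZE = 1024 * 1024  # 1 MB, as in the mental model above

def pages_held(objects):
    """Count heap pages touched by at least one live object.

    objects maps start address -> size in bytes.
    """
    pages = set()
    for addr, size in objects.items():
        first = addr // PAGE_SIZE
        last = (addr + size - 1) // PAGE_SIZE
        pages.update(range(first, last + 1))
    return len(pages)

def compact(objects):
    """Slide live objects together, starting at the beginning of the heap."""
    compacted = {}
    next_free = 0
    for addr in sorted(objects):
        size = objects[addr]
        compacted[next_free] = size
        next_free += size
    return compacted

# Assumed scenario: 100 pages, each kept alive by a single 64-byte survivor.
live = {page * PAGE_SIZE: 64 for page in range(100)}

live_bytes = sum(live.values())
before = pages_held(live)
after = pages_held(compact(live))
print(f"live data: {live_bytes} bytes")
print(f"pages held before compaction: {before}, after: {after}")
```

In this scenario 6400 bytes of live data pin 100 MB of heap pages until the objects are moved together, after which 99 of the pages are empty and could be returned to the operating system. Nothing in the model depends on how wide the address space is.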


    Side note: 48 bits of address space (actually 47 bits, one bit is for the kernel) is not as impossible to exhaust as it seems at first. When applications (like virtual machines) have ideas like "oh, we have near-infinite address space, so let's reserve a 4GB 'cage' of address space around this thing, which would allow us to play some interesting performance tricks / create some interesting security guarantees / etc", and then some use case wants thousands of whatever that thing is, you can run into address space limits sooner than you'd expect.
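
A quick back-of-the-envelope check of that side note, in Python. It assumes the whole 47-bit user address space could be handed out to cages, which is an upper bound only: the executable, libraries, stacks, and other mappings all need address space too.

```python
USER_ADDRESS_BITS = 47        # user-space half of a 48-bit address space
CAGE_SIZE = 4 * 1024**3       # one 4 GB reservation ("cage")

# Upper bound on how many cages fit, ignoring everything else in the process.
max_cages = 2**USER_ADDRESS_BITS // CAGE_SIZE
print(max_cages)  # 32768
```

So "thousands" of such reservations is already within one order of magnitude of the hard limit.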