Tags: memory, compiler-construction, v8, jit

Does V8 use the "code" or "text" memory segment, or is everything in the heap/stack?


In a typical memory layout there are 4 items:

  1. code/text (where the compiled code of the program itself resides)
  2. data
  3. stack
  4. heap

I am new to memory layouts, so I am wondering whether V8, which is a JIT compiler and dynamically generates code, stores this code in the "code" segment of memory or just stores it in the heap along with everything else. I'm not sure whether the operating system gives you access to the code/text segment, so I'm not sure if this is a dumb question.

[Image: diagram of a typical process memory layout]


Solution

  • The below is true for the major operating systems running on the major CPUs in common use today. Things will differ on old or some embedded operating systems (in particular things are a lot simpler on operating systems without virtual memory) or when running code without an OS or on CPUs with no support for memory protection.

    The picture in your question is a bit of a simplification. One thing it does not show is that (virtual) memory is made up of pages provided to you by the operating system. Each page has its own permissions controlling whether your process can read, write and/or execute the data in that page.

    The text section of a binary will be loaded onto pages that are executable, but not writable. The read-only data section will be loaded onto pages that are neither writable nor executable. All other memory in your picture ((un)initialized data, heap, stack) will be stored on pages that are writable, but not executable.

    These permissions prevent security flaws (such as buffer overruns) that could otherwise allow attackers to execute arbitrary code by making the program jump into code provided by the attacker or letting the attacker overwrite code in the text section.

    Now the problem with these permissions, with regard to JIT compilation, is that you can't execute your JIT-compiled code: if you store it on the stack or the heap (or within a global variable), it won't be on an executable page, so the program will crash when you try to jump into the code. If you try to store it in the text area (by making use of left-over memory on the last page or by overwriting parts of the JIT compiler's code), the program will crash because you're trying to write to read-only memory.

    But thankfully operating systems allow you to change the permissions of a page (on POSIX systems this can be done using mprotect and on Windows using VirtualProtect). So your first idea might be to store the generated code on the heap and then simply make the containing pages executable. However, this can be somewhat problematic because permissions always apply to whole pages: mprotect may reject an address that is not page-aligned (POSIX permits this, and Linux does so), and VirtualProtect changes every page that overlaps the range you pass it. Your array does not necessarily start at the beginning of a page if you allocated it using malloc (or new or your language's equivalent). Further, your array may share a page with other data, which you don't want to be executable.
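To make the alignment issue concrete, here is a minimal sketch in C, assuming a POSIX system where sysconf(_SC_PAGESIZE) reports the page size. The helper names page_start and offset_in_page are made up for this illustration. It shows how to find the page that contains an arbitrary pointer, such as one returned by malloc:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: round an address down to the start of its page.
 * Assumes page_size is a power of two, which holds on common systems. */
static uintptr_t page_start(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: how far into its page a pointer lies.
 * A result of 0 means the pointer is page-aligned. */
static uintptr_t offset_in_page(const void *p)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    return (uintptr_t)p - page_start((uintptr_t)p, page_size);
}
```

A pointer returned by malloc will often have a non-zero offset_in_page, and whatever else lives on that page would become executable along with your code if you changed the page's permissions.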

    To prevent these issues, you can use functions, such as mmap on Unix-like operating systems and VirtualAlloc on Windows, that give you pages of memory "to yourself". These functions will allocate enough pages to contain as much memory as you requested and return a pointer to the beginning of that memory (which will be at the beginning of the first page). These pages will not be available to malloc. That is, even if your array is significantly smaller than the size of a page on your OS, the page will only be used to store your array - a subsequent call to malloc will not return a pointer to memory in that page.
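As a rough sketch of the Unix side of this, the following C code requests anonymous, private, read-write pages from mmap. The wrapper names alloc_pages and free_pages are invented for this example, and MAP_ANONYMOUS is assumed to be available (it is on Linux, the BSDs and macOS, although it is not part of older POSIX editions):

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical wrapper: reserve zero-filled, read-write pages that belong
 * to the caller alone. Returns NULL on failure. */
static void *alloc_pages(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical wrapper: give the pages back to the OS.
 * Returns 0 on success, -1 on failure (like munmap). */
static int free_pages(void *p, size_t len)
{
    return munmap(p, len);
}
```

Even a request for a few bytes consumes at least one whole page, and the returned pointer is page-aligned, so it can be passed to mprotect directly.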

    So the way that a JIT compiler typically works is that it allocates read-write memory using mmap or VirtualAlloc, copies the generated machine instructions into that memory, uses mprotect or VirtualProtect to make the memory executable and non-writable (for security reasons you never want memory to be executable and writable at the same time if you can avoid it), and then jumps into it. In terms of its (virtual) address, the memory will typically lie in the same general area as the heap, but it will be separate from the heap in the sense that it won't be managed by malloc and free.
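The sequence described above can be sketched in C roughly as follows. This is an illustration under several assumptions, not how V8 itself is implemented: it assumes a Linux-like POSIX system on x86-64 or AArch64, a GCC or Clang compiler (for __builtin___clear_cache), and hand-assembled machine code for a function that just returns 42. The name run_jit_constant is made up. Some systems impose extra restrictions on executable memory, in which case the mprotect call would fail and the function returns -1.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void);

#if defined(__x86_64__)
/* mov eax, 42 ; ret */
static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
#elif defined(__aarch64__)
/* mov w0, #42 ; ret */
static const unsigned char code[] = { 0x40, 0x05, 0x80, 0x52,
                                      0xC0, 0x03, 0x5F, 0xD6 };
#else
#error "this sketch only carries machine code for x86-64 and AArch64"
#endif

/* Hypothetical demo: returns the value produced by the generated code,
 * or -1 if a system call fails. */
static int run_jit_constant(void)
{
    size_t len = sizeof code;

    /* 1. Get our own read-write pages. */
    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* 2. Copy the "generated" instructions into them. */
    memcpy(mem, code, len);
    /* Needed on AArch64 to sync the instruction cache; harmless on x86-64. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    /* 3. Flip the pages to read + execute, dropping write permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* 4. Jump into the code. Copying the pointer bits avoids a direct
     * object-to-function pointer cast, which ISO C does not define. */
    assert(sizeof(jit_fn) == sizeof(void *));
    void *entry = mem;
    jit_fn fn;
    memcpy(&fn, &entry, sizeof fn);
    int result = fn();

    munmap(mem, len);
    return result;
}
```

At no point is the memory writable and executable at the same time: it is written while read-write and only called after being switched to read-execute.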