How does the JVM collect a thread dump under the hood?

Please explain how the JVM collects a thread dump under the hood.
I don't understand how it collects stack traces of threads that are off-CPU (waiting on disk I/O or the network, or descheduled by an involuntary context switch).
For example, Linux perf collects information only about on-CPU threads (those that consume CPU cycles).

Solution

I'll take HotSpot JVM as an example.

The JVM maintains the list of all Java threads: for each thread it has a corresponding VM structure. A thread can be in one of the following states depending on its execution context (HotSpot knows the current state of each thread, because it's responsible for switching states):

in_Java - a thread is executing Java code, either in the interpreter or in a JIT-compiled method;
in_vm - a thread is inside a VM runtime function;
in_native - a thread is running a native method in JNI context;
there are also transitional states, but let's skip them for simplicity.

An off-CPU thread can only be in one of two states:

in_native state: all socket I/O, disk I/O, and otherwise blocking operations are performed only in native code;
in_vm state, when a thread is blocked on a VM mutex.

Whenever the JVM calls a native method or acquires a contended mutex, it stores the last Java frame pointer into the Thread structure.

Now the crucial part: HotSpot JVM obtains a thread dump only at a safepoint.

When you ask for a thread dump, the JVM requests a stop-the-world pause. All threads in in_Java state are stopped at the nearest safepoint, where the JVM knows how to walk the stack.
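For reference, a dump can also be requested from inside the process through the standard management API. This is a minimal sketch, assuming a programmatic request is an acceptable stand-in for an external tool such as jstack; the class name DumpRequest is made up for illustration, and only the java.lang.management calls are real API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative class name; only the java.lang.management calls are real API.
class DumpRequest {
    // Asks the JVM for the stack traces of all live threads.
    static ThreadInfo[] dumpAll() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // false, false: do not collect locked monitors or ownable synchronizers.
        return mx.dumpAllThreads(false, false);
    }

    public static void main(String[] args) {
        for (ThreadInfo info : dumpAll()) {
            System.out.println("\"" + info.getThreadName() + "\" " + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

The result includes the calling thread itself, along with every other live thread, whether or not it is on a CPU at the moment of the request.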

Threads in in_native state are not stopped, but they don't need to be. HotSpot knows their last Java frame, because the pointer is stored in the Thread structure. Knowing the top Java frame, the JVM can find its caller, then the caller of the caller, and so on.
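This is easy to observe from Java code. The sketch below, which assumes a pipe read is a fair example of a blocking native call, parks a thread in a read that has nothing to read and then asks the JVM for that thread's stack. OffCpuStack, blockOnRead and the other helper names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

class OffCpuStack {
    // Hypothetical helper: blocks until a byte arrives on the pipe.
    static void blockOnRead(Pipe pipe) {
        try {
            pipe.source().read(ByteBuffer.allocate(1));
        } catch (IOException e) {
            // The pipe was closed; nothing else to do in this sketch.
        }
    }

    // True if the top frame is a native method and blockOnRead is on the stack.
    static boolean blockedInNative(StackTraceElement[] trace) {
        if (trace.length == 0 || !trace[0].isNativeMethod()) return false;
        for (StackTraceElement frame : trace) {
            if (frame.getMethodName().equals("blockOnRead")) return true;
        }
        return false;
    }

    // Starts a reader thread, waits (up to 5 s) until it sits in a native
    // method, and returns the stack trace the JVM reports for it.
    static StackTraceElement[] traceOfBlockedReader() {
        try {
            Pipe pipe = Pipe.open();
            Thread reader = new Thread(() -> blockOnRead(pipe), "blocked-reader");
            reader.setDaemon(true);
            reader.start();

            long deadline = System.nanoTime() + 5_000_000_000L;
            StackTraceElement[] trace = reader.getStackTrace();
            while (!blockedInNative(trace) && System.nanoTime() < deadline) {
                Thread.sleep(10);
                trace = reader.getStackTrace();
            }

            // Unblock the reader and clean up.
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
            reader.join(1000);
            pipe.sink().close();
            pipe.source().close();
            return trace;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (StackTraceElement frame : traceOfBlockedReader()) {
            System.out.println("    at " + frame);
        }
    }
}
```

The reader consumes no CPU while it waits, yet its Java frames, including blockOnRead, are reported. On HotSpot the topmost frame is typically a JDK-internal native method, whose exact name depends on the JDK version and platform.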

What is important here is that the Java part of the stack is "frozen", no matter what the native method does. The top part of the stack (native) can change back and forth, while the bottom part (Java) remains immutable. It cannot change, since the JVM checks for a pending safepoint operation on every switch from in_native to in_Java: if a native method returns while the VM is running a stop-the-world operation, the current thread blocks until the operation ends.
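The check on the way back from native code can be pictured with a toy model. This is not HotSpot code, only a sketch of the idea using a plain Java monitor; SafepointGate and all of its methods are invented names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model, not HotSpot code: a gate that a thread must pass
// when it returns from "native" code to "Java" code.
class SafepointGate {
    private boolean safepointInProgress; // guarded by 'this'

    synchronized void begin() {
        safepointInProgress = true;
    }

    synchronized void end() {
        safepointInProgress = false;
        notifyAll(); // release threads waiting to re-enter Java code
    }

    // Models the in_native -> in_Java transition: the returning thread
    // waits for as long as a stop-the-world operation is running.
    synchronized void returnFromNative() throws InterruptedException {
        while (safepointInProgress) {
            wait();
        }
    }

    // Returns {resumed while the safepoint was active, resumed after it ended}.
    static boolean[] demo() {
        SafepointGate gate = new SafepointGate();
        AtomicBoolean backInJava = new AtomicBoolean(false);
        gate.begin();
        Thread worker = new Thread(() -> {
            try {
                gate.returnFromNative();
                backInJava.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            Thread.sleep(100); // give the worker time to reach the gate
            boolean during = backInJava.get();
            gate.end();
            worker.join(5000);
            return new boolean[] {during, backInJava.get()};
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("resumed during safepoint: " + result[0]);
        System.out.println("resumed after safepoint:  " + result[1]);
    }
}
```

In this model the worker cannot touch its "Java" state while the gate is closed, which is the property that keeps the Java part of a real stack stable during the dump.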

So, getting a thread dump involves:

Stopping all in_Java and in_vm threads at a safepoint;
Walking through the global list of threads maintained by the JVM;
If a thread is running a native method, its top Java frame is stored in the thread structure; if a thread is running Java code, its top frame corresponds to the currently executing Java method.
Each frame has a link to the previous frame, so given the top frame, the JVM can construct the whole stack trace to the bottom.
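The steps above can be sketched as a toy model. This is not how HotSpot represents frames internally; Frame, ThreadRecord and ToyStackWalk are hypothetical types that only illustrate walking from a stored top frame down through the caller links.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToyStackWalk {
    // A frame knows its method and the frame that called it (null at the bottom).
    static final class Frame {
        final String method;
        final Frame caller;

        Frame(String method, Frame caller) {
            this.method = method;
            this.caller = caller;
        }
    }

    // Stands in for the per-thread VM structure: a state and the top Java frame.
    static final class ThreadRecord {
        final String name;
        final String state;
        final Frame topJavaFrame;

        ThreadRecord(String name, String state, Frame topJavaFrame) {
            this.name = name;
            this.state = state;
            this.topJavaFrame = topJavaFrame;
        }
    }

    // Follows the caller links from the top frame to the bottom of the stack.
    static List<String> walk(Frame top) {
        List<String> trace = new ArrayList<>();
        for (Frame f = top; f != null; f = f.caller) {
            trace.add(f.method);
        }
        return trace;
    }

    // Iterates over the thread list and formats one stack trace per thread.
    static String dump(List<ThreadRecord> threads) {
        StringBuilder out = new StringBuilder();
        for (ThreadRecord t : threads) {
            out.append('"').append(t.name).append("\" ").append(t.state).append('\n');
            for (String method : walk(t.topJavaFrame)) {
                out.append("    at ").append(method).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Frame main = new Frame("App.main", null);
        Frame handle = new Frame("App.handle", main);
        Frame read = new Frame("Socket.read", handle);
        System.out.print(dump(Arrays.asList(
                new ThreadRecord("worker", "in_native", read),
                new ThreadRecord("main", "in_Java", main))));
    }
}
```

The walker never needs the thread to be running: the stored top frame and the caller links are enough to rebuild the trace.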