Tags: java, debugging, jvm, windbg, ptrace

How to intercept memory accesses/changes in the Hotspot JVM?


I'd like to develop some kind of reverse debugger for Java (one where you can step back through the execution). To do this, I have to know the initial state of the JVM (which can easily be obtained from a core dump). Then I have to intercept every memory access the JVM performs, so that I end up with a timeline of everything the JVM has done during execution and can reconstruct any intermediate state.

So what I need is a way to intercept the memory accesses with low performance overhead; the solution shouldn't add more than 200-300% overhead to the JVM's execution, which is already a lot.

Some ideas which come to my mind:
- using ptrace, but it is really slow
- developing some kind of simple virtual machine in which I run the JVM (on top of the guest OS), where this virtual machine intercepts all memory accesses of the JVM executable. This would be similar to VMware's Replay debugger feature. The problem is that I don't know how to do this, or whether it is possible at all.


Solution

  • Effectively, you want to monitor changes to Java objects. Tracking memory changes at a level below the JVM is one option. Maximum precision could be achieved using

    • page write protection and a signal handler for generating write notifications (care must be taken not to interfere with the GC write barrier)
    • dynamic instrumentation using an instrumentation framework such as Valgrind (static instrumentation is not an option because it does not cover the JIT output)
    • virtualization based on a custom hypervisor

    For snapshotting, you could use

    • ptrace for process suspension and gaining access to process memory
    • fork-based asynchronous snapshots using custom code / core dumps (taking advantage of memory copy-on-write, the main process does not have to be suspended)
    • the maximum-precision strategies listed above, applied in a relaxed (lower-frequency) fashion

    The downside of that option is that you'd also be forced to track writes that are unrelated to the Java heap itself (JVM internals, garbage collection, monitors, libraries, ...); writes affecting the Java heap represent only a subset of all writes taking place in the process at any given time. Also, it'd be less straightforward to extract the actual Java objects from those process snapshots/dumps without help from the JVM's own code.

    Monitoring changes at the JVM level is the more favorable strategy. Here, maximum precision could be achieved using

    • bytecode instrumentation (doesn't cover JNI-based writes)
      • high-overhead approach: record every single write
      • low-overhead approach: add a write barrier that sets a flag whenever a write occurs and dump flagged objects at regular intervals (see the sketch after this list)
    • a custom OpenJDK build that includes your own monitoring layer
      • could take advantage of the garbage collector write barrier to identify changes
        • usually implemented by means of a flag set on every write or
        • a flag that is set only on the first write, by write-protecting the memory page associated with an object and setting the flag in the handler for the resulting segmentation fault

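    As a rough illustration of the low-overhead bytecode-instrumentation variant, the sketch below shows what an application class could conceptually look like after instrumentation. All names (DirtyFlag, Order, setAmount) are made up for illustration; a real implementation would inject the equivalent bytecode through a java.lang.instrument agent and a bytecode-rewriting library rather than editing source. The high-overhead variant would append the full (object, field, value) tuple to a log instead of merely setting a flag.

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.WeakHashMap;

    // Hypothetical helper used by the injected write barrier; it references flagged
    // objects weakly so they can still be garbage collected.
    final class DirtyFlag {
        private static final Set<Object> DIRTY = Collections.synchronizedSet(
            Collections.newSetFromMap(new WeakHashMap<Object, Boolean>()));

        // Write barrier: only set a flag, do not record the written value.
        static void mark(Object owner) {
            DIRTY.add(owner);
        }

        // Called by a sampling thread at regular intervals; the returned objects are
        // the ones that changed since the last drain and should be dumped.
        static Set<Object> drain() {
            synchronized (DIRTY) {
                Set<Object> changed = new HashSet<>(DIRTY);
                DIRTY.clear();
                return changed;
            }
        }
    }

    // What an application class conceptually looks like after instrumentation.
    class Order {
        private long amount;

        void setAmount(long amount) {
            this.amount = amount;  // original write
            DirtyFlag.mark(this);  // injected write barrier
        }
    }

    A sampling thread that periodically calls DirtyFlag.drain() and serializes the result is where the savings over recording every individual write come from.
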
    For snapshotting, you could use

    • custom heap snapshots based on JVMTI's IterateThroughHeap and/or FollowReferences
    • heap dumps triggered externally via JMX (see the remote-connection sketch after this list) or programmatically from inside the JVM:
    // Requires: import java.lang.management.ManagementFactory;
    //           import com.sun.management.HotSpotDiagnosticMXBean;
    HotSpotDiagnosticMXBean mxbean = ManagementFactory.newPlatformMXBeanProxy(
      ManagementFactory.getPlatformMBeanServer(),
      "com.sun.management:type=HotSpotDiagnostic",
      HotSpotDiagnosticMXBean.class);
    // Second argument: true = dump only live (reachable) objects; throws java.io.IOException.
    mxbean.dumpHeap("dump.hprof", true);
    
    • the maximum-precision strategies listed above, applied in a relaxed (lower-frequency) fashion

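    The external variant can use a JMX remote connection to invoke the same HotSpotDiagnostic bean from another process. A minimal sketch, assuming the target JVM was started with remote JMX enabled on port 9010 (the service URL, port, and dump path are placeholders):

    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class RemoteHeapDump {
        public static void main(String[] args) throws Exception {
            // Placeholder service URL; adjust host/port to the monitored JVM.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection connection = connector.getMBeanServerConnection();
                HotSpotDiagnosticMXBean bean = JMX.newMXBeanProxy(
                    connection,
                    new ObjectName("com.sun.management:type=HotSpotDiagnostic"),
                    HotSpotDiagnosticMXBean.class);
                // The path is resolved on the target JVM's host; true = live objects only.
                bean.dumpHeap("/tmp/dump.hprof", true);
            } finally {
                connector.close();
            }
        }
    }
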
    The "right" approach depends on the desired performance characteristics, the target platform, portability (whether the solution may depend on a specific JVM implementation/version), and precision/resolution (snapshots/sampling [aggregating writes] vs. instrumentation [recording each individual write]).

    In terms of performance, doing the monitoring at the JVM level tends to be more efficient as only the actual Java heap writes have to be taken into account. Integrating your monitoring solution into the VM and taking advantage of the GC write barrier could be a low-overhead solution, but would also be the least portable one (tied to a specific JVM implementation/version).

    If you need to record each individual write, you have to go the instrumentation route and it will most likely turn out to have a significant runtime overhead. You cannot aggregate writes, so there's no optimization potential.

    In terms of sampling/snapshotting, implementing a JVMTI agent could be a good compromise. It provides high portability (works with many JVMs) and high flexibility (the iteration and processing can be tailored to your needs, as opposed to relying on standard HPROF heap dumps).