I am profiling a heavily modified fork of JS Interpreter, running on node.js v12.12.0, using --prof
, while it runs a short synthetic benchmark. In the output from `node --prof-process, I see that 63% of the program's runtime is spent in C++:
[Summary]:
ticks total nonlib name
1503 35.6% 35.9% JavaScript
2658 63.0% 63.4% C++
119 2.8% 2.8% GC
30 0.7% Shared libraries
29 0.7% Unaccounted
In particular, there is one single C++ function that accounts for 59% of the entire runtime:
[C++]:
ticks total nonlib name
2504 59.3% 59.8% t __ZN4node9inspector8protocol11NodeRuntime14DispatcherImplD1Ev
27 0.6% 0.6% T node::native_module::NativeModuleEnv::CompileFunction(v8::FunctionCallbackInfo<v8::Value> const&)
23 0.5% 0.5% T _proc_set_dirty
16 0.4% 0.4% T __kernelrpc_vm_remap
9 0.2% 0.2% t __malloc_initialize
9 0.2% 0.2% T _thread_get_state
8 0.2% 0.2% T node::contextify::ContextifyContext::CompileFunction(v8::FunctionCallbackInfo<v8::Value> const&)
...
Unfortunately, it's not too obvious to me what __ZN4node9inspector8protocol11NodeRuntime14DispatcherImplD1Ev
might be doing.
The name suggests that it has something to do with the inspector protocol, but I am not using the --inspect
command line flag nor attempting to attach to the running process using the inspector.
I can't seem to find any file in node.js sources that contains both the strings "NodeRuntime" and "DispatcherImpl".
It seems to be called from several different places, most usually recursively from itself (list trimmed to show mainly top-level entries), but the named JavaScript functions don't have any obvious commonality:
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 1.0% are not shown.
ticks parent name
2504 59.3% t __ZN4node9inspector8protocol11NodeRuntime14DispatcherImplD1Ev
1287 51.4% t __ZN4node9inspector8protocol11NodeRuntime14DispatcherImplD1Ev
785 61.0% LazyCompile: *intrp.Object.defineProperty /Users/cpcallen/src/CodeCity/server/interpreter.js:4477:51
226 17.6% LazyCompile: *intrp.UserFunction.instantiateDeclarations /Users/cpcallen/src/CodeCity/server/interpreter.js:4840:66
67 5.2% LazyCompile: *intrp.Object.getOwnPropertyDescriptor /Users/cpcallen/src/CodeCity/server/interpreter.js:4455:61
34 2.6% t __ZN4node9inspector8protocol11NodeRuntime14DispatcherImplD1Ev
22 1.7% LazyCompile: *stepFuncs_.MemberExpression /Users/cpcallen/src/CodeCity/server/interpreter.js:6582:42
278 11.1% LazyCompile: *Interpreter.run /Users/cpcallen/src/CodeCity/server/interpreter.js:290:37
140 5.6% LazyCompile: *stepFuncs_.Identifier /Users/cpcallen/src/CodeCity/server/interpreter.js:6494:36
121 4.8% LazyCompile: *intrp.UserFunction.instantiateDeclarations /Users/cpcallen/src/CodeCity/server/interpreter.js:4840:66
91 3.6% LazyCompile: ~runBench /Users/cpcallen/src/CodeCity/server/tests/interpreter_bench.js:37:18
79 3.2% LazyCompile: *intrp.UserFunction.call /Users/cpcallen/src/CodeCity/server/interpreter.js:4782:47
65 2.6% LazyCompile: *Interpreter.getBoundNames_ /Users/cpcallen/src/CodeCity/server/interpreter.js:2907:48
62 2.5% LazyCompile: *stepFuncs_.CallExpression /Users/cpcallen/src/CodeCity/server/interpreter.js:6039:40
I wondered if it might actually be the garbage collector, but using --trace-gc
shows that GC is taking much less than 10% of total runtime.
How can I figure out what this C++ function is doing?
Finding DispatcherImpl
isn't hard: https://github.com/nodejs/node/search?q=dispatcherimpl&unscoped_q=dispatcherimpl leads straight to https://github.com/nodejs/node/blob/5aaa7fee2e4a075d9123b885f9e8cda3de2a780a/tools/inspector_protocol/templates/TypeBuilder_cpp.template#L218. But that probably isn't what you're really looking for...
There was a bug in the --prof
system for a while where C++ ticks would get attributed to the wrong function -- looks like you might be running into that. It's been fixed in V8 recently, but the fix hasn't rolled into a Node release yet.
As a workaround, on Linux, you can use perf
to profile C++ code [1], while still using --prof
for JavaScript; JavaScript ticks as well as the C++/JavaScript distribution should be reliably correct in --prof
's output. On other platforms there should be equivalent generic profiling techniques.
[1] For details, read the man page. I typically use something like:
perf record -e cycles -F 10000 <executable and arguments>
perf report -M intel