I have been looking into method inlining with JIT and landed on this article by Scott Hanselman. I took his code a little further, and it seems that while there are only a few call stacks when the code runs in Release mode, at actually does seem to run as if those extra frames still exist within the compiled code (even if they do not report as such).
To start, I have put the code up here if you want to jump in and run it: https://github.com/Mike-EEE/StackOverflow.Performance
I have tried this on .NET 4.7.1, .NET Core 2.0, and even the new .NET Core 2.1 Preview that was recently announced. All feature the same results.
What I have done is created a simple command that emits a message, and then create a subsequent multi-decorated command that wraps this simple command several times over. In the code as published, this decoration is done 10 times, resulting in a nested command with 10 levels (or 11 if you count the origin simple command).
Both of these commands used in test use an empty delegate to emit a message, as using a Console.WriteLine
during a performance test gets rather ugly.
Before I run the tests, I do create a decorated command that uses the same code as the tested code, but instead of an empty delegate, use a Console.WriteLine
to verify the stack trace within the current execution environment.
In Debug, this stack trace looks like the following:
at StackOverflow.Performance.EmitMessage.Emit(String message)
at StackOverflow.Performance.EmitMessage.MethodC(String message)
at StackOverflow.Performance.EmitMessage.MethodB(String message)
at StackOverflow.Performance.EmitMessage.MethodA(String message)
at StackOverflow.Performance.EmitMessage.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.DecoratedCommand.Execute(String message)
at StackOverflow.Performance.Program.Main()
In Release, it looks like this:
at StackOverflow.Performance.EmitMessage.Emit(String message)
at StackOverflow.Performance.Program.Main()
To this point, everything looks awesome and exactly how I would expect. However, I then execute these two commands via BenchmarkDotNet to see what the results are in a performance setting. These results do seem to suggest that the decorated command's call chain is executed in full, even though the emitted stack trace suggests that no such call chain exists:
// * Summary *
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.371 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-4820K CPU 3.70GHz (Haswell), 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=2.1.300-preview2-008533
[Host] : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT
DefaultJob : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT
Method | Mean | Error | StdDev |
---------- |----------:|----------:|----------:|
Direct | 3.581 ns | 0.0759 ns | 0.0710 ns |
Decorated | 44.646 ns | 0.7701 ns | 0.7203 ns |
So, it would seem that more than 2 frames are being executed here, and that has led me to posting this question here on StackOverflow. I do have several questions around this:
For the sake of completeness, here is all the code to run this sample:
public class Program
{
static void Main()
{
// Writes out the stack trace from a decorated command:
Decorate.Get(new EmitMessage(Console.WriteLine))
.Execute(null);
BenchmarkRunner.Run<Program>();
Console.ReadKey();
}
readonly ICommand _direct, _decorated;
readonly string _message;
public Program() : this(new EmitMessage()) {}
public Program(ICommand direct) : this(direct, Decorate.Get(direct), "Hello World!") {}
public Program(ICommand direct, ICommand decorated, string message)
{
_direct = direct;
_decorated = decorated;
_message = message;
}
[Benchmark]
public void Direct()
{
_direct.Execute(_message);
}
[Benchmark]
public void Decorated()
{
_decorated.Execute(_message);
}
}
static class Decorate
{
public static ICommand Get(ICommand parameter)
=> Enumerable.Range(0, 10)
.Aggregate(parameter, (command, _) => new DecoratedCommand(command));
}
sealed class DecoratedCommand : ICommand
{
readonly ICommand _command;
public DecoratedCommand(ICommand command) => _command = command;
public void Execute(string message)
{
_command.Execute(message);
}
}
sealed class EmitMessage : ICommand
{
readonly Action<string> _emit;
public EmitMessage() : this(_ => {}) {}
public EmitMessage(Action<string> emit) => _emit = emit;
public void Execute(string message)
{
MethodA(message);
}
void MethodA(string message)
{
MethodB(message);
}
void MethodB(string message)
{
MethodC(message);
}
void MethodC(string message)
{
Emit(message);
}
void Emit(string message)
{
_emit(message ?? new StackTrace().ToString());
}
}
public interface ICommand
{
void Execute(string message);
}
Thank you in advance for any insight/assistance that you can provide!
shamelessly copying Stephen Toub's work from here
I just looked at the disassembly for DecoratedCommandUsing a checked build of coreclr and running with
setCOMPlus_JitDisasm=Execute
see documentation. Indeed it's using tail calls:; Assembly listing for method
DecoratedCommand:Execute(ref)
:this; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
; V00 this [V00,T00] ( 3, 3 ) ref -> rcx this class-hnd
; V01 arg1 [V01,T01] ( 3, 3 ) ref -> rdx class-hnd
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00]
; Lcl frame size = 0
G_M223_IG01:
G_M223_IG02:
488B4908 mov rcx, gword ptr [rcx+8]
49BB48007733FD7F0000 mov r11, 0x7FFD33770048
488B05934FE5FF mov rax, qword ptr [(reloc)]
3909 cmp dword ptr [rcx], ecx
G_M223_IG03:
48FFE0 rex.jmp rax