Search code examples
c#.netclrbytecodecil

Stack merge points and managed pointers in the CLR


My understanding is that the .NET CLR is allowed to run "unverifiable" and "verifiable" bytecode. However, in both cases, the bytecode must be "correct CIL" in terms of the ECMA-CIL. Bytecode that is correct but unverifiable could be generated by using unsafe features of C#. Verifiable bytecode could come from day-to-day C#.

Either way, the .NET CLR must guarantee somehow that the bytecode is correct CIL. To do so, it must statically infer basic information about the stack state before each instruction. For instance, the number of elements and very coarse type inference. The inferred information must be merged at the beginning of a basic block if it has more than one predecessor.

My question is, is it allowed to merge managed pointers of different types? I mean this regarding correct CIL but not necessarily verifiable CIL.

.method public static void Bar (int32& a, uint32& b, bool d) cil managed
{
    .maxstack 8
    IL_0003: ldarg.2
    IL_0004: brfalse.s IL_000b

    IL_0006: ldarg.0
    IL_0009: br.s IL_000d

    IL_000b: ldarg.1

    IL_000d: pop
    IL_000e: ret
}

ILVerify reports:

IL]: Error [PathStackUnexpected]: [Test.dll : .Test::Bar(int32&, uint32&, bool)][offset 0x00000006][found address of Int32][expected address of UInt32] Non-compatible types on stack depending on path.

My problem is that I don't know if this is regarding the verifiability or correctness of the bytecode. I mean "verifiability" and "correctness" in the same way they are defined in the ECMA-CIL. I also wonder if I may be misunderstanding the standard.


Solution

  • ECMA-335, page 83, says:

    The type state of the stack (the stack depth and types of each element on the stack) at any given point in a program shall be identical for all possible control flow paths. For example, a program that loops an unknown number of times and pushes a new element on the stack at each iteration would be prohibited.

    This is reinforced by page 85:

    Regardless of the control flow that allows execution to arrive there, each slot on the stack shall have the same data type at any given point within the method body.

    Therefore it would seem that, in general, the types of elements on the stack must be identical. However, section I.12.3.2.1 goes on to specify that the evaluation stack is only recognized to consist of these types: int64, int32, native int, F, &, O, * (either native int or &), or any arbitrary user-defined value type.

    This seems to imply that, while the CIL treats int32& and uint32& as different types, the evaluation stack treats them both as &, thus the "type state" is the same for both control flows. Thus the CIL is correct.

    Verifiability is a stronger criterion, ensuring that the program is ultimately memory-safe and has predictable results in all situations with respect to the program's memory. Verification does not use the limited number of types the CLI uses for the evaluation stack, instead it requires that the type states be compatible, usually by having the same verification type.

    In your case, what matters are the types of the referred-to variables: int32 and uint32. As is described on page 36:

    The verification type of a type T is the following:
    ...
    2. If T is a managed pointer type S& and the reduced type of S is:
    ...
    int32, then its verification type is int32&.

    And again, on the same page:

    The reduced type of a type T is the following:
    1. If the underlying type of T is:
    ...
    int32, or unsigned int32, then its reduced type is int32.

    As a consequence, the reduced types are both int32, and so the verification types of both parameters are int32&, regardless of their signedness, thus the CLI is verifiable.